vscode-remark Configuration Notes
Setup
- Extension: `unifiedjs.vscode-remark` v3.2.0.
- Config file: `.remarkrc.mjs` (ESM) at workspace root or any parent directory.
- Two working config styles exist:
  - Inline-only: no `package.json`; all plugins defined in `.remarkrc.mjs`.
  - Installed plugins: local `package.json` plus `remark-gfm` and `remark-frontmatter` imported from `.remarkrc.mjs`.
Design Decision: Inline vs Installed Plugins
There are two viable ways to run vscode-remark with this config.
Inline-only config keeps everything inside .remarkrc.mjs and avoids npm
metadata in the repo. This is useful when the main goal is keeping Markdown
repos dependency-free.
Installed plugins uses a normal local Node setup and imports ecosystem
packages from .remarkrc.mjs. This is now verified to work in this workspace
with a real .remarkrc.mjs, a local package.json, and installed packages.
It simplifies the config because remark can parse frontmatter and GFM tables
natively.
The two main packages are:
- `remark-frontmatter`: would handle YAML frontmatter natively (replaces our `preserveFrontmatter` parser+compiler wrapper).
- `remark-gfm`: would parse tables into proper `table`/`tableRow`/`tableCell` AST nodes (replaces our `formatTables` compiler wrapper and the `looksLikeTable` helper), and would let `headingJoin` drop its table-detection hack.
Tradeoffs:
- Inline-only avoids `package.json` and `node_modules`, but needs more custom code and more workarounds for things remark does not parse natively.
- Installed plugins add `package.json`, `package-lock.json`, and `node_modules/`, but remove the table/frontmatter hacks and are easier to reason about.
- vscode-remark still cannot reliably use globally installed plugins in Remote/WSL setups, so installed plugins should be local to the repo or a shared parent directory.
- VS Code settings:
"[markdown]": {
"editor.defaultFormatter": "unifiedjs.vscode-remark",
"editor.formatOnSave": true
},
"remark.requireConfig": true
Config File Search (findUp)
The language server uses unified-engine's findUp to locate .remarkrc.mjs.
It walks up the directory tree from the file being formatted, checking each
directory for a config file. It stops at the first match.
For a file at /home/parsia/dev/myproject/docs/file.md, the search order is:
- /home/parsia/dev/myproject/docs/
- /home/parsia/dev/myproject/   <- workspace-level config
- /home/parsia/dev/
- /home/parsia/                 <- home directory config
- /home/
- /
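The walk can be sketched as a plain string operation. This is a simplified model of the upward search, not unified-engine's actual code, and `configSearchDirs` is a hypothetical helper name:

```javascript
// Simplified model of findUp's upward walk: list every directory from the
// file's folder up to the filesystem root, in search order.
function configSearchDirs(filePath) {
  const dirs = []
  let dir = filePath.slice(0, filePath.lastIndexOf('/'))
  while (true) {
    dirs.push(dir + '/')
    if (dir === '') break
    dir = dir.slice(0, dir.lastIndexOf('/'))
  }
  return dirs
}
```

The first directory containing `.remarkrc.mjs` wins, which is why a workspace config shadows the home-directory one.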
Global Config (Home Directory)
Place .remarkrc.mjs at ~/.remarkrc.mjs (/home/parsia/.remarkrc.mjs on
Linux/WSL, C:\Users\Parsia\.remarkrc.mjs on Windows). This applies to all
files under your home directory — which is effectively all workspaces.
Per-Workspace Overrides
A .remarkrc.mjs in a workspace directory takes precedence over the home
directory one (findUp stops at the first match). This allows a global default
with per-workspace overrides when needed.
Symlink Approach
Keep a canonical config in a dotfiles repo and symlink it:
# Linux/WSL: global config via symlink
ln -s ~/dotfiles/.remarkrc.mjs ~/.remarkrc.mjs
# Linux/WSL: per-workspace symlink
ln -s ~/dotfiles/.remarkrc.mjs /path/to/workspace/.remarkrc.mjs
# Windows (elevated PowerShell): global config
New-Item -ItemType SymbolicLink -Path "$env:USERPROFILE\.remarkrc.mjs" -Target "C:\dotfiles\.remarkrc.mjs"
Remote WSL Considerations
When using VS Code Remote WSL, the language server runs on the Linux side.
Config search uses the Linux filesystem (/home/parsia/). For files opened
locally on Windows (not via Remote), the server runs on Windows and searches the
Windows filesystem (C:\Users\Parsia\). If you use both, you need the config in
both locations.
Extension Settings
The extension only has three settings: remark.requireConfig (boolean),
remark.trace.server.format, and remark.trace.server.verbosity. There is no
setting for a global config path — the findUp search is the only mechanism.
Window Reload Required
The language server uses import() to load .remarkrc.mjs. Node.js caches ES
modules for the lifetime of the process. Every change to .remarkrc.mjs
requires reloading the VS Code window (Ctrl+Shift+P -> "Developer: Reload
Window") to take effect.
Config File Structure
Everything goes in one .remarkrc.mjs file. Plugins are defined as inline
functions. Separate plugin files (.js/.mjs imports) do NOT work without a
package.json because the bundled load-plugin in vscode-remark can't resolve
local paths correctly.
// Plugin functions live inline in `.remarkrc.mjs`.
function myPlugin() {
// Plugin setup or transformer logic goes here.
// ...
}
export default {
settings: {
// Built-in remark-stringify options go here.
},
plugins: [
// Plugins are registered here in execution order.
myPlugin
]
}
Settings (remark-stringify options)
Settings map directly to mdast-util-to-markdown options. They are applied via
processor.data('settings', ...). Only use settings that actually correspond to
a formatting rule you need.
Rule-to-Setting Mapping
| Formatting Rule | How It's Handled | Detail |
|---|---|---|
| Use * for unordered lists, not - | bullet: '*' setting | Converts all - and + list markers to * |
| One space between list marker and text | listItemIndent: 'one' setting | Produces * text with exactly one space |
| No blank line between heading and paragraph | join: [headingJoin] setting | Custom join function returns 0 for heading+paragraph |
| Blank line between heading and list | join: [headingJoin] setting | Custom join function returns 1 for heading+list |
| Blank line between heading and code block | join: [headingJoin] setting | Custom join function returns 1 for heading+code |
| Use -> and <- instead of → and ← | replaceText plugin | Factory function: final serialized-text replacement across the whole document |
| Blank line between normal text and list | remark default | remark-stringify always adds \n\n between all block elements |
| Blank line between heading and heading | remark default + join | Default \n\n, also explicitly returned as 1 in headingJoin |
| Bold in lists -> backticks | boldToCodeInLists plugin | Replaces leading strong in listItem with inlineCode |
| Bold -> heading conversion | boldToHeading plugin | Replaces standalone strong paragraphs with heading at context-aware depth |
| Align table columns | formatTables plugin | Compiler wrapper: pads cells and aligns pipes in the serialized output |
| Don't escape text, URLs, or definitions | Custom handlers setting | Override text, link, definition handlers to skip state.safe() |
| No blank lines between reference definitions | tightDefinitions: true | Built-in remark-stringify setting |
| Preserve YAML frontmatter | preserveFrontmatter plugin | Parser+compiler wrapper: strips before parse, re-adds after compile |
| Normalize thematic breaks to ---------- | rule: '-' + ruleRepetition: 10 | All ---, ***, * * *, etc. become 10 dashes |
| Preserve Hugo shortcodes | preserveContent factory plugin | Placeholder pattern: strips before parse, restores after compile |
bullet: '*' - remark-stringify uses this character for all unordered list
markers. Converts - and + lists to *. Fixes the rule: "Use * for
unordered lists, not -."
listItemIndent: 'one' - controls indent after the list marker. 'one'
means exactly one space: * text. Other values: 'tab' (tab stop width) and
'mixed' (one for tight lists, tab for loose). Must be the string 'one', not
the number 1. Fixes the rule: "Put one space between a list marker and the
text that follows it."
join: [headingJoin] - an array of functions controlling blank lines between
blocks. Our headingJoin handles several rules at once: no blank line between
heading and paragraph, and the blank line kept for heading+list, heading+code,
and heading+heading. See the "Join Functions" section below for details.
Other available settings
Full list: bullet, bulletOther, bulletOrdered,
closeAtx, emphasis, fence, fences, incrementListMarker,
listItemIndent, quote, resourceLink, rule, ruleRepetition,
ruleSpaces, setext, strong, tightDefinitions, handlers, join,
unsafe.
Reference: https://github.com/syntax-tree/mdast-util-to-markdown#options
Join Functions (Blank Line Control)
remark-stringify inserts \n\n (one blank line) between ALL block elements by
default. The only way to change this is the join API. There is no boolean
setting for it (except tightDefinitions for definition lists).
A join function receives (left, right, parent, state) and returns:
- 0: no blank line (just \n).
- 1 or true: one blank line (the default \n\n). Stops further join processing.
- Any number n: n blank lines (\n repeated n+1 times).
- false: the nodes can't be adjacent; an HTML comment is inserted to break them.
- undefined: no opinion; try the next join function. If all return undefined, the default \n\n is used.
Join functions are checked in reverse order (last added checked first).
headingJoin Implementation
The join function only needs to handle heading+paragraph (remove blank line).
Everything else after a heading keeps the default blank line. The key design
decision: use a whitelist for removal (paragraph) rather than a blacklist of
types to keep. This way new block types (tables, blockquotes, HTML, thematic
breaks, etc.) automatically get the blank line without updating the function.
function headingJoin(left, right) {
// Only customize spacing after headings.
if (left.type === 'heading') {
// Heading + normal paragraph: remove the blank line.
if (right.type === 'paragraph' && !looksLikeTable(right)) return 0
// Heading + anything else: keep the default blank line.
return 1
}
}
Place in config under settings.join:
export default {
settings: {
// Register the join rule with remark-stringify.
join: [headingJoin]
}
}
Block types that keep the blank line after a heading (by returning 1):
list, code, heading, table, blockquote, thematicBreak, html, and
any future types.
headingJoin and tables
Without remark-gfm, tables are parsed as paragraph nodes (not table nodes).
The original headingJoin treated heading+table as heading+paragraph and
removed the blank line.
Fix: The looksLikeTable helper inspects the paragraph's children to detect
table content. It reconstructs the raw text (including inline code), splits by
\n, and checks if the second line is a valid delimiter row (| --- | --- |).
If it is, headingJoin skips the return 0 and falls through to return 1
(keep blank line).
function looksLikeTable(node) {
// Tables only matter on paragraph-like nodes.
if (!node.children) return false
// Rebuild the paragraph text, preserving inline code markers.
const raw = node.children.map(c => {
if (c.type === 'text') return c.value
if (c.type === 'inlineCode') return '`' + c.value + '`'
return ''
}).join('')
// Markdown tables need at least a header row and delimiter row.
const lines = raw.split('\n')
if (lines.length < 2) return false
// A valid delimiter row (`---`, `:---`, `---:`, `:---:`) is the signal.
const cells = parseCells(lines[1])
return cells.length > 0 && cells.every(c => /^:?-+:?$/.test(c))
}
The delimiter row is what makes a table a table — a random paragraph starting
with | won't have | --- | --- | on line 2. This reuses the same
parseCells helper used by formatTables.
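The delimiter-cell check at the heart of looksLikeTable can be exercised on its own:

```javascript
// The same delimiter-cell pattern used by looksLikeTable: optional alignment
// colons around one or more dashes.
const delimCell = /^:?-+:?$/

const validCells = ['---', ':---', '---:', ':---:']
const allValid = validCells.every(c => delimCell.test(c))
const plainText = delimCell.test('just a sentence')
```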
Tree Transformer Plugins
Plugins that modify the AST (parsed markdown tree) before serialization. Most
of the custom plugins below are tree transformers, but replaceText was moved
to a compiler wrapper so arrow replacement now applies to the final markdown
output everywhere, not just to text nodes.
Node types: text, heading, paragraph, list, listItem, code,
inlineCode, strong, emphasis, link, image, blockquote,
thematicBreak, html, definition, root.
Text replacement — replaceText factory:
// Factory that returns a remark plugin to replace text patterns in the final
// serialized markdown.
// `mappings` is an object of { 'from': 'to' } pairs.
function replaceText(mappings) {
return function() {
const self = this
const origCompiler = self.compiler
self.compiler = function(tree, file) {
let result = origCompiler.call(this, tree, file)
for (const [from, to] of Object.entries(mappings)) {
result = result.split(from).join(to)
}
return result
}
}
}
Usage in the plugins array:
plugins: [
// Each factory call creates one configured plugin instance.
replaceText({ '→': '->', '←': '<-' }),
]
The function returns a setup plugin that wraps the compiler. It runs after remark has serialized the document, so the replacement applies to the final markdown text rather than only to selected AST node types.
To add more replacements, add entries to the mappings object:
// Add more entries when one plugin instance should handle them together.
replaceText({ '→': '->', '←': '<-', '…': '...' }),
Or use multiple calls for different groups:
plugins: [
// Or keep separate instances when you want the mappings grouped apart.
replaceText({ '→': '->', '←': '<-' }),
replaceText({ '…': '...' }),
]
Because the implementation now uses literal split(...).join(...)
replacement, the mapping keys are treated as plain text, not regular
expressions.
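The literal-replacement semantics can be sketched independently of remark. Here `applyMappings` is a hypothetical stand-in for the loop inside the compiler wrapper:

```javascript
// Stand-in for the replacement loop inside replaceText's compiler wrapper.
// split/join treats each key as plain text, never as a regular expression.
function applyMappings(text, mappings) {
  for (const [from, to] of Object.entries(mappings)) {
    text = text.split(from).join(to)
  }
  return text
}
```

A key like '.' therefore matches only a literal dot, not "any character".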
Key points:
- The function passed to plugins: [] is called during processor setup with this bound to the processor.
- It can also return a setup-only plugin that wraps the parser or compiler instead of returning a tree transformer.
- Tree nodes have .type, .value (for text/code), .children (for containers), and .depth (for headings 1-6).
This was changed specifically for arrow replacement. The old implementation
only touched text nodes, so arrows inside inline code, fenced code, YAML
frontmatter, and preserved shortcode content were left alone. The current
implementation rewrites those too because it runs on the final serialized
markdown string.
boldToCodeInLists Implementation
Replaces bold text at the start of list items with inline code. Works for both
ordered and unordered lists because both use listItem nodes in the AST.
AST before (simple): * **file.js** — description
listItem -> paragraph -> [strong -> [text("file.js")], text(" — description")]
AST after: * `file.js` — description
listItem -> paragraph -> [inlineCode("file.js"), text(" — description")]
AST before (with inline code): * **Use `.remarkrc.mjs`, not `.remarkrc.json`.**
listItem -> paragraph -> [strong -> [text("Use "), inlineCode(".remarkrc.mjs"), text(", not "), inlineCode(".remarkrc.json"), text(".")]]
AST after: * `Use .remarkrc.mjs, not .remarkrc.json.`
listItem -> paragraph -> [inlineCode("Use .remarkrc.mjs, not .remarkrc.json.")]
The plugin walks the tree looking for listItem nodes. For each list item, it
checks if the first child of the item's paragraph is a strong node whose
children are all text or inlineCode nodes. If so, it concatenates all
children's .value properties into a single string and replaces the strong
node with an inlineCode node.
function boldToCodeInLists() {
return (tree) => {
function visit(node) {
// Only list items can trigger this conversion.
if (node.type === 'listItem') {
for (const child of (node.children || [])) {
// Look for the paragraph that holds the list item content.
if (child.type === 'paragraph' && child.children && child.children.length > 0) {
const first = child.children[0]
// Only convert leading bold content.
if (first.type === 'strong' && first.children && first.children.length > 0) {
// Allow bold text that contains plain text and inline code.
const allTextOrCode = first.children.every(
c => c.type === 'text' || c.type === 'inlineCode'
)
if (allTextOrCode) {
// Flatten the bold children into one inlineCode node.
const value = first.children.map(c => c.value).join('')
child.children[0] = { type: 'inlineCode', value }
}
}
}
}
}
// Continue through nested structures.
if (node.children) {
node.children.forEach(visit)
}
}
// Walk the whole tree from the root.
visit(tree)
}
}
Conditions for replacement:
- Node is a listItem.
- The list item has a paragraph child.
- The first child of the paragraph is strong.
- All children of the strong node are text or inlineCode nodes.
- Only the first bold in the item is replaced (not all bold text).
- Inner inline code backticks are stripped (values are concatenated into one string).
Bug: bold with inline code was not converted
The first version required strong to have exactly one text child:
// BUG: only matched strong with a single text child
if (first.children.length === 1 && first.children[0].type === 'text') {
When bold contained inline code (e.g., **Use `.remarkrc.mjs`**), the
strong node had multiple children: [text("Use "), inlineCode(".remarkrc.mjs")].
The length === 1 check failed and the bold was left unconverted.
Fix: Check that all children are text or inlineCode (not just one
text), then concatenate all .value properties. Both text and inlineCode
nodes have a .value string property, so .map(c => c.value).join('') works
for both.
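A minimal check against a hypothetical sample node shows why the fix works:

```javascript
// Hypothetical strong node for **Use `.remarkrc.mjs`** after parsing.
const first = {
  type: 'strong',
  children: [
    { type: 'text', value: 'Use ' },
    { type: 'inlineCode', value: '.remarkrc.mjs' }
  ]
}

// Old check: fails because there are two children.
const oldMatch = first.children.length === 1 && first.children[0].type === 'text'

// Fixed check: accept any mix of text and inlineCode, then flatten values.
const allTextOrCode = first.children.every(
  c => c.type === 'text' || c.type === 'inlineCode'
)
const value = first.children.map(c => c.value).join('')
```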
boldToHeading Implementation
Converts standalone bold paragraphs into headings. Only matches when bold is the only content of the paragraph — bold mixed with other text is left alone.
The heading depth is context-aware: the plugin walks root's children in order,
tracks the most recent real heading's depth, and creates the new heading at
lastDepth + 1 (capped at 6).
AST before: **Prerequisites** (after a ### heading)
root -> paragraph -> [strong -> [text("Prerequisites")]]
AST after: #### Prerequisites
root -> heading (depth: 4) -> [text("Prerequisites")]
function boldToHeading() {
return (tree) => {
// This transform only makes sense at the document root.
if (tree.type !== 'root' || !tree.children) return
// Track the last real heading so new headings can be one level deeper.
let lastDepth = 1
for (let i = 0; i < tree.children.length; i++) {
const node = tree.children[i]
if (node.type === 'heading') {
// Existing headings update the depth context.
lastDepth = node.depth
continue
}
if (
node.type === 'paragraph' &&
node.children &&
node.children.length === 1 &&
node.children[0].type === 'strong' &&
node.children[0].children &&
node.children[0].children.length === 1 &&
node.children[0].children[0].type === 'text'
) {
// Convert standalone bold into a heading one level below context.
const depth = Math.min(lastDepth + 1, 6)
tree.children[i] = {
type: 'heading',
depth,
children: [{ type: 'text', value: node.children[0].children[0].value }]
}
}
}
}
}
Conditions for replacement:
- Node is a paragraph that is a direct child of root.
- The paragraph has exactly one child, of type strong.
- The strong has exactly one child, of type text.
- Bold with surrounding text (e.g., **bold** and more) is NOT converted.
- Bold inside lists, blockquotes, etc. is NOT converted (only root children).
- Matches both numbered (**3. Steps**) and non-numbered (**Prerequisites**) bold lines.
formatTables Implementation
Formats markdown tables so all columns are padded to equal width with aligned pipes. Unlike the other plugins, this does NOT operate on the AST — it wraps the compiler and post-processes the serialized markdown string.
Why not an AST transformer? Without remark-gfm, tables are NOT parsed into
table/tableRow/tableCell AST nodes. They remain as paragraph nodes
whose children contain raw pipe text (mixed with inlineCode nodes for
backticks in cells). Modifying the AST and replacing children with a single
text node causes remark-stringify to escape backticks (\`) and asterisks (\*). The compiler wrapper avoids this by operating after serialization.
Plugin type: Compiler wrapper (not a tree transformer). Uses this to
access the processor and wrap self.compiler. Does NOT return a transformer
function.
function formatTables() {
// Wrap the compiler because table formatting is easier on the final string.
const self = this
const orig = self.compiler
self.compiler = function(tree, file) {
// First let remark serialize the AST normally.
const result = orig.call(this, tree, file)
// Then find table-shaped blocks and rewrite them with aligned columns.
return result.replace(
/^(\|[^\n]+\n\|[\s:|\-]+\n(?:\|[^\n]+\n?)*)/gm,
(match) => formatTableText(match)
)
}
}
The regex finds table blocks in the serialized output:
^(\|[^\n]+\n\|[\s:|\-]+\n(?:\|[^\n]+\n?)*) matches lines starting with
| followed by a delimiter row (| --- | --- |) and subsequent data rows.
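The regex can be checked against a small sample document:

```javascript
// Same table-block regex as in formatTables.
const tableRe = /^(\|[^\n]+\n\|[\s:|\-]+\n(?:\|[^\n]+\n?)*)/gm

const doc = 'Intro\n\n| a | b |\n| --- | --- |\n| 1 | 2 |\n\nAfter\n'
const matches = doc.match(tableRe)
// The match covers the header row, delimiter row, and data row, and stops at
// the blank line before 'After'.
```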
Helper functions:
function formatTableText(tableText) {
// Break the serialized table into rows and cells.
const lines = tableText.trimEnd().split('\n')
const rows = lines.map(parseCells)
// Size each column to the widest non-delimiter cell.
const colCount = Math.max(...rows.map(r => r.length))
const colWidths = new Array(colCount).fill(0)
for (let r = 0; r < rows.length; r++) {
if (r === 1) continue // skip delimiter row
for (let c = 0; c < rows[r].length; c++) {
colWidths[c] = Math.max(colWidths[c], rows[r][c].length)
}
}
for (let c = 0; c < colCount; c++) {
colWidths[c] = Math.max(colWidths[c], 3) // min width 3 for ---
}
// Rebuild each row using the computed widths.
return rows.map((cells, r) => {
const padded = []
for (let c = 0; c < colCount; c++) {
const cell = cells[c] || ''
if (r === 1) {
// Delimiter rows use dashes and optional alignment colons.
padded.push(buildDelimCell(cell, colWidths[c]))
} else {
// Content rows are padded with spaces to column width.
padded.push(cell + ' '.repeat(colWidths[c] - cell.length))
}
}
return '| ' + padded.join(' | ') + ' |'
}).join('\n') + '\n'
}
function parseCells(line) {
// Strip outer pipes so splitting is easier.
let s = line.trim()
if (s.startsWith('|')) s = s.slice(1)
if (s.endsWith('|')) s = s.slice(0, -1)
return s.split('|').map(c => c.trim())
}
function buildDelimCell(cell, width) {
// Preserve left/right alignment markers from the original delimiter cell.
const left = cell.startsWith(':')
const right = cell.endsWith(':')
const innerWidth = width - (left ? 1 : 0) - (right ? 1 : 0)
return (left ? ':' : '') + '-'.repeat(innerWidth) + (right ? ':' : '')
}
Algorithm:
- Split the table text into rows; parse each row into cells by splitting on |.
- Compute the max width per column (skipping the delimiter row).
- Pad content cells with trailing spaces to the column width.
- Rebuild delimiter cells with the correct number of dashes, preserving : alignment markers.
- Reassemble rows in | cell | cell | format.
Preserves: column alignment (:---:, :---, ---:), inline code, emphasis,
and all other markdown syntax in cells.
Known limitation: tables are paragraph nodes in the AST. Without
remark-gfm, the headingJoin function sees tables as paragraphs and returns 0
(no blank line between heading and table). This is a known issue — see
headingJoin and tables.
Bug: formatTables also formatted tables inside code fences
The problem: The table-matching regex ran on the entire serialized output
string, including content inside fenced code blocks (``` and ~~~). Any
table-like text in a code fence would get reformatted — padding cells, aligning
pipes — destroying the original code example.
Why not reuse looksLikeTable? looksLikeTable operates on AST nodes
during tree traversal (used by headingJoin to detect paragraph nodes that are
secretly tables). formatTables is a compiler wrapper that runs on the
serialized markdown string after the AST is gone. There are no nodes to
inspect at that point, only text. You could theoretically move table formatting
to an AST transformer that uses looksLikeTable, but modifying paragraph > text node values to pad cells would cause remark-stringify to re-escape the
content through state.safe() — which is exactly why formatTables is a
compiler wrapper in the first place.
Fix: Split the serialized output by code fences before running the table regex. The split regex captures fenced code blocks as alternating segments — odd indices are code fence content (left untouched), even indices are regular markdown (table formatting applied).
self.compiler = function(tree, file) {
const result = orig.call(this, tree, file)
// Split by code fences — only format tables outside fences.
const parts = result.split(
/(^(?:`{3,}|~{3,}).*\n[\s\S]*?^(?:`{3,}|~{3,})\s*$)/gm
)
return parts.map((part, i) => {
if (i % 2 === 1) return part // code fence — leave untouched
return part.replace( // regular content — format tables
/^(\|[^\n]+\n\|[\s:|\-]+\n(?:\|[^\n]+\n?)*)/gm,
(match) => formatTableText(match)
)
}).join('')
}
preserveFrontmatter Implementation
Preserves YAML frontmatter (---...---) by stripping it before remark parses
the document and re-adding it to the serialized output.
The problem: Without remark-frontmatter (an npm package), remark doesn't
recognize YAML frontmatter. It parses --- as thematicBreak (horizontal
rule, serialized as ***), and YAML content with - items as list items
(converted to * items by bullet: '*'). The entire frontmatter block gets
destroyed.
Plugin type: Parser + compiler wrapper. Uses this to wrap both
self.parser and self.compiler.
function preserveFrontmatter() {
// `this` is the current remark processor instance.
// We keep a reference so we can wrap its parser and compiler.
const self = this
// Save the original parser so our wrapper can delegate to normal parsing
// after stripping frontmatter.
const origParser = self.parser
// Holds the exact `--- ... ---` block for the current document so we can
// prepend it back unchanged after compilation.
let savedFrontmatter = null
self.parser = function(doc, file) {
// `doc` is the raw markdown source before remark has parsed anything.
const str = String(doc)
const match = str.match(/^---\n([\s\S]*?)\n---\n/)
if (match) {
// Save the full frontmatter block exactly as written, including fences.
savedFrontmatter = match[0]
// Parse only the markdown body. remark never sees the frontmatter, so it
// cannot reinterpret `---` as a thematic break or YAML `- item` lines as
// markdown list items.
return origParser(str.slice(match[0].length), file)
}
// No frontmatter in this file, so clear any previous saved value.
savedFrontmatter = null
// Fall back to normal parsing of the whole document.
return origParser(doc, file)
}
// Save the original compiler so our wrapper can let remark serialize the AST
// normally, then restore the stripped frontmatter afterward.
const origCompiler = self.compiler
self.compiler = function(tree, file) {
// `tree` is the parsed AST after all transforms have run.
const result = origCompiler.call(this, tree, file)
// Reattach the exact saved frontmatter at the very end.
return savedFrontmatter ? savedFrontmatter + result : result
}
}
How it works:
- Parser wrapper: before remark parses, it checks whether the document starts with ---\n...\n---\n. If found, it saves the entire frontmatter block and strips it from the input so remark only sees the markdown content.
- Compiler wrapper: after remark serializes, it prepends the saved frontmatter to the output.
Must be the first plugin in the plugins array so its compiler wrapper is the
outermost one (runs after all other compiler wrappers like formatTables).
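The frontmatter regex itself can be exercised in isolation:

```javascript
// Same frontmatter pattern used by the parser wrapper.
const fmRe = /^---\n([\s\S]*?)\n---\n/

const doc = '---\ntitle: Example\n---\nBody text\n'
const match = doc.match(fmRe)
const savedFrontmatter = match[0]       // the whole fenced block
const body = doc.slice(match[0].length) // what remark actually parses
```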
Bug: wrapping self.parse vs self.parser
The processor has two levels:
- self.parse: prototype method on the processor. Calls self.parser internally.
- self.parser: own property on the processor instance. The actual parse function.
Same pattern for serialization: self.stringify (method) calls self.compiler
(own property).
Wrapping self.parse (the prototype method) works in standalone remark but
fails in vscode-remark. The language server likely clones or reconstructs
the processor between operations, copying own properties but not prototype
overrides.
Fix: Always wrap self.parser and self.compiler (own properties), never
self.parse or self.stringify (prototype methods). This is consistent with
how formatTables already wraps self.compiler successfully.
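A toy model shows why only own-property wrappers survive a rebuild. This is an assumption about the server's behavior, not its actual code:

```javascript
// Toy model: `parse` lives on the prototype, `parser` is an own property.
const proto = { parse() { return this.parser() } }

const proc = Object.create(proto)
proc.parser = () => 'original'

// Approach 1: override the prototype method on the instance.
proc.parse = function () { return 'viaParse:' + proto.parse.call(this) }
// Approach 2: wrap the own property.
const origParser = proc.parser
proc.parser = () => 'wrapped(' + origParser() + ')'

// Rebuild the processor the way the server might: a fresh object on the same
// prototype, carrying over only the known data properties (here, `parser`).
const rebuilt = Object.create(proto)
rebuilt.parser = proc.parser

const before = proc.parse()   // both wrappers active
const after = rebuilt.parse() // the parse override is gone, parser survives
```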
Thematic Break Normalization (rule + ruleRepetition)
The problem: Remark normalizes all thematic breaks (---, ***, * * *,
----------, etc.) into a single thematicBreak AST node with no memory of the
original characters, count, or spacing. On output, remark-stringify recreates the
break using the rule setting (default '*') repeated ruleRepetition times
(default 3). So ---------- in the source becomes *** in the output.
Solution 1 (tried, discarded): preserveThematicBreaks plugin. A parser
wrapper that scans the input for thematic break lines (using the same regex
remark uses: ^([-*_])[ \t]*(?:\1[ \t]*){2,}$), collects them in order, then
annotates each thematicBreak AST node with node.data.original. A custom
thematicBreak handler outputs node.data.original instead of the default.
This preserved each break's exact original text. Required frontmatter exclusion
from the scan (to avoid --- fences being miscounted). The plugin had to be
registered after preserveFrontmatter so its parser wrapper wrapped around
preserveFrontmatter's wrapper — it would see the full document for scanning
but get the frontmatter-free tree for annotation.
Solution 2 (chosen): rule: '-' + ruleRepetition: 10. Two built-in
remark-stringify settings: rule controls the character (-, *, or _) and
ruleRepetition controls how many times it's repeated. Setting rule: '-' and
ruleRepetition: 10 makes all thematic breaks output as ----------.
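In `.remarkrc.mjs`, the chosen settings look like this (a config fragment showing only the two relevant keys):

```javascript
export default {
  settings: {
    // Serialize every thematic break as ten dashes.
    rule: '-',
    ruleRepetition: 10
  }
}
```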
Why Solution 2 was chosen: The user has a keyboard shortcut that inserts
10 dashes as a horizontal rule, so normalizing everything to ---------- is the
desired behavior. This avoids the complexity of preserveThematicBreaks (parser
wrapper, frontmatter exclusion, custom handler, plugin ordering) for no benefit.
Frontmatter interaction: rule and ruleRepetition do NOT affect YAML
frontmatter fences. preserveFrontmatter strips the --- fences before remark
ever parses the document, so the serializer never sees them. The frontmatter
regex matches exactly ^---\n (3 dashes at the start of the file), so
---------- in the body is never confused for frontmatter.
Preserving Content with Placeholders (preserveContent Factory)
The problem: Remark's parser (micromark) destroys certain content that isn't
standard markdown. Hugo shortcodes like {{< xref ... >}} lose their leading
indentation (1-3 leading spaces are "insignificant" per the CommonMark spec and
are stripped during tokenization), and characters like <, {, > can be
escaped or misinterpreted. This happens in the parser, not the serializer —
there is no remark setting to disable it.
The technique: Replace matching content with safe alphanumeric placeholders
before remark parses the document, then restore the originals after
serialization. Same pattern as preserveFrontmatter.
preserveContent(pattern, label) factory: A generic version of this
technique. Takes two parameters:
- pattern: a RegExp with the global flag (g) matching the content to preserve. Must use [\s\S]*? (not .) for multi-line matching.
- label: an alphanumeric string used to build unique placeholder names. Placeholders look like REMARK<label><index>END (e.g., REMARKSHORTCODE0END). The label must be unique across all preserveContent instances to avoid collisions.
Placeholders are intentionally alphanumeric-only. Using underscores
(REMARK_SHORTCODE_0) would cause remark's default text handler to escape them
as REMARK\_SHORTCODE\_0. Our custom textHandler disables escaping, but
alphanumeric placeholders make the plugin robust regardless of handler config.
// Factory: preserveContent(pattern, label) returns a plugin function.
function preserveContent(pattern, label) {
return function() {
// Wrap parser and compiler so protected content survives both phases.
const self = this
const origParser = self.parser
const saved = []
self.parser = function(doc, file) {
const str = String(doc)
// Replace each matched block with a safe placeholder and remember it.
const cleaned = str.replace(pattern, (match) => {
const idx = saved.length
saved.push(match)
return `REMARK${label}${idx}END`
})
// Parse the placeholder version so remark does not touch the original.
return origParser(cleaned, file)
}
const origCompiler = self.compiler
self.compiler = function(tree, file) {
// Compile the placeholder version first.
let result = origCompiler.call(this, tree, file)
// Restore the original content byte-for-byte.
for (let i = 0; i < saved.length; i++) {
result = result.replace(`REMARK${label}${i}END`, saved[i])
}
return result
}
}
}
Usage in the plugins array:
plugins: [
// Frontmatter must run first so later plugins see the body only.
preserveFrontmatter,
// Hugo shortcodes: {{< ... >}} and {{-/* ... */-}}
preserveContent(/\{\{<[\s\S]*?>}}|\{\{-[\s\S]*?-}}/g, 'SHORTCODE'),
// Could add more patterns with different labels:
// preserveContent(/<!-- raw -->[\s\S]*?<!-- \/raw -->/g, 'RAWBLOCK'),
]
All leading spaces, line breaks, and special characters inside the shortcode are preserved exactly as written.
Why not configure the parser? The space stripping happens in micromark (the
tokenizer underneath remark-parse). It implements the CommonMark spec where 1-3
leading spaces in paragraphs are insignificant. There is no
preserveWhitespace option. 4+ spaces trigger indented code block syntax.
Writing a micromark extension to change this would be significantly more complex
and would be fighting the markdown spec.
Plugin ordering: preserveContent instances should be registered after
preserveFrontmatter so frontmatter is already stripped. Multiple
preserveContent instances can be used safely — each has its own saved array
and unique placeholder namespace from the label parameter.
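Outside of remark, the replace/restore cycle can be checked on a plain string. This sketch reuses the shortcode regex and placeholder format from above; `protect` and `restore` are illustrative names for the two halves of the cycle, not functions in the actual config:

```javascript
// Standalone sketch of the placeholder round-trip (no remark involved).
const pattern = /\{\{<[\s\S]*?>}}|\{\{-[\s\S]*?-}}/g
const label = 'SHORTCODE'

function protect(str) {
  const saved = []
  // Swap each shortcode for an alphanumeric placeholder and remember it.
  const cleaned = str.replace(pattern, (match) => {
    saved.push(match)
    return `REMARK${label}${saved.length - 1}END`
  })
  return { cleaned, saved }
}

function restore(str, saved) {
  // Put the originals back, byte-for-byte.
  let result = str
  for (let i = 0; i < saved.length; i++) {
    result = result.replace(`REMARK${label}${i}END`, saved[i])
  }
  return result
}

const input = 'Before {{< figure src="a.png" >}} after'
const { cleaned, saved } = protect(input)
console.log(cleaned)                          // Before REMARKSHORTCODE0END after
console.log(restore(cleaned, saved) === input) // true
```

The `cleaned` string contains no `{{<` at all, so nothing remark does to it can damage the shortcode.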
Why preserveFrontmatter is not built on preserveContent
preserveFrontmatter stays as a separate plugin because it has three behaviors
that don't fit the factory:
- **Position-specific match.** Frontmatter must match only at `^` (start of file) and only once. `preserveContent` is designed for global regex (`g` flag) matching anywhere in the document.
- **Extra blank line.** `preserveFrontmatter` adds an extra `\n` between the closing `---` and the first content line. `preserveContent` does straight replacement with no whitespace adjustment.
- **Removal vs. replacement.** `preserveFrontmatter` removes the frontmatter entirely so remark never sees it. `preserveContent` replaces it with a placeholder that remark parses as text inside a paragraph node. Both approaches work, but frontmatter is a file header at byte 0, not content at an arbitrary position.
Adding options like { global: false, prefix: '\n' } to the factory would
handle these cases but would add frontmatter-specific logic to a generic
function for a single use case.
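The position-specific match is enforced by the regex itself (the one used in `preserveFrontmatter` below): the `^` anchor and the absence of a `g` flag mean only a header at byte 0 can match. A quick standalone check:

```javascript
// The anchored regex from preserveFrontmatter: matches only at the start of the file.
const fm = /^---\n([\s\S]*?)\n---\n/

const withFrontmatter = '---\ntitle: Test\n---\nBody text\n'
const midDocument = 'Body text\n\n---\ntitle: Not frontmatter\n---\n'

console.log(fm.test(withFrontmatter)) // true: header at byte 0 is captured
console.log(fm.test(midDocument))     // false: a later --- block is ordinary content
```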
Disabling Escaping with Custom Handlers
Remark-stringify calls `state.safe()` on text content, link URLs, and
definition URLs, which escapes ~20 ASCII punctuation characters (`[`, `(`,
`<`, `_`, `&`, `!`, `#`, etc.) to prevent them from being reinterpreted as
markdown syntax. This breaks:

- Placeholder brackets: `[author(s)]` becomes `\[author(s)]`
- Underscores in text: `some_variable_name` becomes `some\_variable\_name`
- Ampersands: `this & that` becomes `this \& that`
- Parens in URLs: `https://en.wikipedia.org/wiki/Freehold_(novel)` becomes `Freehold_\(novel\)` in both inline links and reference definitions
Why there's no config option to disable escaping
The unsafe option in mdast-util-to-markdown only appends to the
defaults. The configure() function in
mdast-util-to-markdown/lib/configure.js always pushes:
// from mdast-util-to-markdown/lib/configure.js
case 'unsafe': {
// User-provided `unsafe` entries are appended to the built-in defaults.
list(base[key], extension[key]) // always appends, never replaces
break
}
function list(left, right) {
if (right) {
// This push is why there is no replace/remove behavior.
left.push(...right) // pushes onto existing defaults
}
}
The defaults are spread into state.unsafe in index.js, then user config is
pushed on top. There is no way to remove or replace the defaults.
The fix: custom handlers
Instead of stripping escapes after serialization (the old removeEscapes
compiler wrapper approach), override the handlers that call state.safe() to
skip it entirely. Three handlers cover all escaping:
- `text` — plain text content. Default calls `state.safe()`. Override returns `node.value` directly.
- `link` — inline link URLs `[text](url)`. Default calls `state.safe()` on `node.url`. Override outputs `node.url` directly.
- `definition` — reference definitions `[label]: url`. Default calls `state.safe()` on `node.url`. Override outputs `node.url` directly.
This is safe because remark already parsed real markdown syntax (links, emphasis, etc.) into their own AST nodes. Text nodes and URL properties contain only literal values — no markdown syntax that needs protecting.
// Return text content directly without escaping.
function textHandler(node) {
// Text is already parsed, so emit its literal value.
return node.value
}
// Serialize inline links without escaping the URL.
function linkHandler(node, _, state, info) {
// Enter normal link serialization state so surrounding syntax stays correct.
const exit = state.enter('link')
const subexit = state.enter('label')
const tracker = state.createTracker(info)
// Build the `[label](` prefix and serialize the label content normally.
let value = tracker.move('[')
value += tracker.move(
state.containerPhrasing(node, {
before: value,
after: '](',
...tracker.current()
})
)
value += tracker.move('](')
subexit()
// Insert the raw URL directly instead of calling `state.safe()`.
value += tracker.move(node.url || '')
if (node.title) {
value += tracker.move(' "' + node.title + '"')
}
// Close the link and exit the serializer state.
value += tracker.move(')')
exit()
return value
}
// Serialize reference definitions without escaping the URL.
function definitionHandler(node, _, state) {
// Use the normal definition state so association IDs stay consistent.
const exit = state.enter('definition')
const subexit = state.enter('label')
const id = state.associationId(node)
subexit()
// Emit the raw URL directly.
let value = '[' + id + ']: ' + (node.url || '')
if (node.title) {
value += ' "' + node.title + '"'
}
exit()
return value
}
Registered in settings:
settings: {
handlers: {
// Override only the handlers that escape content we want to preserve.
text: textHandler,
link: linkHandler,
definition: definitionHandler
}
}
Evolution: removeEscapes -> custom handlers
The first approach was a `removeEscapes` compiler wrapper that stripped `\`
before specific characters (`[`, `(`, `<`, `_`) from the serialized output
using a regex. This had two problems:

- Playing whack-a-mole: every new escaped character (`&`, `!`, etc.) needed adding to the regex.
- Stripping ALL ASCII punctuation escapes was unsafe: some escapes (like `\#` at a line start) are semantically meaningful.
The handler approach is cleaner: it disables escaping at the source rather than
fixing it after the fact, and it's selective — only text, link, and
definition handlers are overridden. Other handlers (headings, emphasis, code,
etc.) keep their default behavior.
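For reference, the abandoned approach looked roughly like this (a reconstruction from the description above, not the original `removeEscapes` code; the character class shown is illustrative and grew with every newly discovered escape):

```javascript
// Reconstructed sketch of the old removeEscapes approach: strip backslashes
// before a hardcoded set of characters in the serialized markdown string.
function removeEscapesSketch(output) {
  // Each newly escaped character ('&', '!', ...) had to be added to this class.
  return output.replace(/\\([[(<_&!])/g, '$1')
}

console.log(removeEscapesSketch('\\[author(s)]'))          // [author(s)]
console.log(removeEscapesSketch('some\\_variable\\_name')) // some_variable_name
// The unsafe part: a meaningful escape like '\#' at a line start
// would be stripped too, changing the document's semantics.
```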
Bug: cascading depth escalation
The first version updated lastDepth after each conversion:
// BUG: this caused cascading depth
tree.children[i] = { type: 'heading', depth, children: [...] }
lastDepth = depth // <-- THIS LINE WAS THE BUG
This meant consecutive bold paragraphs under the same real heading would escalate in depth:
### Section <!-- lastDepth = 3 -->
**First bold** <!-- became ####, then lastDepth = 4 -->
**Second bold** <!-- became #####, then lastDepth = 5 -->
**Third bold** <!-- became ######, then lastDepth = 6 -->
The fix: only update lastDepth from headings that already existed in the
document, not from converted ones. Remove the lastDepth = depth line after
conversion. Now all sibling bolds under the same real heading get the same
depth:
### Section <!-- lastDepth = 3 -->
**First bold** <!-- becomes #### -->
**Second bold** <!-- becomes #### -->
**Third bold** <!-- becomes #### -->
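The fix can be sanity-checked on a mock root node without remark. This sketch inlines a simplified version of the corrected loop (the full strong-child checks from the config are abbreviated here):

```javascript
// Simplified corrected boldToHeading loop: lastDepth is updated only from
// headings that already existed in the document, never from converted ones.
function convert(tree) {
  let lastDepth = 1
  for (let i = 0; i < tree.children.length; i++) {
    const node = tree.children[i]
    if (node.type === 'heading') {
      lastDepth = node.depth
      continue
    }
    if (
      node.type === 'paragraph' &&
      node.children.length === 1 &&
      node.children[0].type === 'strong'
    ) {
      // Note: no `lastDepth = depth` here -- that line was the bug.
      tree.children[i] = {
        type: 'heading',
        depth: Math.min(lastDepth + 1, 6),
        children: node.children[0].children
      }
    }
  }
}

const bold = (text) => ({
  type: 'paragraph',
  children: [{ type: 'strong', children: [{ type: 'text', value: text }] }]
})
const tree = { type: 'root', children: [
  { type: 'heading', depth: 3, children: [] },
  bold('First'), bold('Second'), bold('Third')
] }
convert(tree)
console.log(tree.children.slice(1).map(n => n.depth)) // [ 4, 4, 4 ]
```

With the buggy `lastDepth = depth` line present, the same input would produce depths 4, 5, 6.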
Plugin Loading Failures (What Doesn't Work)
- `.remarkrc.json` with `"plugins": ["./path/to/plugin.js"]` fails with "Cannot parse file" because the bundled `load-plugin` uses ESM `import()` and, without `package.json`, `.js` files are treated as CJS.
- Separate `.mjs` plugin files referenced from the config load without error, but the plugin doesn't apply (silent failure).
- `require()` in `.remarkrc.mjs` is not available in an ESM context.

Solution: define all plugin functions inline in `.remarkrc.mjs`.
Current Config
// Preserves YAML frontmatter by stripping it before parse and re-adding after
// compile. Must be FIRST in plugins array.
function preserveFrontmatter() {
const self = this
const origParser = self.parser
let savedFrontmatter = null
self.parser = function(doc, file) {
const str = String(doc)
const match = str.match(/^---\n([\s\S]*?)\n---\n/)
if (match) {
savedFrontmatter = match[0]
// Strip frontmatter before parsing the body.
return origParser(str.slice(match[0].length), file)
}
savedFrontmatter = null
return origParser(doc, file)
}
const origCompiler = self.compiler
self.compiler = function(tree, file) {
const result = origCompiler.call(this, tree, file)
// Reattach saved frontmatter after compilation.
return savedFrontmatter ? savedFrontmatter + result : result
}
}
// Factory: preserves content matching a regex by replacing with placeholders
// before parsing and restoring after serialization.
// pattern: RegExp with global flag, label: unique alphanumeric string.
function preserveContent(pattern, label) {
return function() {
const self = this
const origParser = self.parser
const saved = []
self.parser = function(doc, file) {
const str = String(doc)
const cleaned = str.replace(pattern, (match) => {
const idx = saved.length
saved.push(match)
return `REMARK${label}${idx}END`
})
return origParser(cleaned, file)
}
const origCompiler = self.compiler
self.compiler = function(tree, file) {
let result = origCompiler.call(this, tree, file)
for (let i = 0; i < saved.length; i++) {
result = result.replace(`REMARK${label}${i}END`, saved[i])
}
return result
}
}
}
// Factory that returns a remark plugin to replace text patterns in the final
// serialized markdown.
// `mappings` is an object of { 'from': 'to' } pairs.
function replaceText(mappings) {
return function() {
const self = this
const origCompiler = self.compiler
self.compiler = function(tree, file) {
let result = origCompiler.call(this, tree, file)
for (const [from, to] of Object.entries(mappings)) {
result = result.split(from).join(to)
}
return result
}
}
}
// Join: no blank line between a heading and a following paragraph,
// unless the paragraph is actually a pipe table (tables need the blank line).
function headingJoin(left, right) {
if (left.type === 'heading') {
if (right.type === 'paragraph' && !looksLikeTable(right)) return 0
return 1
}
}
// Heuristic table detection: without remark-gfm, tables parse as paragraphs,
// so check whether the second line looks like a delimiter row.
function looksLikeTable(node) {
if (!node.children) return false
const raw = node.children.map(c => {
if (c.type === 'text') return c.value
if (c.type === 'inlineCode') return '`' + c.value + '`'
return ''
}).join('')
const lines = raw.split('\n')
if (lines.length < 2) return false
const cells = parseCells(lines[1])
return cells.length > 0 && cells.every(c => /^:?-+:?$/.test(c))
}
// Tree transformer: convert a bold run at the start of a list item
// (all children text/inlineCode) into a single inlineCode node.
function boldToCodeInLists() {
return (tree) => {
function visit(node) {
if (node.type === 'listItem') {
for (const child of (node.children || [])) {
if (child.type === 'paragraph' && child.children && child.children.length > 0) {
const first = child.children[0]
if (first.type === 'strong' && first.children && first.children.length > 0) {
const allTextOrCode = first.children.every(
c => c.type === 'text' || c.type === 'inlineCode'
)
if (allTextOrCode) {
const value = first.children.map(c => c.value).join('')
child.children[0] = { type: 'inlineCode', value }
}
}
}
}
}
if (node.children) {
node.children.forEach(visit)
}
}
visit(tree)
}
}
// Tree transformer: promote paragraphs consisting of a single bold text
// to headings one level below the preceding real heading.
function boldToHeading() {
return (tree) => {
if (tree.type !== 'root' || !tree.children) return
let lastDepth = 1
for (let i = 0; i < tree.children.length; i++) {
const node = tree.children[i]
if (node.type === 'heading') {
lastDepth = node.depth
continue
}
if (
node.type === 'paragraph' &&
node.children &&
node.children.length === 1 &&
node.children[0].type === 'strong' &&
node.children[0].children &&
node.children[0].children.length === 1 &&
node.children[0].children[0].type === 'text'
) {
const depth = Math.min(lastDepth + 1, 6)
tree.children[i] = {
type: 'heading',
depth,
children: [{ type: 'text', value: node.children[0].children[0].value }]
}
}
}
}
}
// Compiler wrapper: re-align pipe-table columns in the serialized output.
function formatTables() {
const self = this
const orig = self.compiler
self.compiler = function(tree, file) {
const result = orig.call(this, tree, file)
return result.replace(
/^(\|[^\n]+\n\|[\s:|\-]+\n(?:\|[^\n]+\n?)*)/gm,
(match) => formatTableText(match)
)
}
}
// Pad each cell to its column's width; the delimiter row is rebuilt.
function formatTableText(tableText) {
const lines = tableText.trimEnd().split('\n')
const rows = lines.map(parseCells)
const colCount = Math.max(...rows.map(r => r.length))
const colWidths = new Array(colCount).fill(0)
for (let r = 0; r < rows.length; r++) {
if (r === 1) continue
for (let c = 0; c < rows[r].length; c++) {
colWidths[c] = Math.max(colWidths[c], rows[r][c].length)
}
}
for (let c = 0; c < colCount; c++) {
colWidths[c] = Math.max(colWidths[c], 3)
}
return rows.map((cells, r) => {
const padded = []
for (let c = 0; c < colCount; c++) {
const cell = cells[c] || ''
if (r === 1) {
padded.push(buildDelimCell(cell, colWidths[c]))
} else {
padded.push(cell + ' '.repeat(colWidths[c] - cell.length))
}
}
return '| ' + padded.join(' | ') + ' |'
}).join('\n') + '\n'
}
// Split a table line into trimmed cell strings (outer pipes removed).
function parseCells(line) {
let s = line.trim()
if (s.startsWith('|')) s = s.slice(1)
if (s.endsWith('|')) s = s.slice(0, -1)
return s.split('|').map(c => c.trim())
}
// Rebuild a delimiter cell (---, :---, ---:, :--:) at the target width.
function buildDelimCell(cell, width) {
const left = cell.startsWith(':')
const right = cell.endsWith(':')
const innerWidth = width - (left ? 1 : 0) - (right ? 1 : 0)
return (left ? ':' : '') + '-'.repeat(innerWidth) + (right ? ':' : '')
}
// Handlers that skip state.safe() escaping (see Disabling Escaping above).
function textHandler(node) {
return node.value
}
function linkHandler(node, _, state, info) {
const exit = state.enter('link')
const subexit = state.enter('label')
const tracker = state.createTracker(info)
let value = tracker.move('[')
value += tracker.move(
state.containerPhrasing(node, {
before: value,
after: '](',
...tracker.current()
})
)
value += tracker.move('](')
subexit()
value += tracker.move(node.url || '')
if (node.title) {
value += tracker.move(' "' + node.title + '"')
}
value += tracker.move(')')
exit()
return value
}
function definitionHandler(node, _, state) {
const exit = state.enter('definition')
const subexit = state.enter('label')
const id = state.associationId(node)
subexit()
let value = '[' + id + ']: ' + (node.url || '')
if (node.title) {
value += ' "' + node.title + '"'
}
exit()
return value
}
export default {
settings: {
bullet: '*',
listItemIndent: 'one',
tightDefinitions: true,
rule: '-',
ruleRepetition: 10,
join: [headingJoin],
handlers: {
text: textHandler,
link: linkHandler,
definition: definitionHandler
}
},
plugins: [
preserveFrontmatter,
preserveContent(/\{\{<[\s\S]*?>}}|\{\{-[\s\S]*?-}}/g, 'SHORTCODE'),
boldToCodeInLists,
boldToHeading,
formatTables,
replaceText({ '→': '->', '←': '<-' })
]
}
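Because the table helpers are pure string functions, they can be exercised outside remark entirely. This standalone sketch copies `formatTableText`, `parseCells`, and `buildDelimCell` from the config above and runs them on a ragged table:

```javascript
// Copies of the pure string helpers from the config, runnable without remark.
function formatTableText(tableText) {
  const lines = tableText.trimEnd().split('\n')
  const rows = lines.map(parseCells)
  const colCount = Math.max(...rows.map(r => r.length))
  const colWidths = new Array(colCount).fill(0)
  for (let r = 0; r < rows.length; r++) {
    if (r === 1) continue // delimiter row does not influence widths
    for (let c = 0; c < rows[r].length; c++) {
      colWidths[c] = Math.max(colWidths[c], rows[r][c].length)
    }
  }
  for (let c = 0; c < colCount; c++) colWidths[c] = Math.max(colWidths[c], 3)
  return rows.map((cells, r) => {
    const padded = []
    for (let c = 0; c < colCount; c++) {
      const cell = cells[c] || ''
      padded.push(r === 1
        ? buildDelimCell(cell, colWidths[c])
        : cell + ' '.repeat(colWidths[c] - cell.length))
    }
    return '| ' + padded.join(' | ') + ' |'
  }).join('\n') + '\n'
}

function parseCells(line) {
  let s = line.trim()
  if (s.startsWith('|')) s = s.slice(1)
  if (s.endsWith('|')) s = s.slice(0, -1)
  return s.split('|').map(c => c.trim())
}

function buildDelimCell(cell, width) {
  const left = cell.startsWith(':')
  const right = cell.endsWith(':')
  const innerWidth = width - (left ? 1 : 0) - (right ? 1 : 0)
  return (left ? ':' : '') + '-'.repeat(innerWidth) + (right ? ':' : '')
}

const ragged = '| Name | Role |\n|:--|--:|\n| Ada | engineer |\n'
console.log(formatTableText(ragged))
// | Name | Role     |
// | :--- | -------: |
// | Ada  | engineer |
```

This is the same trick the Debugging Tips section recommends: verify the pure logic standalone before involving the language server.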
Debugging Tips
- Test remark behavior standalone with a local install in `/tmp`:
  mkdir /tmp/remark-test && cd /tmp/remark-test
  echo '{"type":"module"}' > package.json
  npm install unified remark-parse remark-stringify
- Then run test scripts with `node test.mjs` to verify settings/plugins work before putting them in the vscode config.
- The extension source is at `~/.vscode-server/extensions/unifiedjs.vscode-remark-3.2.0/out/remark-language-server.js` (minified, but searchable).
- After every change to `.remarkrc.mjs`, reload the VS Code window, then save the test file to verify.
Implementation Workflow
1. Write the plugin function inline in `.remarkrc.mjs`.
2. Add it to the `plugins: []` array (for tree transformers) or `settings.join` (for blank line control).
3. Reload the VS Code window (Ctrl+Shift+P -> "Developer: Reload Window").
4. Open and save `test-markdown/formatting-issues.md`.
5. Verify the formatting change was applied.
6. If not working, test standalone in `/tmp/remark-test/` with `node test.mjs` to isolate whether the issue is the plugin logic or the vscode-remark integration.
Lessons Learned
- **Reload the window after every config change.** The language server caches `.remarkrc.mjs` via Node's ES module cache. No reload = old config. See Window Reload Required.
- **All plugins must be inline in `.remarkrc.mjs`.** Separate plugin files fail silently or throw errors because the bundled `load-plugin` can't resolve local paths without `package.json`. See Plugin Loading Failures.
- **Use `.remarkrc.mjs`, not `.remarkrc.json`.** JSON configs can't contain functions (needed for `join` and plugins). ESM config loaded via `import()` preserves functions. See Config File Structure.
- **`join` is the only way to control blank lines between blocks.** There's no boolean setting. remark-stringify adds `\n\n` between all blocks by default. The `tightDefinitions` setting only covers definition lists. See Join Functions.
- **Whitelist removal, not blacklist.** When writing join functions, check for the specific type you want to change (e.g., `paragraph`) and return the default for everything else. This prevents new block types (tables, blockquotes, HTML) from needing updates. See headingJoin Implementation.
- **Plugins must return a transformer function.** A plugin that only sets up data on the processor (e.g., `this.data()`) returns nothing. A plugin that modifies the tree must `return (tree) => { ... }`. See Tree Transformer Plugins.
- **`listItemIndent` takes the string `'one'`, not the number `1`.** An easy mistake that silently produces wrong output. See Settings.
- **Don't update tracking state from converted nodes.** The `boldToHeading` plugin initially updated `lastDepth` after each conversion, causing consecutive bolds to cascade (`####` -> `#####` -> `######`). Fix: only update depth from pre-existing headings. See Bug: cascading depth escalation.
- **AST node replacement is in-place.** Assign directly to `parent.children[i]` to replace a node. No need to splice or rebuild arrays. See boldToCodeInLists and boldToHeading.
- **Test standalone before testing in vscode-remark.** The extension's language server adds layers of complexity (config loading, module caching, processor cloning). Isolate plugin logic in `/tmp/remark-test/` first. See Debugging Tips.
- **remark already handles some rules by default.** Blank lines between text and lists, fenced code blocks, and ATX headings are all default behavior. Check before writing a plugin. See Rule-to-Setting Mapping.
- **`listItem` is shared by ordered and unordered lists.** The list type is on the parent `list` node, not on items. Plugins targeting `listItem` work for both list types automatically. See boldToCodeInLists.
- **Only match bold that is the sole paragraph content.** Both `boldToCodeInLists` (first child of a list paragraph) and `boldToHeading` (only child of a root paragraph) avoid converting legitimate emphasis that appears alongside other text.
- **Without remark-gfm, tables are paragraph nodes.** They are NOT parsed into `table`/`tableRow`/`tableCell` AST nodes. This affects table formatting (can't use a tree transformer) and join functions (tables look like paragraphs). Fix: inspect the paragraph content for a delimiter row to detect tables. See formatTables and headingJoin and tables.
- **Compiler wrappers are a different plugin pattern.** Instead of returning `(tree) => {...}`, the plugin uses `this` to access the processor and wraps `self.compiler`. This allows post-processing the serialized markdown string, avoiding AST escaping issues. See formatTables.
- **remark-stringify escapes markdown syntax in text nodes.** If you replace mixed AST children (text + inlineCode) with a single `text` node containing backticks, remark will escape the backticks. Operate on the serialized string instead when the content contains markdown syntax.
- **Without remark-frontmatter, frontmatter is destroyed.** Remark parses `---` as `thematicBreak` (`***`) and YAML list items as markdown lists. Fix: strip frontmatter before parsing, re-add after serialization. See preserveFrontmatter.
- **Wrap `self.parser`/`self.compiler`, not `self.parse`/`self.stringify`.** The processor has own properties (`parser`, `compiler`) and prototype methods (`parse`, `stringify`). Wrapping the prototype methods works in standalone remark but fails in vscode-remark: the language server copies own properties but not prototype overrides. Always wrap the own properties. See Bug: wrapping self.parse vs self.parser.
- **remark-stringify escapes ~20 ASCII punctuation characters in text and URLs.** The `unsafe` patterns in `mdast-util-to-markdown` are hardcoded defaults, and the `unsafe` config option only appends: `configure()` always pushes, never replaces. Fix: override the `text`, `link`, and `definition` handlers in `settings.handlers` to skip `state.safe()`. This is cleaner than the earlier `removeEscapes` compiler wrapper approach. See Disabling Escaping with Custom Handlers.
- **`tightDefinitions: true` removes blank lines between reference link definitions.** A built-in `remark-stringify` setting, equivalent to a join function returning `0` for two adjacent `definition` nodes.
- **Custom handlers in settings override how specific AST nodes are serialized.** Each key is a node type (`text`, `link`, `definition`, etc.), each value is a function `(node, parent, state, info) -> string`. The default handlers call `state.safe()` for escaping; custom handlers can skip it. Handlers are passed via `settings.handlers` in `.remarkrc.mjs`.
- **Bold with inline code inside has multiple AST children.** A run like "**Use `.remarkrc.mjs`**" is parsed as `strong -> [text("Use "), inlineCode(".remarkrc.mjs")]`, not a single text child. To convert it to inline code, check that all children are `text` or `inlineCode`, then concatenate all `.value` properties into one string. Both node types have `.value`. See Bug: bold with inline code was not converted.
- **Remark normalizes thematic breaks; use `rule` + `ruleRepetition` to control output.** All thematic breaks (`---`, `***`, `* * *`, `----------`) become a single `thematicBreak` AST node with no memory of the original. `rule: '-'` sets the character, `ruleRepetition: 10` sets the count. A `preserveThematicBreaks` parser wrapper was built and tested (annotating AST nodes with originals via `node.data.original`) but discarded in favor of the simpler settings approach, since normalizing to `----------` was the desired behavior. The `rule` setting does NOT affect frontmatter: `preserveFrontmatter` strips it before remark parses. See Thematic Break Normalization.
- **Use the `preserveContent` placeholder pattern to protect content remark destroys.** Remark's parser strips 1-3 leading spaces (CommonMark spec) and mangles non-markdown syntax like Hugo shortcodes, and no parser setting disables this. Fix: replace matching content with alphanumeric placeholders before parsing, restore after serialization. The `preserveContent(pattern, label)` factory generalizes this: pass any regex and a unique label. Placeholders must be alphanumeric only to avoid remark escaping underscores. See Preserving Content with Placeholders.
Bulk Processing with remark-cli
The .remarkrc.mjs config is self-contained (all plugins inline, no external
npm dependencies) but it's just instructions — it doesn't include the markdown
parser, serializer, or file I/O engine. remark-cli bundles all of that
(unified + remark-parse + remark-stringify + unified-engine for file
handling and config loading). It's the same runtime vscode-remark uses
internally, so behavior is identical.
Install
# Local install (more reliable)
mkdir ~/remark-tool && cd ~/remark-tool
echo '{"type":"module"}' > package.json
npm install remark-cli
# Or global
npm install -g remark-cli
Usage
# Format files in-place
remark --output *.md
# Format all markdown files recursively
remark --output .
# Specific files
remark --output file1.md file2.md docs/
# Dry run (print to stdout, don't modify)
remark file.md
# Point to a specific config file
remark --rc-path /path/to/.remarkrc.mjs --output .
remark-cli uses the same findUp config loading as vscode-remark. Without
--rc-path, it walks up the directory tree from each file looking for
.remarkrc.mjs. With --rc-path, it uses the specified config for all files.
This is useful for bulk-formatting an entire repo or running formatting in CI.
Alternative: Using Installed Plugins (remark-gfm, remark-frontmatter)
What Changes
Installing remark-gfm and remark-frontmatter removes the need for our
custom formatTables, formatTableText, parseCells, buildDelimCell,
looksLikeTable, and preserveFrontmatter functions (~125 lines). The table
detection hack in headingJoin also goes away since tables become proper AST
nodes.
What stays regardless: replaceText, boldToCodeInLists, boldToHeading,
preserveContent (shortcodes), custom escape handlers (textHandler,
linkHandler, definitionHandler), and headingJoin (simplified).
Cost Per Repository
| File | Size | Git tracked? |
|---|---|---|
| `package.json` | ~15 lines | Yes |
| `package-lock.json` | ~1,179 lines | Yes |
| `node_modules/` | ~6 MB, 70 packages | No (`.gitignore`) |
Anyone cloning the repo must run npm install to restore node_modules/.
Setup Commands
# In the repository root (where .remarkrc.mjs lives)
npm init -y
npm install remark-cli remark-gfm remark-frontmatter
# Add node_modules/ to .gitignore
echo 'node_modules/' >> .gitignore
remark-cli is not required by the VS Code extension, but it is useful for
manual verification and bulk formatting.
Plugin Resolution (How It Works)
The vscode-remark language server uses unified-engine which uses load-plugin
to resolve plugin strings. load-plugin uses Node.js module resolution
(import-meta-resolve) starting from the workspace folder, walking up the
directory tree through node_modules/ directories. This is why package.json
and node_modules/ must be at or above the .remarkrc.mjs location.
Global Install Does Not Work Reliably
The load-plugin package has a global option, but it auto-detects global mode
by checking process.versions.electron or whether argv starts with the npm
global prefix. In VS Code (especially Remote/WSL), the language server runs as a
Node.js child process where neither condition is true, so global node_modules/
is never searched. The .npmrc prefix approach (documented in some guides)
relies on this same detection and is equally unreliable.
Shared Parent node_modules (Alternative)
Installing in a common parent directory (e.g., /home/parsia/dev/) would let
all repos beneath it find the plugins via Node.js upward resolution. But this
puts package.json and node_modules/ in /dev, which is undesirable.
Config File With Installed Plugins
See remarkrc-with-plugins.mjs in this repo for a ready-to-use config that
imports remark-gfm and remark-frontmatter. To use it, rename it to
.remarkrc.mjs after installing the packages.
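The shape of that config is roughly the following. This is an outline, not the verbatim contents of `remarkrc-with-plugins.mjs`; it assumes the surviving custom functions (`preserveContent`, `boldToCodeInLists`, `boldToHeading`, `replaceText`, the escape handlers, and a simplified `headingJoin`) are still defined inline above the export, exactly as in the current config:

```javascript
// Sketch: .remarkrc.mjs once remark-gfm and remark-frontmatter are installed
// locally. The inline plugin/handler functions referenced below are the ones
// from the current config; only the table/frontmatter hacks are dropped.
import remarkFrontmatter from 'remark-frontmatter'
import remarkGfm from 'remark-gfm'

export default {
  settings: {
    bullet: '*',
    listItemIndent: 'one',
    tightDefinitions: true,
    rule: '-',
    ruleRepetition: 10,
    join: [headingJoin], // simplified: no looksLikeTable check needed
    handlers: {
      text: textHandler,
      link: linkHandler,
      definition: definitionHandler
    }
  },
  plugins: [
    remarkFrontmatter, // replaces preserveFrontmatter
    remarkGfm,         // replaces formatTables and the table-detection hack
    preserveContent(/\{\{<[\s\S]*?>}}|\{\{-[\s\S]*?-}}/g, 'SHORTCODE'),
    boldToCodeInLists,
    boldToHeading,
    replaceText({ '→': '->', '←': '<-' })
  ]
}
```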
Verified Working Setup
This approach has now been tested successfully in this workspace with:
- `.remarkrc.mjs` importing `remark-gfm` and `remark-frontmatter`.
- Local dependencies installed via `package.json`.
- `remark-cli` installed for local verification.
- Test files under `tmp/remark-fixtures/`.
Direct processor test succeeded: loading remarkrc.mjs in Node produced the
expected formatting for frontmatter, tables, list-item bold-to-code,
bold-to-heading, and arrow replacement.
CLI validation also succeeded:
npx remark tmp/remark-fixtures/*.md --frail
That means the installed-plugin config is not just theoretical here; it works with the current repo layout.
Decision
Current guidance:
- Use the installed-plugin setup when you want the simpler, more maintainable `.remarkrc.mjs` and are willing to keep a small Node setup in the repo.
- Use the inline-only setup when avoiding `package.json` and `node_modules/` matters more than config simplicity.
Future Work: Line Wrapping (Rewrap)
Goal
Automatically wrap long lines to 80 columns on save, similar to the Rewrap extension (Alt+Q). Rewrap handles paragraphs, list items (with proper indentation), blockquotes, etc.
remark-stringify Has No Wrapping Support
The mdast-util-to-markdown types contain a comment: "This info isn't used yet
but such functionality will allow line wrapping." There is no width, column,
or wrap option. Wrapping is not implemented.
How Rewrap Works (Reference)
Rewrap's core is written in F# (/core/Wrapping.fs, Parsing.Markdown.fs).
Key design:
- **Block detection** — Parses markdown into block types: paragraphs, list items, blockquotes, headings, code blocks, tables, HTML. Each type has different wrapping rules.
- **Tables, headings, code blocks** — `NoWrap` (never wrapped).
- **Paragraphs** — All lines concatenated into one string, then broken at word boundaries to fit within the configured column width.
- **List items** — Parsed with their prefix (`*`, `1.`, etc.). Continuation lines get indentation matching the prefix width (e.g., 2 spaces for `*`, 3 for `1.`). Content is recursively parsed (a list item can contain paragraphs, sub-lists, code blocks, etc.).
- **Block quotes** — The `>` prefix is stripped, content is recursively parsed and wrapped, then `>` is re-added.
- **Break position search** — Searches backwards from the column limit to find a valid break position. Handles CJK characters (can break between any two CJK chars without spaces).
- **Line concatenation** — When joining lines before re-breaking, adds a space between words but not between CJK characters. Optional double-space after sentence-ending punctuation (`.`, `?`, `!`).
Implementation Approach for Remark
Use the same compiler-wrapper pattern as formatTables — post-process the
serialized markdown string after remark-stringify produces it.
function wrapLines() {
// Wrap the compiler so line reflow happens on the final markdown string.
const self = this
const orig = self.compiler
self.compiler = function(tree, file) {
// Let remark serialize the AST first.
const result = orig.call(this, tree, file)
// Then rewrap the serialized markdown to the chosen column width.
return wrapMarkdown(result, 80)
}
}
Block Types and Wrapping Rules
| Block Type | Wrap? | Prefix Handling |
|---|---|---|
| Paragraph | Yes | No prefix, just wrap at column width |
| List item | Yes | First line: `*` / `1.`, continuation: spaces |
| Block quote | Yes | Strip `>`, wrap, re-add `>` |
| Heading | No | ATX headings stay on one line |
| Code block | No | Fenced and indented code blocks left as-is |
| Table | No | Tables left as-is (pipes must stay on one line) |
| HTML block | No | HTML left as-is |
| Blank line | No | Preserved as paragraph separators |
Edge Cases to Handle
- **Don't break URLs** — A URL in a paragraph shouldn't be split across lines.
- **Don't break inline code** — Backtick spans should stay on one line if possible.
- **List continuation indent** — Must match the list marker width: 2 for `*`, 3 for `1.`, 4 for `10.`, etc.
- **Nested lists** — Each nesting level adds its own indent.
- **Nested block quotes** — `> >` for double-nested, etc.
- **Lines ending in `\` or `<br>`** — Hard line breaks; don't join with the next line.
- **Already short lines** — Don't re-wrap lines that are already under the limit unless they're part of a paragraph that could be reflowed.
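A minimal `wrapMarkdown` covering only the plain-paragraph case might look like the following sketch. It deliberately ignores the URL, inline-code, and nesting edge cases above, which gives a sense of how much work a complete implementation would be:

```javascript
// Sketch: greedily re-break long paragraph lines at word boundaries, leaving
// fenced code, headings, tables, lists, quotes, and indented lines untouched.
function wrapMarkdown(text, width) {
  const out = []
  let inFence = false
  for (const line of text.split('\n')) {
    const isFenceMarker = line.startsWith('```')
    if (isFenceMarker) inFence = !inFence
    const skip = isFenceMarker || inFence ||
      /^(#|\||>|\s|[-*+] )/.test(line) || line.length <= width
    if (skip) {
      out.push(line)
      continue
    }
    // Greedy fill: accumulate words until the next one would overflow.
    let current = ''
    for (const word of line.split(' ')) {
      if (current && (current + ' ' + word).length > width) {
        out.push(current)
        current = word
      } else {
        current = current ? current + ' ' + word : word
      }
    }
    if (current) out.push(current)
  }
  return out.join('\n')
}

const long = Array(30).fill('word').join(' ') // 149 characters
const wrapped = wrapMarkdown(long + '\n# heading stays', 80)
console.log(wrapped.split('\n').every(l => l.length <= 80)) // true
```

Even this toy version needs fence tracking and a skip list; joining multi-line paragraphs before re-breaking (as Rewrap does) would add another layer.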
Detecting the Column Width
The extension has no access to VS Code settings from .remarkrc.mjs. Options:
- **Hardcode in `.remarkrc.mjs`** — e.g., `wrapMarkdown(result, 80)`. Simple, but must be changed manually if the ruler changes.
- **Read from an environment variable** — e.g., `process.env.REMARK_COLUMNS` with a fallback to 80.
- **Read `editor.rulers` from settings.json** — Technically possible via `fs.readFileSync`, but fragile and platform-dependent.
Option 1 (hardcoded) is the most practical.
Decision: Not Implemented
We decided not to build this. The Rewrap extension (Alt+Q) already handles line wrapping well and covers all the edge cases listed above. Implementing it in remark would duplicate that work.
The Rewrap extension reads the column width from (in priority order):
rewrap.wrappingColumn, editor.rulers, editor.wordWrapColumn. Our
.remarkrc.mjs runs inside the language server's Node.js process and cannot
access VS Code settings APIs.