Markdown, changing the referencing from exported documents to simpler links, stuck in JS

Question

So I want to do a search and replace in a Markdown document exported from a word processor. Basically, I want to get rid of the numbered references in favor of simpler in-text links, for easier updating, changing, and adding, while staying kramdown compatible.

I'm stuck at this JS, which matches correctly but doesn't produce the output I expect.

Here is the markdown:

// content is defined somewhere, let's put it in a "content" variable 
const content = `What is Lorem Ipsum?
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and this<sup>[[Pubmed]](1)</sup> book. This<sup>[[Microsoft](3)</sup> not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. 

The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using something<sup>[[Wikipedia]](2)</sup>. 

[1]: https://pubmed.com
[2]: https://wikipedia.org
[3]: https://microsoft.com
`

Then running

// define our function to extract it 
const extractCitationsFromMarkdown = content => {
  // Array of regexes - the first pulls the in-content links (between sup tags), but only for these 3 types: Pubmed|Microsoft|Wikipedia
  // second one for matching the references at the bottom
  const regexes = [
    /\<sup\>\[\[(Pubmed|Microsoft|Wikipedia)\]\]\((\d+)\)<\/sup\>/mg, 
    /\[(\d+)\]: ([^\s]+)/mg
  ]

  // Extract the matches from the text
  const matches = regexes
    .map(re => Array.from(content.matchAll(re)))
    .map(groups => groups.map(g => g.slice(1)))

    // format the results
  return matches.at(0)
    .map(([reference, referenceNumber ]) => 
      ([
        `[${reference}]`,
        matches.at(1).find(group => group.includes(referenceNumber)).at(1),
      ]).join(': ')
    ).toString(/\n/)
}

Calling it:

extractCitationsFromMarkdown(content)

Expected MD result:

What is Lorem Ipsum? Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and this<sup>[[Pubmed]](https://pubmed.com)</sup> book. This<sup>[[Microsoft](https://microsoft.com)</sup> not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. 

The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using something<sup>[[Wikipedia]](https://wikipedia.org)</sup>.

Expected rendered result:

What is Lorem Ipsum? Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and this ^[Pubmed] book. This ^{[microsoft.com]} not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.

The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using something ^[Wikipedia] .

Any help would be much appreciated, been stuck on this for > 1 day.

Thank you

Answer 1

Your function returns a string with matches, but does not perform any replacement.
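To make that concrete, here is a minimal sketch (the sample strings and variable names are made up for illustration, not taken from your document). It shows that `matchAll` only reads the text, that `replace` returns a new string, and that `Array.prototype.toString` ignores any argument, so `.toString(/\n/)` joins with commas rather than newlines.

```javascript
// Hypothetical one-line sample, not the full document from the question
const text = "see<sup>[[Pubmed]](1)</sup>";

// matchAll only reads: it yields the captured ids and leaves `text` untouched
const found = Array.from(text.matchAll(/\((\d+)\)/g), m => m[1]);

// replace builds and returns a new string; `text` itself never changes
const replaced = text.replace(/\((\d+)\)/g, "(https://pubmed.com)");

// toString() takes no separator argument, so the regex passed to it is ignored
const pairs = ["[Pubmed]: https://pubmed.com", "[Wikipedia]: https://wikipedia.org"];
const viaToString = pairs.toString(/\n/); // comma-separated
const viaJoin = pairs.join("\n");         // newline-separated

console.log(found, replaced);
console.log(viaToString);
console.log(viaJoin);
```

If a newline-separated list of matches was the goal, `join("\n")` is probably what was intended; for rewriting the document, `replace` is the call to build on.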

Note that you have a missing closing square bracket in the input; here:

<sup>[[Microsoft](https://microsoft.com)</sup> 
                ^
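If the export may contain more of these, a quick check along the following lines could list them before you run any replacement. This is only a sketch: `findMalformedCitations` is a hypothetical helper name, and the "well-formed" pattern assumes every reference looks like `[[Label]](target)` inside a `<sup>` tag.

```javascript
// Hypothetical helper: returns the inner text of every <sup>...</sup>
// that does not have the assumed shape [[Label]](target)
function findMalformedCitations(text) {
  const anySup = /<sup>(.*?)<\/sup>/g;
  const wellFormed = /^\[\[[^\[\]]+\]\]\([^()\s]+\)$/;
  return Array.from(text.matchAll(anySup), m => m[1])
    .filter(inner => !wellFormed.test(inner));
}

const sample =
  "this<sup>[[Pubmed]](1)</sup> book. This<sup>[[Microsoft](3)</sup> not only";

console.log(findMalformedCitations(sample)); // [ '[[Microsoft](3)' ]
```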

I would propose a function that performs a search and replace, following this procedure:

1. Read all the numerical references in the text and collect them in a map keyed by id.
2. Iterate all the entries in the footer section, and if they correspond to an id collected in the first step, register the url that goes with it and delete that line from the footer section. If not, don't touch it.
3. Finally, replace all the inline references with the urls that were collected in the previous step.

function makeCitationsInline(content) {
    const regex = /(\<sup\>\[\[.*?\]\]\()(\d+)(\)<\/sup\>)/g;
    // Collect the references that are used in the text
    const legend = Object.fromEntries(
        Array.from(content.matchAll(regex), m => [m[2], m[2]])
    );
    // Extract those references from the footer
    return content.replace(/\[(\d+)\]: ([^\s]+)\s*/g, (m, i, url) =>
        (legend[i] &&= url) ? "" : m
    // Insert them inline
    ).replace(regex, (_, pre, i, post) => pre + legend[i] + post);
}

const content = `What is Lorem Ipsum?
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and this<sup>[[Pubmed]](1)</sup> book. This<sup>[[Microsoft]](3)</sup> not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.

The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using something<sup>[[Wikipedia]](2)</sup>.

[1]: https://pubmed.com
[2]: https://wikipedia.com
[3]: https://microsoft.com
`;

console.log(makeCitationsInline(content));
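If fixing the exported file by hand is not practical, a more tolerant variant might look like the sketch below. It is not the function above, just one possible adaptation: the name `makeCitationsInlineTolerant`, the `\]{1,2}` quantifier that accepts a single closing bracket, and the decision to normalise the output to `[[Label]](url)` are all assumptions, and `https://example.com` is only a placeholder for an unused footer entry.

```javascript
// Sketch of a tolerant variant (hypothetical name, assumptions noted inline)
function makeCitationsInlineTolerant(text) {
  // Group 1: label, group 2: numeric id.
  // \]{1,2} accepts "]" or "]]" before "(" (assumption: that is the only defect)
  const inline = /<sup>\[\[([^\[\]]+)\]{1,2}\((\d+)\)<\/sup>/g;
  // One footer line such as "[1]: https://pubmed.com", including its line break
  const footer = /^\[(\d+)\]: (\S+)[^\S\n]*\n?/gm;

  // Ids that are actually used inline
  const used = new Set(Array.from(text.matchAll(inline), m => m[2]));

  // Record the URL of every used id and drop that footer line;
  // footer entries that are never referenced are left alone
  const urls = new Map();
  const withoutFooter = text.replace(footer, (line, id, url) => {
    if (!used.has(id)) return line;
    urls.set(id, url);
    return "";
  });

  // Rewrite inline references; ids without a footer entry stay as they were
  return withoutFooter.replace(inline, (whole, label, id) =>
    urls.has(id) ? `<sup>[[${label}]](${urls.get(id)})</sup>` : whole
  );
}

const sampleDoc = [
  "this<sup>[[Pubmed]](1)</sup> book. This<sup>[[Microsoft](3)</sup> not only.",
  "",
  "[1]: https://pubmed.com",
  "[3]: https://microsoft.com",
  "[9]: https://example.com", // placeholder entry that nothing references
  ""
].join("\n");

console.log(makeCitationsInlineTolerant(sampleDoc));
```

Using a `Set` and a `Map` keeps "ids seen inline" separate from "ids resolved to a URL", which makes it easy to leave unresolved references untouched.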
