
Regex match markdown link

I have a string with markdown in it. I am trying to strip all markdown using regex but am having trouble with matching links. Here's how far I got:

function stripMarkdown(text) {
  var str = String(text).replace(/(__|\*|\#)/gm, '');
  return str;
}

var testStr = '# This is the title. ## This is the subtitle. **some text** __some more text__. [link here](http://google.com)'

stripMarkdown(testStr);

So I believe the above will strip all unwanted markdown except the link. How do I handle that? Also, if there's a better way to do this, please let me know.

Desired outcome:

This is the title. This is the subtitle. some text some more text. link here

I came up with this regex:

(?:__|[*#])|\[(.*?)\]\(.*?\)

var str = '# This is the title. ## This is the subtitle. **some text** __some more text__. [link here](http://google.com)';
document.write(String(str).replace(/(?:__|[*#])|\[(.*?)\]\(.*?\)/gm, '$1'));

The accepted answer also matches bold markers (*) and headings (###). Marvin's fix matches unexpected spans of text when a line contains more than one bracket pair, e.g. [word] a [link](url).

This regex fixes that:

.replace(/\[([^\[\]]*)\]\((.*?)\)/gm, '$1')

Note that URLs containing bracket pairs will need to be URL-encoded.
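To make the difference concrete, here is a small sketch comparing a lazy `.*?` link-text pattern with the bracket-excluding one above, run on the example line. The names `lazyLink`, `strictLink` and `sample` are made up for illustration.

```javascript
// Link text matched lazily: the match may start at an earlier, unrelated "[".
const lazyLink = /\[(.*?)\]\(.*?\)/g;
// Link text may not contain "[" or "]", so the match has to start at the link's own "[".
const strictLink = /\[([^\[\]]*)\]\((.*?)\)/g;

const sample = '[word] a [link](url)';

const lazyResult = sample.replace(lazyLink, '$1');
const strictResult = sample.replace(strictLink, '$1');

console.log(lazyResult);   // word] a [link
console.log(strictResult); // [word] a link
```

The lazy version swallows everything from the first `[` up to the link, while the stricter character class leaves the unrelated `[word]` untouched.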

Thomas's answer above also matches headings (###) and bold markers (*). To avoid matching those, use the following regex instead:

.replace(/([])|\[(.*?)\]\(.*?\)/gm, '$1')

This may be useful for those using JavaScript/Node to match link patterns in Markdown.

Try this:

function stripMarkdown(text) {
  var str = String(text).replace(/__|\*|\#|(?:\[([^\]]*)\]\([^)]*\))/gm, '$1');
  return str;
}

var testStr = '# This is the title. ## This is the subtitle. **some text** __some more text__. [link here](http://google.com)';
document.write(stripMarkdown(testStr));

It replaces each match with the first capture group, which is the link's text. When the match is some other Markdown token rather than a link, the group is empty, so the token is simply removed.
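One thing worth noting is that removing `#` leaves the spaces that followed it, so the raw result has a leading space and a double space before the subtitle. The sketch below is one possible variant, not the answer's own code: it uses a replacer callback instead of `'$1'` and then collapses whitespace. The function name `stripMarkdownToText` is an assumption for illustration, and the whitespace step also flattens newlines, which may not suit multi-line input.

```javascript
function stripMarkdownToText(text) {
  return String(text)
    // For links, linkText is the captured text; for __, * and # it is undefined.
    .replace(/__|\*|#|\[([^\]]*)\]\([^)]*\)/g, (match, linkText) => linkText || '')
    // Collapse the runs of whitespace left behind by removed markers.
    .replace(/\s+/g, ' ')
    .trim();
}

const testStr = '# This is the title. ## This is the subtitle. **some text** __some more text__. [link here](http://google.com)';

console.log(stripMarkdownToText(testStr));
// This is the title. This is the subtitle. some text some more text. link here
```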

This works for me:

string.match(/\[[^\]]*\]\([^)]*\)/)

Markdown is far too complex to do this properly with a simple regular expression. Consider the following examples:

[`[test](test)`](test)
[\[](test) [\]](test)
`[test` [test](test) `test](test)`
``test`[test`` [test](test) ``test`](test)``

In Markdown, characters have a different meaning depending on the context in which they appear. As you can see, even the syntax highlighting of StackOverflow has trouble interpreting the last line correctly. In addition, Markdown compilers often allow raw HTML in the text.
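As a minimal illustration of the context problem, the sketch below applies one of the link regexes from this thread to a line where the link syntax sits inside an inline code span. A Markdown compiler would leave the code span's contents alone, but the regex has no notion of context. The variable names are made up for the example.

```javascript
const linkPattern = /\[([^\[\]]*)\]\((.*?)\)/g;

// The backticks mark an inline code span, so its contents are literal text, not a link.
const input = 'Write `[text](url)` to create a link.';

const output = input.replace(linkPattern, '$1');

console.log(output); // Write `text` to create a link.
```

The code span has been silently rewritten, so the stripped text no longer says what the author wrote.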

If you want a simple solution, compile the Markdown and strip out all the HTML elements:

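function getMarkdownText(markdown) {
    // marked() compiles the Markdown to HTML; sanitize() stands for whichever HTML sanitizer you use.
    const compiled = sanitize(marked(markdown));
    // Let the browser parse the HTML in a detached element, then read back only its text.
    const el = document.createElement("div");
    el.innerHTML = compiled;
    return el.innerText;
}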

If you want a solution that runs faster but is more complex to implement, hook into a markdown compiler yourself and make it generate the desired output.

This regex matches Markdown text that follows the [some reference text](some url) pattern. It includes two groups that capture the reference text and the URL.

\[([^\]]+)\]\(([^)]+)\)

If you want, you can simply replace each link in the original string with its reference text.
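A short sketch of both uses of this regex follows: reading the two groups, and replacing each link with its reference text. The helper names `extractLinks` and `replaceLinksWithText` and the sample string are assumptions for illustration. `String.prototype.matchAll` requires a regex with the `g` flag.

```javascript
const LINK_RE = /\[([^\]]+)\]\(([^)]+)\)/g;

// Collect every link as { text, url } using the two capture groups.
function extractLinks(markdown) {
  return Array.from(markdown.matchAll(LINK_RE), (m) => ({ text: m[1], url: m[2] }));
}

// Replace each link with its reference text (group 1).
function replaceLinksWithText(markdown) {
  return markdown.replace(LINK_RE, '$1');
}

const sampleText = 'See [Google](http://google.com) and [docs](https://example.com/a).';

console.log(extractLinks(sampleText));
console.log(replaceLinksWithText(sampleText)); // See Google and docs.
```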
