
Why does this regex/DOM character entity tester return `undefined`?

var str = 'let us pretend that this is a blog about gardening&cooking; here&apos;s an apostrophe & ampersand just for fun.';

This is the string I'm operating on. The desired end result is: "let us pretend that this is a blog about gardening&cooking; here's an apostrophe & ampersand just for fun."

console.log('Before: ' + str);


str = str.replace(/&(?:#x?)?[0-9a-z]+;?/gi, function(m){
  var d = document.createElement('div');
  console.log(m);
  d.innerHTML = m.replace(/&/, '&');
  console.log(d.innerHTML + '|' + d.textContent);
  return !!d.textContent.match(m.replace(/&/, '&')[0]) ? m : d.textContent;
});


console.log('After: ' + str);
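The code above decodes each entity-like match through a scratch element and keeps the match unchanged when it isn't a real entity. The same idea can be tried without a DOM (for example in Node) using a lookup table. This is a minimal sketch under that assumption: the five-entry table `NAMED_ENTITIES` and the function `decodeKnownEntities` are hypothetical, and the sketch covers named references only.

```javascript
// Hypothetical lookup table, not from the question: a browser knows
// far more named entities than these five.
var NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// Decode named references found in the table; leave anything else,
// such as "&cooking;", exactly as written.
function decodeKnownEntities(s) {
  return s.replace(/&([a-z]+);/gi, function (m, name) {
    // hasOwnProperty avoids matching inherited keys like "constructor".
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
      ? NAMED_ENTITIES[name]
      : m;
  });
}

console.log(decodeKnownEntities(
  'gardening&cooking; here&apos;s an apostrophe &amp; ampersand'
)); // gardening&cooking; here's an apostrophe & ampersand
```

Because unknown names fall through to `m`, a made-up entity like `&cooking;` survives untouched, which is the behavior the question is after.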

The problem is that HTML 4 doesn't support XML's `&apos;` entity, so older browsers don't decode it. To avoid the issue, use the numeric reference `&#39;` instead of `&apos;`.
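A small sketch of that swap, assuming you can rewrite the string before it reaches an HTML parser. Both function names, `replaceApos` and `decodeNumericRefs`, are hypothetical; the second exists only so the effect can be checked without a DOM.

```javascript
// Hypothetical helpers for illustration; neither appears in the original answer.

// Swap XML's &apos; for the numeric reference &#39;, which HTML 4
// parsers and older browsers understand.
function replaceApos(s) {
  return s.replace(/&apos;/g, '&#39;');
}

// Decode decimal (&#39;) and hexadecimal (&#x27;) character references
// without a DOM, so the result of the swap can be checked in Node.
function decodeNumericRefs(s) {
  return s.replace(/&#(x[0-9a-f]+|\d+);/gi, function (m, code) {
    var hex = code[0] === 'x' || code[0] === 'X';
    var n = hex ? parseInt(code.slice(1), 16) : parseInt(code, 10);
    // Leave references outside the Unicode range untouched instead of throwing.
    return n <= 0x10FFFF ? String.fromCodePoint(n) : m;
  });
}

var fixed = replaceApos('here&apos;s an apostrophe');
console.log(fixed);                    // here&#39;s an apostrophe
console.log(decodeNumericRefs(fixed)); // here's an apostrophe
```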

For more information, see this post:

Why shouldn't `&apos;` be used to escape single quotes?

This should do what you want:

str.replace(/&([#x]\d+;|[a-z]+;)/g, "&$1")

or, with a positive lookahead:

str.replace(/&(?=[#x]\d+;|[a-z]+;)/g, "&")
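To see what the two patterns match, and why the lookahead version needs no `$1`, here is a small check on a hypothetical sample string. Note that `[#x]\d+;` accepts decimal references such as `&#39;` but not hex ones such as `&#x27;`, and that `[a-z]+;` also matches made-up names like `&cooking;`. The marker `[&]` is an arbitrary stand-in for whatever replacement you actually need.

```javascript
// The two patterns from the answer above.
var withCapture = /&([#x]\d+;|[a-z]+;)/g;
var withLookahead = /&(?=[#x]\d+;|[a-z]+;)/g;

// Hypothetical sample string, chosen to show what each pattern matches.
var s = 'a&cooking; b&#39;s &amp; c&#x27;';

// "[&]" contains no "$", so it is a plain marker rather than a special
// replacement token; it only makes each match visible.
var a = s.replace(withCapture, '[&]$1'); // re-inserts the captured tail
var b = s.replace(withLookahead, '[&]'); // the tail was never consumed

console.log(a);       // a[&]cooking; b[&]#39;s [&]amp; c&#x27;
console.log(a === b); // true
```

The lookahead only asserts what follows the `&` without consuming it, so the entity body stays in the string and does not have to be put back with `$1`.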

I don't think you need any HTML-to-text encoding or decoding here.
