
How do I remove empty <p> tags from a string using JavaScript or Cheerio?

I have some HTML as a string:

"<p>This is a slightly longer post about something. Let's see how long this lasts. Okay so this is one paragraph now. </p><p>​</p><p>Let's write another paragraph, and see how it renders when I read this post later. </p><p>​</p><p>This is another short paragraph</p>"

How do I strip the empty p tags from this string using Cheerio or JS?

I've tried searching on Stack Overflow and Google in general without finding a clear working solution.

EDIT: Apologies, I have just noticed that my string has quite a lot of whitespace between the tags.

Here's an example that comes up when I use console.log in my app:

<p>This is a slightly longer post about something. Let's see how long this lasts. Okay so this is one paragraph now. </p>
<p>​</p>
<p>Let's write another paragraph, and see how it renders when I read this post later. </p>
<p>​</p>
<p>Let's write another paragraph, and see how it renders when I read this post later. </p>

You can use .replace("<p></p>", "") if the tags don't have any attributes (a plain-string pattern only removes the first occurrence), but if they do, there is another way (aside from using a regex to catch and replace tags).
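The difference between replacing the first match and every match can be sketched as follows. This assumes a runtime that supports String.prototype.replaceAll (ES2021); the sample string is made up for illustration.

```javascript
// Hypothetical sample input with two empty paragraphs.
const html = "<p>one</p><p></p><p>two</p><p></p>";

// A string pattern replaces only the first match.
const firstOnly = html.replace("<p></p>", "");

// replaceAll (or a regex with the g flag) replaces every match.
const all = html.replaceAll("<p></p>", "");
const allViaRegex = html.replace(/<p><\/p>/g, "");

console.log(firstOnly);   // <p>one</p><p>two</p><p></p>
console.log(all);         // <p>one</p><p>two</p>
console.log(allViaRegex); // <p>one</p><p>two</p>
```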

A good way of doing this is to use native DOM functions.

To remove empty tags, you can use the following selector:

document.querySelectorAll("*:empty").forEach((x)=>{x.remove()});

In your case, maybe something like this:

var div = document.createElement("div");
div.innerHTML = "<p>hello there</p><p class='empty'></p><p>Not empty</p><p></p>"; // your variable containing HTML goes here
div.querySelectorAll("*:empty").forEach((x)=>{x.remove()})
// Output: div.innerHTML == <p>hello there</p><p>Not empty</p>
//Then use remaining innerHTML as you wish

But note that :empty does not match elements that contain whitespace, such as <p> </p>. Also note that :empty matches elements that never have content, such as <br> and <img>, so the * selector above removes those too.
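To cover the whitespace case, one option is to test each paragraph's text instead of relying on :empty. This is a minimal sketch, assuming a paragraph counts as empty when it has no child elements and its text is only whitespace or zero-width spaces. isBlankText and removeBlankParagraphs are hypothetical helper names, and the DOM part only runs where document exists (a browser).

```javascript
// Hypothetical helper: true if the text holds only whitespace and/or
// zero-width spaces (U+200B), or nothing at all.
function isBlankText(text) {
  return text.replace(/[\s\u200B]/g, "") === "";
}

// Hypothetical helper: removes blank <p> elements inside a container element.
function removeBlankParagraphs(container) {
  container.querySelectorAll("p").forEach((p) => {
    // Keep paragraphs that hold elements such as <img>, even without text.
    if (p.children.length === 0 && isBlankText(p.textContent)) {
      p.remove();
    }
  });
}

// Browser-only usage; skipped in environments without a DOM such as Node.js.
if (typeof document !== "undefined") {
  const div = document.createElement("div");
  div.innerHTML = "<p>hello</p><p> </p><p>\u200B</p><p>Not empty</p>";
  removeBlankParagraphs(div);
  console.log(div.innerHTML); // <p>hello</p><p>Not empty</p>
}
```

Only paragraphs are targeted here, so elements such as <br> and <img> elsewhere in the markup are left alone.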

In your code, the empty <p> tags contain zero-width space (U+200B) characters. This character is invisible, but it is there.
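Because the character is invisible, it can help to print the code points of a suspicious fragment. A small sketch; codePoints is a hypothetical helper name.

```javascript
// Hypothetical helper: lists each character of a string as a U+XXXX code point.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// "\u200B" is written as an escape so the zero-width space is visible in source.
console.log(codePoints("<p>\u200B</p>").join(" "));
// U+003C U+0070 U+003E U+200B U+003C U+002F U+0070 U+003E
```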

You can use the split() and join('') methods:

var test = "<p>This is a slightly longer post about something. Let's see how long this lasts. Okay so this is one paragraph now. </p><p>\u200B</p><p>Let's write another paragraph, and see how it renders when I read this post later. </p><p>\u200B</p><p>This is another short paragraph</p>";
// "\u200B" is the zero-width space, written as an escape so that it is visible.
var str = test.split('<p>\u200B</p>').join('');
console.log(str);

Or you can use the replace() method:

var test = "<p>This is a slightly longer post about something. Let's see how long this lasts. Okay so this is one paragraph now. </p><p>\u200B</p><p>Let's write another paragraph, and see how it renders when I read this post later. </p><p>\u200B</p><p>This is another short paragraph</p>";
// The g flag replaces every occurrence; "\u200B" is the zero-width space.
var str = test.replace(/<p>\u200B<\/p>/gi, '');
console.log(str);

You can just replace the string "<p></p>" with the empty string "":

var str = "<p>This is a slightly longer post about something. Let's see how long this lasts. Okay so this is one paragraph now. </p><p></p><p>Let's write another paragraph, and see how it renders when I read this post later. </p><p></p><p>This is another short paragraph</p>";
str = str.replace(/<p>\s*<\/p>/ig, ''); // <p></p>, optionally containing whitespace
str = str.replace(/<p\s*\/>/ig, '');    // self-closing <p/>
console.log(str);

You can try this:

let str = "<p>This is a slightly longer post about something. Let's see how long this lasts. Okay so this is one paragraph now. </p><p></p><p>Let's write another paragraph, and see how it renders when I read this post later. </p><p></p><p>This is another short paragraph</p>";
// If your <p> element has attributes, it will also be replaced.
str = str.replace(/<p(\s+[a-z0-9\-_'"=]+)*><\/p>/ig, '');
console.log(str);
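The regex approaches can be combined into one helper that tolerates attributes, whitespace, and zero-width spaces. This is only a sketch: removeEmptyParagraphs is a hypothetical name, the pattern assumes attribute values contain no ">" character, and a regex is not a general HTML parser.

```javascript
// Hypothetical helper: strips <p> elements whose content is only
// whitespace, &nbsp; entities, or zero-width spaces (U+200B).
// Assumes no ">" inside attribute values; not a full HTML parser.
function removeEmptyParagraphs(html) {
  // The trailing \s* also drops whitespace that followed a removed paragraph.
  const emptyP = /<p(?:\s[^>]*)?>(?:\s|&nbsp;|\u200B)*<\/p>\s*/gi;
  return html.replace(emptyP, "");
}

const input =
  "<p>First paragraph. </p>\n<p>\u200B</p>\n" +
  "<p class='spacer'> </p><p>Second paragraph</p>";

console.log(removeEmptyParagraphs(input));
// <p>First paragraph. </p>
// <p>Second paragraph</p>
```

Requiring whitespace or ">" right after "<p" keeps the pattern from touching other tags that start with the same letter, such as <pre>.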

You could use the replace method:

const str = "<p>This is some HTML code</p>";
// Removes the first <p> and the first </p>; it does not target empty paragraphs.
const stripped = str.replace("<p>", "").replace("</p>", "");

console.log(stripped);

Try this:

const regex = /<[^>]*>\s*<\/[^>]*>/;
const str = `<p>This is a slightly longer post about something. Let's see how long this lasts. Okay so this is one paragraph now</p><p></p><p>Let's write another paragraph, and see how it renders when I read this post later. </p><p></p><p>This is another short paragraph</p>`;
let m;

if ((m = regex.exec(str)) !== null) {
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
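Note that without the g flag, exec() reports only the first match. To list every empty element, one possible sketch uses String.prototype.matchAll, which requires a global regex. The sample string is shortened for illustration.

```javascript
// Same pattern with the g flag: a tag, optional whitespace, then a closing tag.
const emptyTag = /<[^>]*>\s*<\/[^>]*>/g;
const sample = "<p>First</p><p></p><p>Second</p><p> </p><p>Third</p>";

// matchAll yields one match object per occurrence; index is its position.
const found = Array.from(sample.matchAll(emptyTag), (m) => ({
  text: m[0],
  index: m.index,
}));
console.log(found);

// The same global regex can remove every match in one call.
const cleaned = sample.replace(emptyTag, "");
console.log(cleaned); // <p>First</p><p>Second</p><p>Third</p>
```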
