
How to get content inside HTML tags, including the tags, using regex in JavaScript?

I have the text below:

how  much  production  in  batu

Now this text appears as a series of HTML tags. Basically, each word is wrapped in a span with a specific style or class. Here is what it looks like:

'<span style="">how &nbsp;</span><span style="">much &nbsp;</span><span class="pink-highlight">production &nbsp;</span><span style="">in &nbsp;</span><span class="yellow-highlight">batu</span>'

Now I want two things from this HTML string: the style or class, and the content inside the span (without &nbsp;).

So I would want an array of the following information from the string:

[["", "how"], ["", "much"], ["pink-highlight", "production"], ["", "in"], ["yellow-highlight", "batu"]]

Now, this can be done easily using regex, but I am not well versed in regex. The pattern I could think of is:

<span>(.*?)</span>

But it only captures the content inside the span, and it won't even match in this case, since each span has a style attribute or a class.
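To see the failure concretely, here is a small check of the question's pattern against the sample string. It is only an illustration; the variable names are arbitrary.

```javascript
// Sample input from the question.
const html =
  '<span style="">how &nbsp;</span><span style="">much &nbsp;</span>' +
  '<span class="pink-highlight">production &nbsp;</span>' +
  '<span style="">in &nbsp;</span><span class="yellow-highlight">batu</span>';

// The pattern from the question requires a bare "<span>" opening tag.
const naive = /<span>(.*?)<\/span>/g;

// Every opening tag in the input carries an attribute, so nothing matches.
const matches = [...html.matchAll(naive)];
console.log(matches.length); // 0
```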

So what regex would work best in this case to get the desired result?

Using a regular expression to match HTML is fragile and can fail. It is much easier to parse the string as HTML and read the data from the resulting elements.

    // Input string from the question
    var html = '<span style="">how &nbsp;</span><span style="">much &nbsp;</span><span class="pink-highlight">production &nbsp;</span><span style="">in &nbsp;</span><span class="yellow-highlight">batu</span>';

    // Let the browser parse the markup into a detached element
    var temp = document.createElement('div');
    temp.innerHTML = html;

    // For each span, take its style (or, failing that, class) attribute and its trimmed text
    var data = Array.from(temp.querySelectorAll('span')).map(span => ([
      span.getAttribute("style") || span.getAttribute("class") || '',
      span.textContent.trim()
    ]));

    console.log(data);
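The DOM-based answer needs no explicit &nbsp; handling. After parsing, textContent contains the decoded character U+00A0 rather than the literal entity, and String.prototype.trim treats U+00A0 as whitespace. A tiny check, runnable outside a browser:

```javascript
// In textContent, the &nbsp; entity has already been decoded to U+00A0.
const text = "how \u00a0";

// String.prototype.trim strips U+00A0 as well as ordinary spaces.
console.log(JSON.stringify(text.trim())); // "how"
```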

I will provide a simple regex. Actually, I just added two more alternatives to your pattern: one for spans with a style attribute and one for spans with a class attribute.

(<span>(.*?)<\/span>)|(<span style=".*?">(.*?)<\/span>)|(<span class=".*?">(.*?)<\/span>)
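The pattern above matches the spans, but it does not capture the style or class value, and the captured text still contains &nbsp;. Below is a minimal sketch that folds the three alternatives into one pattern with two capture groups. It assumes each span has exactly one double-quoted style or class attribute and no nested tags. `extractSpans` is a hypothetical helper name, not part of any library.

```javascript
// Sample input from the question.
const html =
  '<span style="">how &nbsp;</span><span style="">much &nbsp;</span>' +
  '<span class="pink-highlight">production &nbsp;</span>' +
  '<span style="">in &nbsp;</span><span class="yellow-highlight">batu</span>';

// Group 1: value of the style or class attribute (may be empty).
// Group 2: inner text of the span; [^<]* assumes no nested tags.
const spanRe = /<span\s+(?:style|class)="([^"]*)"\s*>([^<]*)<\/span>/g;

// Hypothetical helper: returns [attributeValue, text] pairs.
function extractSpans(source) {
  return Array.from(source.matchAll(spanRe), (m) => [
    m[1],
    // Drop literal &nbsp; entities, then trim remaining whitespace.
    m[2].replace(/&nbsp;/g, "").trim(),
  ]);
}

console.log(extractSpans(html));
```

The `[^<]*` content group stops a match from running across tag boundaries. For markup less regular than this input (single quotes, extra attributes, nested elements), the DOM-based approach is the safer choice.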
