Split string ignoring HTML tags

Is it possible to split a string by space (" ") and to ignore the HTML tags in it?
The HTML tags may have style attributes like: style="font-size:14px; color: rgb(0, 0, 0)" ...

The string I'm talking about is:

<div class="line"><span style="color: rgb(0,0,0)">John</span><u> has</u><b> apples</b></div>

As you can see, I have a space character inside the u tag and inside the b tag.

What I am trying to get is the text split as follows:

<div class="line"><span style="color: rgb(0,0,0)">John</span><u>

has</u><b>

apples</b></div>

I have the following regex, but it does not give me the rest of the string, just the first 2 parts:

[\<].+?[\>]\s

Split using the following regexp:

str.split(/ (?=[^>]*(?:<|$))/)

[
  '<div class="line"><span style="color: rgb(0,0,0)">John</span><u>',
  'has</u><b>',
  'apples</b></div>'
]

The ?= is a look-ahead. It says: "find spaces which are followed by some sequence of characters that are NOT greater-than signs, then a less-than sign (or end of string)". A space inside a tag normally reaches that tag's closing > before any <, so the look-ahead fails there and the space is not used as a split point.
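To see the look-ahead at work, here is a minimal sketch that wraps the pattern in a helper and runs it on the string from the question. The name splitOutsideTags is hypothetical and only used for illustration. The sketch assumes that < and > appear only as tag delimiters; a literal angle bracket in text or in an attribute value would confuse the pattern.

```javascript
// Hypothetical helper name; the pattern itself is the one given above.
function splitOutsideTags(html) {
  // Split on a space only when the next angle bracket after it is "<"
  // (or there is no angle bracket left), i.e. the space is not inside a tag.
  return html.split(/ (?=[^>]*(?:<|$))/);
}

const sample =
  '<div class="line"><span style="color: rgb(0,0,0)">John</span><u> has</u><b> apples</b></div>';

// The spaces inside <div ...> and <span ...> are skipped;
// only the spaces before "has" and "apples" act as separators.
const pieces = splitOutsideTags(sample);
console.log(pieces);

// Plain text without any tags is split on every space.
const plainPieces = splitOutsideTags("a b c");
console.log(plainPieces);
```

For anything beyond simple, well-formed fragments like this one, parsing the markup with a real HTML parser is likely a safer choice than a regular expression.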

The ?: marks a non-capturing group. We need that here because split has a special behavior: if the pattern contains a capturing group, the captured text is also included in the resulting array of pieces, which we don't want.
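The difference is easy to check side by side. The following sketch uses a shortened input string (an assumption made only to keep the output small) and compares the non-capturing pattern with the same pattern written with a plain capturing group.

```javascript
const text = "<u> has</u><b> apples</b>";

// Non-capturing group (?:...): only the pieces between separators are returned.
const withoutCapture = text.split(/ (?=[^>]*(?:<|$))/);
console.log(withoutCapture);
// [ '<u>', 'has</u><b>', 'apples</b>' ]

// Capturing group (...): split also inserts whatever the group captured,
// here the "<" that ended the look-ahead at each separator.
const withCapture = text.split(/ (?=[^>]*(<|$))/);
console.log(withCapture);
// [ '<u>', '<', 'has</u><b>', '<', 'apples</b>' ]
```

The group is captured even though it sits inside the look-ahead, which is why the stray "<" entries show up in the second result.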
