
Simple Regex from HTML

I have the following code, taken from a webpage's source:

<span>41,396</span>

And the following regex:

("<span>.*</span>")

Which returns:

<span>New Users</span>

However, I don't want the tags in the results. I've tried a few things, but regular expressions are new to me.

More importantly, I need a regex for the following code:

<span>41,396</span>
</span>
<span class="levelColumn">
<span>2,150</span>
</span>
<span class="xpColumn">
<span>161,305,807</span>

I was thinking this may involve line breaks and more, which is why I'm putting it in separately.

You could try something like:

<span( class=\".+\")?>(.*)</span>

And then get capture group 2 for the tag's body. But be aware that regular expressions are NOT good for parsing HTML/XML. What would happen if you had nested <span> tags?
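As a rough illustration of the suggestion above, here is a minimal Python sketch (Python is an assumption; the question does not name a language). It applies the pattern as written and reads capture group 2. The sample strings are assembled from the question's snippets for illustration, and the last one shows what happens with a nested <span>.

```python
import re

# Pattern from the answer above: group 1 is the optional class attribute,
# group 2 is the tag body.
pattern = re.compile(r'<span( class=".+")?>(.*)</span>')

plain = '<span>41,396</span>'
with_class = '<span class="xpColumn">161,305,807</span>'
nested = '<span class="levelColumn"><span>2,150</span></span>'

print(pattern.search(plain).group(2))       # 41,396
print(pattern.search(with_class).group(2))  # 161,305,807
# Greedy matching runs to the last </span>, so the inner tag leaks into the body.
print(pattern.search(nested).group(2))      # <span>2,150</span>
```

The nested case is the kind of input where a regex starts to give surprising results.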

If the input gets even the slightest bit more complicated than what you've shown, look for an HTML parser and try using that instead.
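For comparison, a parser-based approach might look like the sketch below. It assumes Python and its standard-library html.parser module; the class name SpanTextParser and the depth-counting strategy are hypothetical choices made for this example, not something the answer prescribes. The fragment is the multi-line snippet from the question.

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Collects non-blank text found inside any <span> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <span> tags are currently open
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span":
            # The fragment contains a stray </span>, so never go below zero.
            self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.values.append(text)


fragment = """<span>41,396</span>
</span>
<span class="levelColumn">
<span>2,150</span>
</span>
<span class="xpColumn">
<span>161,305,807</span>
"""

parser = SpanTextParser()
parser.feed(fragment)
parser.close()
print(parser.values)  # ['41,396', '2,150', '161,305,807']
```

Because the parser tracks open and close tags itself, nesting and class attributes need no special handling.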

You can use a capturing group to get just the value instead of tag + value:

"<span>(.*)</span>"

Consider using an HTML parsing library in your language of choice if the regex becomes more complicated.
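A small Python sketch of the capturing-group idea (the language and the sample line are assumptions made for illustration). It shows the difference between the whole match and the captured group, and also that a greedy .* can overshoot when two <span> elements share a line, which a lazy .*? avoids.

```python
import re

line = '<span>41,396</span><span>2,150</span>'

# Group 0 is the whole match (tags included); group 1 is only the captured value.
first = re.search(r'<span>(.*?)</span>', line)
print(first.group(0))  # <span>41,396</span>
print(first.group(1))  # 41,396

greedy = re.findall(r'<span>(.*)</span>', line)
lazy = re.findall(r'<span>(.*?)</span>', line)
print(greedy)  # ['41,396</span><span>2,150']
print(lazy)    # ['41,396', '2,150']
```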

As far as I know, the regex will match line by line (by default, . does not match a newline), but you could write an expression that works around that.

Try: <span>(.*)</span>

You should be able to retrieve the information you want with \\1

In the case of <span class="xpColumn"> it would simply not match, and \\1 would be empty.

Cheers :)
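To make the line-by-line behaviour concrete, here is a hedged Python sketch run against the multi-line snippet from the question. Python is an assumption; there the first group is read with group(1) or re.findall rather than a backreference such as \\1.

```python
import re

text = """<span>41,396</span>
</span>
<span class="levelColumn">
<span>2,150</span>
</span>
<span class="xpColumn">
<span>161,305,807</span>
"""

# Without re.DOTALL, "." stops at a newline, so each match stays on one line.
values = re.findall(r'<span>(.*)</span>', text)
print(values)  # ['41,396', '2,150', '161,305,807']

# The class-bearing opening tag is not matched by the literal "<span>".
print(re.search(r'<span>(.*)</span>', '<span class="xpColumn">'))  # None

# With re.DOTALL the greedy ".*" spans lines and yields one oversized match.
spanning = re.findall(r'<span>(.*)</span>', text, flags=re.DOTALL)
print(len(spanning))  # 1
```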
