Extracting a sentence from an HTML string using Regex in C#

Question

I have a control that returns a DataTable in which each row holds HTML code as a string. I'm trying to use Regex to extract only the words enclosed within the HTML tags.

{[h]</span></p><p class="MsoNormal" style="text-align: left;"><span style="color: #ff6600; font-weight: bold;"><span style="font-family: arial, helvetica, sans-serif;">What do they mean today?</span></span></p><p style="text-align: left; margin: 0px;"><span style="font-family: arial, helvetica, sans-serif;">[/h]}

I want to extract only the sentence "What do they mean today?", or any sentence that consists of more than one word.

I tried (/w*/s?)*, but it seems to look only at the beginning of the string and not throughout the whole string. I'm not very good with regular expressions. Any help will be much appreciated.

Answer 1

You could use the regex below to grab the string you want.

@"(?<=>)[^<>]+(?=<)"
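
As a rough illustration of how that pattern might be applied, here is a minimal sketch. The class name `SentenceExtractor`, the method `Extract`, and the shortened sample string are my own assumptions for the example, not something from the question. The lookbehind `(?<=>)` and lookahead `(?=<)` restrict matches to text sitting directly between a `>` and a `<`, and the extra whitespace split keeps only runs with more than one word, as the question asks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the original post.
public static class SentenceExtractor
{
    // Pattern from the answer: one or more non-bracket characters
    // that are preceded by '>' and followed by '<'.
    private static readonly Regex TextBetweenTags = new Regex(@"(?<=>)[^<>]+(?=<)");

    // Returns every text run between tags that contains more than one word.
    public static List<string> Extract(string html)
    {
        return TextBetweenTags.Matches(html)
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            // A null separator array makes Split break on any whitespace.
            .Where(text => text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length > 1)
            .ToList();
    }

    public static void Main()
    {
        // Shortened version of the sample row from the question.
        string html = "{[h]</span></p><p class=\"MsoNormal\"><span style=\"color: #ff6600;\">"
                    + "What do they mean today?</span></span></p><p><span>[/h]}";

        foreach (string sentence in Extract(html))
        {
            Console.WriteLine(sentence); // prints: What do they mean today?
        }
    }
}
```

Note that `Regex.Matches` scans the whole input, whereas a single `Regex.Match` call returns only the first hit. The leading `{[h]` and trailing `[/h]}` markers are skipped here because they are not enclosed between a `>` and a `<`.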

But regex is not the recommended way to parse HTML files.


Answer 1 (0, accepted, 2015-02-10 07:48:13)