
How can I remove characters between < and > using regex in C#?

I have a string str="<u>rag</u>". Now I want to get only the string "rag". How can I get it using regex?

My code is below.

I got output="".

Thanks in advance.

C# code:

string input="<u>ragu</u>";
string regex = "(\\<.*\\>)";
string output = Regex.Replace(input, regex, "");

const string HTML_TAG_PATTERN = "<.*?>";
// Regex.Replace returns a new string, so the result must be captured.
string output = Regex.Replace(str, HTML_TAG_PATTERN, string.Empty);

Using regex to parse HTML is not recommended.

Regex is meant for regularly occurring patterns. HTML is not regular in its format (except XHTML). For example, HTML files are valid even if a closing tag is missing! This could break your code.
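To make the concern concrete, here is a minimal sketch of one way a tag-stripping regex can go wrong. It shows a different failure mode from the missing closing tag mentioned above: a `>` inside an attribute value. The class and method names (`FragileRegexDemo`, `StripTags`) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FragileRegexDemo
{
    // Hypothetical helper: removes anything that looks like <...>.
    public static string StripTags(string html) =>
        Regex.Replace(html, "<.*?>", string.Empty);

    public static void Main()
    {
        // Simple markup works as expected.
        Console.WriteLine(StripTags("<u>rag</u>"));                // rag

        // A '>' inside an attribute value ends the match too early,
        // so part of the tag leaks into the output.
        Console.WriteLine(StripTags("<a title=\"a>b\">text</a>")); // b">text
    }
}
```

The regex has no notion of quoted attribute values, which is the kind of context a real parser keeps track of.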

Use an HTML parser like HtmlAgilityPack.
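As a rough sketch of what the parser route might look like, assuming the HtmlAgilityPack NuGet package is installed (it is a third-party library, not part of .NET). The wrapper class `ParserDemo` and its `GetText` method are hypothetical names used only for this example.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class ParserDemo
{
    // Returns the text content of an HTML fragment with the markup removed.
    public static string GetText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(GetText("<u>rag</u>")); // rag
    }
}
```

Here the parser builds a document tree first, and `InnerText` reads the text nodes from it, so no pattern has to guess where a tag ends.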


WARNING {Don't try this in your code}

To solve your regex problem:

<.*> matches < followed by zero or more characters (i.e. u>rag</u) up to the last >, so the whole string is replaced.

You should replace it with this regex:

<.*?>

.* is greedy, i.e. it consumes as many characters as it can match.

.*? is lazy, i.e. it consumes as few characters as possible.
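The difference is easy to see by running both patterns on the string from the question. This is only an illustrative sketch; the class and method names (`GreedyVsLazyDemo`, `StripGreedy`, `StripLazy`) are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Greedy: .* runs to the LAST '>' in the input.
    public static string StripGreedy(string input) =>
        Regex.Replace(input, "<.*>", string.Empty);

    // Lazy: .*? stops at the FIRST '>' after each '<'.
    public static string StripLazy(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty);

    public static void Main()
    {
        string input = "<u>rag</u>";

        // The greedy pattern matches the whole string as a single match.
        Console.WriteLine(Regex.Match(input, "<.*>").Value);    // <u>rag</u>
        // The lazy pattern matches each tag separately.
        Console.WriteLine(Regex.Match(input, "<.*?>").Value);   // <u>

        Console.WriteLine($"greedy: \"{StripGreedy(input)}\""); // greedy: ""
        Console.WriteLine($"lazy:   \"{StripLazy(input)}\"");   // lazy:   "rag"
    }
}
```

The greedy version reproduces the empty output from the question, while the lazy version leaves "rag".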

Sure you can:

    string input = "<u>ragu</u>";
    // Matches only single-letter lowercase tags such as <u> or </u>.
    string regex = "(\\<[/]?[a-z]\\>)";
    string output = Regex.Replace(input, regex, "");

You don't need to use regex for that.

string input = "<u>rag</u>".Replace("<u>", "").Replace("</u>", "");
Console.WriteLine(input);

Your code was almost correct; a small modification makes it work:

 string input = "<u>ragu</u>";
 string regex = @"<.*?\>";
 string output = Regex.Replace(input, regex, string.Empty);

Output is 'ragu'.

EDIT: this solution may not be the best. Interesting remark from user the-land-of-devils-srilanka: do not use regex to parse HTML. Indeed, see also "RegEx match open tags except XHTML self-contained tags".
