C# Regex Can't Match Anything (Probably because can't escape characters properly)

I made a regex pattern and tested it on this site: http://rubular.com/

I entered the pattern into the first box on that site, exactly as shown here:

<div class="product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/>

I left the second box empty.

My regex pattern works perfectly fine on that site.

But I can't get it working in C#.

I'm trying this:

WebClient client = new WebClient();

string MainPage = client.DownloadString("http://www.vatanbilgisayar.com/cep-telefonu-modelleri/");

string ItemPattern = "<div class=\"product clearfix\">\\n+" +   //  <div class="product clearfix">\n
                "<div class=\"img\">\\n" +                  //  <div class="img">\n
                "+<a href=\"(.*?)\">\\n" +                  //  +<a href="(.*?)">\n
                "+<img class=\"lazyload\"" +                //  +<img class="lazyload"
                "id='.*' data-original=\"(.*?)\"" +         //  id='.*' data-original="(.*?)"
                "alt=\".*\" title=\"(.*?)\"\\/>";           //  alt=".*" title="(.*?)" \/>

MatchCollection matches = Regex.Matches(MainPage, ItemPattern);

foreach (Match match in matches)
{
    Console.WriteLine("Area Code:        {0}", match.Groups[1].Value);
    Console.WriteLine("Telephone number: {0}", match.Groups[2].Value);
    Console.WriteLine();
}

I simply escaped every " with \. I really don't understand why it's not working, and this is starting to drive me crazy.

You need 2 layers of escape sequences. You need to escape once for C# and once more for the regex syntax.

If you want to escape characters for the regex, you have to escape the \ too, so you should change your \ to \\ for escape sequences at the regex level.

Use two \'s for every single \ in your string, not counting the escaping you already did for the quotes, since \ is an escape character. It looks like this mainly concerns "\n", which occurs 3 times.
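
As a rough illustration of the two layers, the sketch below compares a regular C# string literal with a verbatim (`@"..."`) literal. The class name `EscapeLayersDemo`, the shortened pattern, and the sample input are made up for this example; only `Regex.IsMatch` comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeLayersDemo
{
    // Regular literal: the C# compiler turns \\ into one backslash,
    // so the regex engine receives the two characters \ and n.
    public const string Regular = "<div>\\n+<a>";

    // Verbatim literal: backslashes are passed through untouched,
    // so no doubling is needed for the regex layer.
    public const string Verbatim = @"<div>\n+<a>";

    public static void Main()
    {
        // Hypothetical input with two line breaks between the tags.
        string input = "<div>\n\n<a>";

        Console.WriteLine(Regular == Verbatim);            // True: same pattern text
        Console.WriteLine(Regex.IsMatch(input, Regular));  // True
        Console.WriteLine(Regex.IsMatch(input, Verbatim)); // True
    }
}
```

Inside a verbatim literal a double quote is written as `""`, so which form reads better depends on whether the pattern has more quotes or more backslashes.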

Original String:

"product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/

Also, you can break that up into more than one line. C# ignores whitespace between tokens, so just close the quote, add a "+" at the end of the line, and continue on the next line with another quote.

C# String:

string ItemPattern = "<div class=\"product clearfix\">\\n" +   //  <div class="product clearfix">\n
                    "+<div class=\"img\">\\n" +                 //  +<div class="img">\n
                    "+<a href=\"(.*?)\">\\n" +                  //  +<a href="(.*?)">\n
                    "+<img class=\"lazyload\" " +               //  +<img class="lazyload" (trailing space kept)
                    "id='.*' data-original=\"(.*?)\" " +        //  id='.*' data-original="(.*?)" (trailing space kept)
                    "alt=\".*\" title=\"(.*?)\" \\/>";          //  alt=".*" title="(.*?)" \/>

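One thing to double-check when splitting a pattern across lines like this: `+` joins the pieces with nothing in between, so any space the pattern needs has to sit inside one of the quoted pieces. The sketch below uses a made-up one-line tag and hypothetical names (`ConcatSpacingDemo`, `WithoutSpace`, `WithSpace`) to show the difference.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConcatSpacingDemo
{
    // No space at the join: the pattern text becomes
    // <img class="lazyload"id='.*'  and cannot match the sample tag.
    public const string WithoutSpace =
        "<img class=\"lazyload\"" +
        "id='.*'";

    // Trailing space kept inside the first literal: the pattern text becomes
    // <img class="lazyload" id='.*'
    public const string WithSpace =
        "<img class=\"lazyload\" " +
        "id='.*'";

    public static void Main()
    {
        // Hypothetical sample tag, not taken from the real page.
        string html = "<img class=\"lazyload\" id='p1'";

        Console.WriteLine(Regex.IsMatch(html, WithoutSpace)); // False
        Console.WriteLine(Regex.IsMatch(html, WithSpace));    // True
    }
}
```
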
If you still have a problem with it, something else is wrong, probably in the Regex.Matches(MainPage, ItemPattern) call. According to the debugging you did, it sounds like the string is being created successfully but no matches come back in the MatchCollection. So the problem is either in how you are obtaining the matches or in how you are referencing them.
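
To narrow down whether the pattern or the match handling is at fault, a small check along these lines may help. It is only a sketch: the sample HTML is invented, the names `MatchDiagnosticsDemo` and `CountMatches` are hypothetical, and the idea that the downloaded page might use `\r\n` line endings is an assumption, not something confirmed in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchDiagnosticsDemo
{
    // Returns how many times the pattern matches, so an empty result is visible.
    public static int CountMatches(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        return matches.Count;
    }

    public static void Main()
    {
        // Invented sample with Windows-style line endings (\r\n).
        string html = "<div class=\"img\">\r\n<a href=\"/phone-1\">";

        // \n+ alone does not consume the \r that precedes the \n.
        string strict = "<div class=\"img\">\\n+<a href=\"(.*?)\">";

        // \s+ accepts any run of whitespace, including \r\n and indentation.
        string tolerant = "<div class=\"img\">\\s+<a href=\"(.*?)\">";

        Console.WriteLine(CountMatches(html, strict));   // 0
        Console.WriteLine(CountMatches(html, tolerant)); // 1

        foreach (Match match in Regex.Matches(html, tolerant))
        {
            Console.WriteLine(match.Groups[1].Value);    // /phone-1
        }
    }
}
```

Printing `matches.Count` before the `foreach` loop makes it clear whether the loop body is simply never reached.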
