Regex from HTML parsing: how do I grab a specific string?

I'm trying to get the string after charactername= and before " >. How would I use a regex to capture only the player name?

This is what I have so far, and it's not working: it doesn't actually print anything. client.DownloadString returns a string like this:

<a href="https://my.examplegame.com/charactername=Atro+Roter" >

So I know it actually gets the string; I'm just stuck on the regex.

using (var client = new WebClient())
{
    // Example of what the string looks like on Console when I Console.WriteLine(html):
    // <a href="https://my.examplegame.com/charactername=Atro+Roter" >

    // I want the "Atro+Roter"

    string html = client.DownloadString(worldDest + world + inOrderName);
    string playerName = "https://my.examplegame.com/charactername=(.+?)\" >";

    MatchCollection m1 = Regex.Matches(html, playerName);

    foreach (Match m in m1)
    {
        Console.WriteLine(m.Groups[1].Value);
    }
}

I'm trying to specifically get the string after charactername= and before " >.

So you just need a lookbehind and a lookahead, then use LINQ to collect all the match values into a list:

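// Requires: using System.Linq; using System.Text.RegularExpressions;
var input = "your input string";
// Match everything between "charactername=" and the next double quote.
var rx = new Regex(@"(?<=charactername=)[^""]+(?="")");
var res = rx.Matches(input).Cast<Match>().Select(p => p.Value).ToList();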

The res variable should now hold all your character names.
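
As a rough check of the lookaround pattern above, here is a minimal sketch that runs it against the sample anchor tag from the question. The second tag (Second+Player) is made-up test data, not something from the original post, and the block assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample HTML: the first tag is from the question, the second is hypothetical.
string html =
    "<a href=\"https://my.examplegame.com/charactername=Atro+Roter\" >\n" +
    "<a href=\"https://my.examplegame.com/charactername=Second+Player\" >";

// Lookbehind anchors the match after "charactername=";
// the lookahead stops it just before the closing double quote.
var rx = new Regex(@"(?<=charactername=)[^""]+(?="")");

// Collect every match value into a list.
var names = rx.Matches(html).Cast<Match>().Select(m => m.Value).ToList();

foreach (var name in names)
{
    Console.WriteLine(name);
}
```

Note that the values come back still URL-encoded (Atro+Roter, with the plus sign), so they may need decoding before display.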

I assume your issue is trying to parse the URL. Don't do it by hand; use what .NET gives you:

var playerName = "https://my.examplegame.com/?charactername=NAME_HERE";
var uri = new Uri(playerName);
var queryString = HttpUtility.ParseQueryString(uri.Query);

Console.WriteLine("Name is: " + queryString["charactername"]);

This is much easier to read and probably more performant.

Working sample here: https://dotnetfiddle.net/iJlBKW
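
One caveat: the URL in the question has no ? before charactername=, so uri.Query would be empty for it and the query-string lookup would find nothing. A minimal sketch that handles both shapes might look like the following. ExtractCharacterName is a hypothetical helper name, not part of .NET, and treating + as a space inside the path is an assumption about how the site encodes names.

```csharp
using System;
using System.Net;

// Hypothetical helper (not from the original answer): returns the decoded value
// that follows "charactername=", or an empty string when the marker is absent.
static string ExtractCharacterName(string url)
{
    const string marker = "charactername=";
    var uri = new Uri(url);

    // PathAndQuery covers both ".../?charactername=X" and ".../charactername=X".
    string target = uri.PathAndQuery;
    int start = target.IndexOf(marker, StringComparison.Ordinal);
    if (start < 0)
    {
        return string.Empty; // marker not present
    }

    string raw = target.Substring(start + marker.Length);

    // Stop at the next query parameter, if there is one.
    int amp = raw.IndexOf('&');
    if (amp >= 0)
    {
        raw = raw.Substring(0, amp);
    }

    // WebUtility.UrlDecode turns '+' into a space and decodes %XX escapes.
    return WebUtility.UrlDecode(raw);
}

Console.WriteLine(ExtractCharacterName("https://my.examplegame.com/charactername=Atro+Roter"));
Console.WriteLine(ExtractCharacterName("https://my.examplegame.com/?charactername=Atro+Roter"));
```

Both calls should yield the decoded name, since the lookup works on the combined path and query rather than on the query string alone.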

Escape the forward slashes with backslashes, like this: \/ (the .NET regex engine does not require it, since / has no special meaning in a pattern, but it is harmless):

string input = @"<a href=""https://my.examplegame.com/charactername=Atro+Roter"" >";
string playerName = @"https:\/\/my.examplegame.com\/charactername=(.+?)""";

Match match = Regex.Match(input, playerName);
string result = match.Groups[1].Value;

Result = Atro+Roter
