I want to download an HTML source, search it for the username and other information, and then display that in my program. I'm pretty new to programming, and a complete noob when it comes to things like this (Regex), so I hope you can explain it to me.
I have used Regex before to extract a K/D ratio from an HTML source; for that I used this code:
string pattern = @"<span class=""kdratio"">\d+\.\d+";
But I have no idea how to start on this one...
This is the line of the source that contains the information:
<section class="profile-header" profile="true" motto="user's motto" user="User" figure="hr-3322-45.hd-190-1.ch-3342-64-66.lg-285-64.sh-3068-82-66.ea-1404-64">
I only need the parts user="User" and figure="x". I couldn't try anything because I really wouldn't know how to start; the HTML line looks so different from what I have experience with.
Regular expressions are not a good idea for matching HTML unless you are doing very simple, single-tag matching. See here: RegEx match open tags except XHTML self-contained tags
I recommend using an HTML DOM-parsing library with XPath or CSS selectors to get the information you want. For .NET, HtmlAgilityPack is recommended. For CSS selectors you'll want Fizzler (an add-on for HtmlAgilityPack).
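For the simple single-tag case in the question, a regex can still work. The following is only a minimal sketch, not a general HTML parser: it assumes the attributes are double-quoted and that no attribute value contains a `>` character. The names `ProfileHeader` and `GetAttribute` are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ProfileHeader
{
    // Matches the opening <section ...> tag that carries class="profile-header".
    // Assumption: no attribute value inside the tag contains '>'.
    static readonly Regex SectionTag = new Regex(
        @"<section\b[^>]*\bclass=""profile-header""[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the value of a double-quoted attribute on that tag,
    // or null when the tag or the attribute is not found.
    public static string GetAttribute(string html, string name)
    {
        Match tag = SectionTag.Match(html);
        if (!tag.Success) return null;

        // The leading \s keeps "user" from matching inside a longer name such as "data-user".
        Match attr = Regex.Match(tag.Value, @"\s" + Regex.Escape(name) + @"=""([^""]*)""");
        return attr.Success ? attr.Groups[1].Value : null;
    }
}

static class Program
{
    static void Main()
    {
        string html = @"<section class=""profile-header"" profile=""true"" motto=""user's motto"" user=""User"" figure=""hr-3322-45.hd-190-1.ch-3342-64-66.lg-285-64.sh-3068-82-66.ea-1404-64"">";
        Console.WriteLine(ProfileHeader.GetAttribute(html, "user"));   // prints "User"
        Console.WriteLine(ProfileHeader.GetAttribute(html, "figure")); // prints the figure string
    }
}
```

Matching the whole tag first and then looking for each attribute inside it means the order of the attributes does not matter. If the page markup changes (single quotes, extra classes in the class attribute), this sketch will stop matching, which is the main reason a parser is the safer choice.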
In JavaScript (easily rewritten to C# and HtmlAgilityPack) it would be this:
var section = document.querySelector("section.profile-header[profile=true]");
var user = section.getAttribute("user");
var figure = section.getAttribute("figure");
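A rough C# equivalent with HtmlAgilityPack, reading the two attributes the question asks for, might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced and that the class attribute is exactly "profile-header"; the class name `ProfileHeaderDom` and the method `Read` are hypothetical names chosen for this example.

```csharp
using System;
using HtmlAgilityPack;

static class ProfileHeaderDom
{
    // Parses the HTML and returns the user and figure attributes of the
    // first <section class="profile-header"> element, or (null, null) if absent.
    public static (string User, string Figure) Read(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: any <section> whose class attribute equals "profile-header".
        HtmlNode section = doc.DocumentNode.SelectSingleNode("//section[@class='profile-header']");
        if (section == null) return (null, null);

        // GetAttributeValue returns the supplied default when the attribute is missing.
        return (section.GetAttributeValue("user", ""),
                section.GetAttributeValue("figure", ""));
    }
}
```

Calling `ProfileHeaderDom.Read(html)` on the downloaded page source would then give both values in one step, and the lookup keeps working if the attributes are reordered or quoted differently.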
Generally, Regex is not a good choice for parsing HTML. HTML tends to be complicated, and it is hard to write a single Regex that matches everything. Instead, use a parser like Html Agility Pack.