Getting specific data from a webpage using only class names

I want to extract a value from a webpage's source. I've narrowed the markup down to exactly the part that is relevant here:

    <div class="sideInfoPlayer">
        <a class="signLink" href="spieler.php?uid=12345" title="Profile">
            <span class="wrap">Wagamama</span>
        </a>
    </div>

The tricky part is that I want to show the word Wagamama in a message box, but that word changes on every page of the site, and the element has no ID I could look it up by. So I was thinking of first searching for the element with the class "sideInfoPlayer" and then finding the element with the class "wrap" inside that block.

I have written the code below to find the first element, but I don't know how to tackle the second step and get the value I want.

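        foreach (HtmlElement div in webBrowser1.Document.GetElementsByTagName("div"))
        {
            // In the WebBrowser DOM the "class" attribute is read as "className"
            if (div.GetAttribute("className") != "sideInfoPlayer")
                continue;

            // Search only inside this block for <span class="wrap">
            foreach (HtmlElement span in div.GetElementsByTagName("span"))
                if (span.GetAttribute("className") == "wrap")
                {
                    MessageBox.Show(span.InnerText); // e.g. "Wagamama"
                    return;
                }
        }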
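
Comparing className with a single string only works when the element has exactly one class; an element written as class="sideInfoPlayer active" would be skipped. A minimal sketch of a whole-token class check follows, assuming plain .NET (the helper name HasClass is hypothetical, not part of any library):

```csharp
using System;
using System.Linq;

// Hypothetical helper: true when `className` (the raw class attribute value,
// possibly several space-separated classes) contains `wanted` as a whole token.
static bool HasClass(string className, string wanted) =>
    !string.IsNullOrEmpty(className) &&
    className.Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // null separator = split on whitespace
             .Contains(wanted, StringComparer.Ordinal);

Console.WriteLine(HasClass("sideInfoPlayer", "sideInfoPlayer"));        // True
Console.WriteLine(HasClass("sideInfoPlayer active", "sideInfoPlayer")); // True
Console.WriteLine(HasClass("sideInfoPlayerOld", "sideInfoPlayer"));     // False
```

Inside the WebBrowser loop it would replace the direct comparison, e.g. if (!HasClass(div.GetAttribute("className"), "sideInfoPlayer")) continue;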

I hope you can help me get unstuck on this one.

You have better options: look at HtmlAgilityPack, http://htmlagilitypack.codeplex.com/

See also the question "How can I parse an HTML string".

First, add a reference to the HtmlAgilityPack library, either by downloading it manually or through the NuGet package manager.

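using HtmlAgilityPack;

// load the page into an HtmlDocument
var doc = new HtmlWeb().Load("http://website.com/mypage");
// by default SelectNodes returns null (not an empty collection) when nothing matches
var nodes = doc.DocumentNode.SelectNodes("//div[@class='sideInfoPlayer']//span[@class='wrap']");
if (nodes != null)
{
  // walk through all nodes of interest
  foreach (var node in nodes)
  {
    // here is your text: node.InnerText
  }
}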

//div[@class='sideInfoPlayer']//span[@class='wrap'] is called an XPath expression, and this one literally means "get me all span elements with class=wrap that are descendants of a div element with class=sideInfoPlayer". Use // rather than / before span: a single / matches only direct children, and in this markup the span sits inside the <a> element, not directly inside the div.
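
To check the / versus // distinction without HtmlAgilityPack, here is a minimal sketch using only System.Xml from the .NET base library. It assumes the markup is well-formed XML, which holds for the small sample from the question (closed with </div>) but usually not for real pages; that is why HtmlAgilityPack is the better choice for actual scraping.

```csharp
using System;
using System.Xml;

// The sample markup from the question, closed with </div> so it parses as XML.
const string Snippet = @"<div class=""sideInfoPlayer"">
  <a class=""signLink"" href=""spieler.php?uid=12345"" title=""Profile"">
    <span class=""wrap"">Wagamama</span>
  </a>
</div>";

var doc = new XmlDocument();
doc.LoadXml(Snippet);

// "/" selects direct children only; the span is a grandchild (div > a > span).
XmlNodeList direct = doc.SelectNodes("//div[@class='sideInfoPlayer']/span[@class='wrap']");
// "//" selects descendants at any depth.
XmlNodeList nested = doc.SelectNodes("//div[@class='sideInfoPlayer']//span[@class='wrap']");

Console.WriteLine(direct.Count);        // 0
Console.WriteLine(nested.Count);        // 1
Console.WriteLine(nested[0].InnerText); // Wagamama
```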

I didn't test it, but it should work.

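Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.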
