Getting specific data from a web page using only class names

Question

I want to extract a value from the source of a web page. I've narrowed the source down to exactly what is relevant here:

    <div class="sideInfoPlayer">
        <a class="signLink" href="spieler.php?uid=12345" title="Profile">
            <span class="wrap">Wagamama</span>
        </a>

The trick is that I want to show the word Wagamama in a message box, but that word changes on every page of the site, so I need to locate the element, and there is no ID on this page. I was therefore thinking of searching for the class named "sideInfoPlayer" first and then finding the "wrap" class within that block.

I have written the code below to find the first one, but I do not know how to tackle the second one and then get the desired value.

        HtmlElementCollection col = webBrowser1.Document.GetElementsByTagName("div");
        foreach (HtmlElement element in col)
        {
            string cls = element.GetAttribute("className");
            if (String.IsNullOrEmpty(cls) || !cls.Equals("sideInfoPlayer"))
                continue;
        }

I hope you can help me get unstuck on this one.

Answer 1

You have better options. Look at http://htmlagilitypack.codeplex.com/

And here: "How can I parse HTML string"

First you'll need to add a reference to the HtmlAgilityPack library, either by downloading it manually or through the NuGet package manager.

    // Load the page into an HtmlDocument
    var doc = new HtmlWeb().Load("http://website.com/mypage");

    // SelectNodes returns null when nothing matches, so guard against that
    var nodes = doc.DocumentNode.SelectNodes("//div[@class='sideInfoPlayer']//span[@class='wrap']");
    if (nodes != null)
    {
        // Walk through all nodes of interest
        foreach (var node in nodes)
        {
            // here is your text: node.InnerText
        }
    }

`//div[@class='sideInfoPlayer']//span[@class='wrap']` is called an XPath expression, and this one literally means "get me all span elements with class=wrap that are descendants of a div element with class=sideInfoPlayer". The double slash before `span` matters: in the markup above, the span sits inside the a element rather than directly under the div, so a single slash (direct children only) would match nothing.

I didn't test it, but it should work. 我没有测试它，但是应该可以。
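For anyone who wants to try the idea without hitting a live site, here is a minimal sketch that runs the same kind of XPath query against the markup from the question. It assumes the HtmlAgilityPack NuGet package is referenced and a compiler that supports top-level statements (C# 9 or later). `FindPlayerName` and `sampleHtml` are hypothetical names made up for this illustration, and the closing `</div>` is added only to make the fragment complete.

```csharp
using System;
using HtmlAgilityPack; // assumption: installed via NuGet

// Sample markup copied from the question; the closing </div> is added
// here only so the fragment is well-formed.
const string sampleHtml = @"
<div class=""sideInfoPlayer"">
  <a class=""signLink"" href=""spieler.php?uid=12345"" title=""Profile"">
    <span class=""wrap"">Wagamama</span>
  </a>
</div>";

Console.WriteLine(FindPlayerName(sampleHtml) ?? "(not found)");

// Hypothetical helper (not part of HtmlAgilityPack): returns the trimmed text
// of the first span.wrap nested anywhere inside div.sideInfoPlayer, or null
// when no such element exists.
static string? FindPlayerName(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // "//span" (descendant) rather than "/span" (direct child), because the
    // span is wrapped in an <a> element. SelectSingleNode returns null when
    // the expression matches nothing.
    var node = doc.DocumentNode.SelectSingleNode(
        "//div[@class='sideInfoPlayer']//span[@class='wrap']");

    return node?.InnerText.Trim();
}
```

In the WinForms setting from the question, the same helper could presumably be fed `webBrowser1.DocumentText` once the page has finished loading, with the result passed to `MessageBox.Show`.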
