Extract entire table using HtmlAgilityPack and LINQ to XML

I found a code snippet in a post on here a while back. As I'm a beginner with C#, I'm kind of lost.

I'm trying to extract all cells from a table and write them to an XML file that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Stats Date="11/4/2013">
  <Player Rank="1">
    <Name>P.K. Subban</Name>
    <Team>MTL</Team>
    <Pos>D</Pos>
    <GP>15</GP>
    <G>3</G>
    <A>11</A>
    <Pts>14</Pts>
    <PlusMinus>+2</PlusMinus>
    <PIM>16</PIM>
    <PP>2</PP>
    <SH>0</SH>
    <GW>0</GW>
    <OT>0</OT>
    <Shots>47</Shots>
    <ShotPctg>6.4</ShotPctg>
    <TOIPerGame>24:29</TOIPerGame>
    <ShiftsPerGame>27.3</ShiftsPerGame>
    <FOWinPctg>0.0</FOWinPctg>
  </Player>
</Stats>

My issue is that I don't know how to loop through the entire table, which has 25 rows and 19 columns. I'm only able to extract one row out of the whole table.

This is what I have (I've taken the snippet and modified the elementNames and the XPath):

public void ParseHtml()
{
    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(Source);

    var cells = htmlDoc.DocumentNode
        .SelectNodes("//table[@class='data stats']/tbody/tr/td")
        .Select(node => node.InnerText.Trim())
        .ToList();

    var elementNames = new[] { "Name", "Team", "Pos", "GP", "G", "A", "Pts", "PlusMinus", "PIM", "PP", "SH", "GW", "OT", "Shots", "ShotPctg", "TOIPerGame", "ShiftsPerGame", "FOWinPctg" };
    var xmlDoc = new XElement("Stats", new XAttribute("Date", DateTime.Now.ToShortDateString()),
        new XElement("Player", new XAttribute("Rank", cells.First()),
            cells.Skip(1)
                 .Zip(elementNames, (Value, Name) => new XElement(Name, Value))
                 .Where(element => !String.IsNullOrEmpty(element.Value))
        )
    );
    xmlDoc.Save("parsed.xml");
}

Things I've tried: changing

var cells = htmlDoc.DocumentNode
.SelectNodes("//table[@class='data stats']/tbody/tr/td")
.Select(node => node.InnerText.Trim())
.ToList();

to

foreach (HtmlNode cells in htmlDoc.DocumentNode
    .SelectNodes("//table[@class='data stats']/tbody/tr/td")
    .Select(node => node.InnerText.Trim())
    .ToList() )
{
var elementNames....
..
...

With this change I get no values, and the XML nodes are reduced to 2. Can anyone help me out? I've been trying for 3 days to solve this.

Edit: HTML source file: http://www.nhl.com/ice/playerstats.htm?season=20132014&gameType=2&team=BUF&position=S&country=&status=&viewName=summary

Try this:

// ... (cells and elementNames are built as in the question)
var xmlDoc = new XElement("Stats",
    new XAttribute("Date", DateTime.Now.ToShortDateString()));
XElement iteratingElement = null;
// Each row has one rank cell followed by one cell per element name.
var length = elementNames.Length + 1;
for (int i = 0; i < cells.Count; i++)
{
    if (i % length == 0)
    {
        // First cell of a row: start a new <Player> with its rank.
        iteratingElement = new XElement("Player",
            new XAttribute("Rank", cells[i]));
        xmlDoc.Add(iteratingElement);
    }
    else
    {
        // Remaining cells map to element names by position within the row.
        iteratingElement
            .Add(new XElement(elementNames[(i % length) - 1], cells[i]));
    }
}
xmlDoc.Save("parsed.xml");
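
If you would rather keep the LINQ style of the original snippet, the same chunking can be sketched with GroupBy. This is only a sketch under assumptions: `StatsXml` and `Build` are hypothetical names that do not appear in the post, and every table row is assumed to contain exactly one rank cell plus one cell per element name, which is the same assumption the loop above makes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class StatsXml
{
    // Hypothetical helper (not from the original post): turns the flat list of
    // cell texts into one <Player> element per table row.
    public static XElement Build(IList<string> cells, string[] elementNames)
    {
        // Assumption: each row holds the rank cell plus one cell per element name.
        int rowLength = elementNames.Length + 1;

        var players = cells
            .Select((text, index) => new { text, index })
            // Integer division of the cell index gives the row number.
            .GroupBy(c => c.index / rowLength, c => c.text)
            .Select(row => row.ToList())
            .Select(row => new XElement("Player",
                new XAttribute("Rank", row[0]),
                row.Skip(1)
                   .Zip(elementNames, (value, name) => new XElement(name, value))
                   // Skip empty cells, as the snippet in the question does.
                   .Where(element => !string.IsNullOrEmpty(element.Value))));

        return new XElement("Stats",
            new XAttribute("Date", DateTime.Now.ToShortDateString()),
            players);
    }
}
```

With the `cells` and `elementNames` from the question, calling `StatsXml.Build(cells, elementNames).Save("parsed.xml")` should write one Player element per row. If some rows could have a different number of cells, it would probably be safer to select the `tr` nodes first and read each row's `td` children separately, rather than rely on a fixed row length.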
