Getting data from HTML table into a DataTable

OK, so I need to query a live website to get data from a table, put this HTML table into a DataTable, and then use this data. So far I have managed to use Html Agility Pack and XPath to get to each row in the table I need, but I know there must be a way to parse it into a DataTable (C#). The code I am currently using is:

string htmlCode = "";
using (WebClient client = new WebClient())
{
htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

//My attempt at LINQ to solve the issue (not sure where to go from here)
var myTable = doc.DocumentNode
.Descendants("table")
.Where(t =>t.Attributes["summary"].Value == "Table One")
.FirstOrDefault();

//Finds all the odd rows (which are the ones I actually need but would prefer a
//DataTable containing all the rows!
foreach (HtmlNode cell in doc.DocumentNode.SelectNodes("//tr[@class='odd']/td"))
{
string test = cell.InnerText;
//Have not gone further than this yet!
}

The HTML table on the website I am querying looks like this:

<table summary="Table One">
<tbody>
<tr class="odd">
<td>Some Text</td>
<td>Some Value</td>
</tr>
<tr class="even">
<td>Some Text1</td>
<td>Some Value1</td>
</tr>
<tr class="odd">
<td>Some Text2</td>
<td>Some Value2</td>
</tr>
<tr class="even">
<td>Some Text3</td>
<td>Some Value3</td>
</tr>
<tr class="odd">
<td>Some Text4</td>
<td>Some Value4</td>
</tr>
</tbody>
</table>

I'm not sure whether it is better or easier to use LINQ + HAP or XPath + HAP to get the desired result; I tried both with limited success, as you can probably see. This is the first time I have ever written a program to query a website, or even interact with a website in any way, so I am very unsure at the moment! Thanks in advance for any help :)

Using some of Jack Eker's code above and some code from Mark Gravell (see post here), I managed to come up with a solution. As of writing, this code snippet obtains the public holidays for 2012 in South Africa:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Web;
using System.Net;
using HtmlAgilityPack;



namespace WindowsFormsApplication
{
    public partial class Form1 : Form
    {
        private DataTable dt;
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {

            string htmlCode = "";
            using (WebClient client = new WebClient())
            {
                client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
                htmlCode = client.DownloadString("http://www.info.gov.za/aboutsa/holidays.htm");
            }
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

            doc.LoadHtml(htmlCode);

            dt = new DataTable();
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Value", typeof(string));

            int count = 0;


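            // Only the table with id "table2" is of interest, so select its rows directly.
            // ".//tr" style matching also covers rows wrapped in <tbody>.
            // SelectNodes returns null when nothing matches, so guard before iterating.
            HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr");
            if (rows != null)
            {
                foreach (HtmlNode row in rows)
                {
                    HtmlNodeCollection cells = row.SelectNodes("td");
                    if (cells == null)
                    {
                        continue; // e.g. a header row that contains only <th> cells
                    }

                    DataRow dr = dt.NewRow();

                    foreach (HtmlNode cell in cells)
                    {
                        if (count % 2 == 0)
                        {
                            // Even cells hold the name.
                            dr["Name"] = cell.InnerText.Replace("&nbsp;", " ");
                        }
                        else
                        {
                            // Odd cells hold the value and complete the row.
                            dr["Value"] = cell.InnerText.Replace("&nbsp;", " ");
                            dt.Rows.Add(dr);
                        }
                        count++;
                    }
                }
            }

            // Bind once, after all rows have been collected.
            dataGridView1.DataSource = dt;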
        }

    }
}

There's no such method out of the box in the HTML Agility Pack, but it shouldn't be too hard to create one. There are samples out there that convert XML to a DataTable using LINQ to XML. These can be reworked into what you need.

If needed, I can help out creating the whole method, but not today :).
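
For reference, here is a minimal sketch of what such a method could look like, assuming Html Agility Pack is referenced in the project. The class name `HtmlTableConverter`, the method name `ToDataTable`, and the generated column names (`Column1`, `Column2`, ...) are made up for illustration and are not part of the library. It copies every row of one table into a DataTable of string columns; treat it as a starting point, not a finished implementation.

```csharp
using System.Data;
using HtmlAgilityPack;

public static class HtmlTableConverter
{
    // Hypothetical helper (not part of Html Agility Pack): copies every <tr>
    // under the given <table> node into a DataTable, one string column per cell.
    // Note: ".//tr" also picks up rows of tables nested inside this one.
    public static DataTable ToDataTable(HtmlNode table)
    {
        var dt = new DataTable();

        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (rows == null)
        {
            return dt;
        }

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("th|td");
            if (cells == null)
            {
                continue; // row without any cells
            }

            // Grow the column list on demand so rows of different lengths still fit.
            while (dt.Columns.Count < cells.Count)
            {
                dt.Columns.Add("Column" + (dt.Columns.Count + 1), typeof(string));
            }

            DataRow dr = dt.NewRow();
            for (int i = 0; i < cells.Count; i++)
            {
                // DeEntitize decodes entities such as &amp; and &nbsp;.
                dr[i] = HtmlEntity.DeEntitize(cells[i].InnerText).Trim();
            }
            dt.Rows.Add(dr);
        }

        return dt;
    }

    // Convenience overload: parse the markup and convert the first table
    // matching the XPath; returns an empty DataTable when no table matches.
    public static DataTable ToDataTable(string html, string tableXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode table = doc.DocumentNode.SelectSingleNode(tableXPath);
        return table == null ? new DataTable() : ToDataTable(table);
    }
}
```

With the table from the question, a call such as `HtmlTableConverter.ToDataTable(htmlCode, "//table[@summary='Table One']")` should give one DataRow per `<tr>`, with the text in the first column and the value in the second. All values stay strings; converting them to numbers is left to the caller.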

This is my solution. It may be a bit messy, but it is working perfectly at the moment :D

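string htmlCode = "";
using (WebClient client = new WebClient())
{
    // A User-Agent header is set because some servers reject requests without one.
    client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
    htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

DataTable dt = new DataTable();
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Value", typeof(decimal));

int count = 0;
decimal rowValue = 0;
bool isDecimal = false;

// SelectNodes returns null when nothing matches, so guard before iterating.
var rows = doc.DocumentNode.SelectNodes("//table[@summary='Table Name']/tbody/tr");
if (rows != null)
{
    foreach (var row in rows)
    {
        var cells = row.SelectNodes("td");
        if (cells == null)
        {
            continue; // row without <td> cells
        }

        DataRow dr = dt.NewRow();
        foreach (var cell in cells)
        {
            if (count % 2 == 0)
            {
                // Even cells hold the name.
                dr["Name"] = cell.InnerText.Replace("&nbsp;", " ");
            }
            else
            {
                // Turn "1.234,56" into "1234.56", then parse independently of the machine's culture.
                string normalized = cell.InnerText.Replace(".", "").Replace(",", ".");
                isDecimal = decimal.TryParse(normalized, System.Globalization.NumberStyles.Number,
                    System.Globalization.CultureInfo.InvariantCulture, out rowValue);
                if (isDecimal)
                {
                    dr["Value"] = rowValue;
                }
                // When parsing fails, "Value" is left as DBNull.
                dt.Rows.Add(dr);
            }
            count++;
        }
    }
}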
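
The two `Replace` calls above suggest the site writes numbers with "." as the thousands separator and "," as the decimal separator (for example "1.234,56"); that is an assumption about the data, not something the page confirms. Under that assumption, a possible alternative is to describe the format once and let `decimal.TryParse` handle it. `EuropeanNumber` is a hypothetical helper name used only for this sketch.

```csharp
using System.Globalization;

public static class EuropeanNumber
{
    // Assumed input format: "." groups thousands, "," marks the decimal point.
    private static readonly NumberFormatInfo Format = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    // Returns true and sets value when text is a number in the assumed format.
    public static bool TryParse(string text, out decimal value)
    {
        return decimal.TryParse(text, NumberStyles.Number, Format, out value);
    }
}
```

Because the format is passed explicitly, the result does not depend on the culture of the machine running the code, which is the main pitfall of calling `decimal.TryParse` with only the string argument.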

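Simple logic to convert an HtmlTable to a DataTable (the types used here, such as HtmlTable and UITestControlCollection, come from the Visual Studio Coded UI Test framework, not from Html Agility Pack):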

// Define your web table.
// "parent" is the containing UI test control and must be defined elsewhere.
public static HtmlTable table
{
    get
    {
        HtmlTable var = new HtmlTable(parent);
        var.SearchProperties.Add("id", "searchId");
        return var;
    }
}

// Convert the web table to a DataTable.
public static DataTable getTable
{
    get
    {
        DataTable dtTable = new DataTable("TableName");
        UITestControlCollection rows = table.Rows;

        // The first row supplies the column names.
        UITestControlCollection headers = rows[0].GetChildren();
        foreach (HtmlHeaderCell header in headers)
        {
            if (header.InnerText != null)
                dtTable.Columns.Add(header.InnerText);
        }

        // Every following row becomes one DataRow.
        for (int i = 1; i < rows.Count; i++)
        {
            UITestControlCollection cells = rows[i].GetChildren();
            string[] data = new string[cells.Count];
            int counter = 0;
            foreach (HtmlCell cell in cells)
            {
                if (cell.InnerText != null)
                    data[counter] = cell.InnerText;
                counter++;
            }
            dtTable.Rows.Add(data);
        }
        return dtTable;
    }
}

You can try:

    DataTable.Rows[i].Cells[j].InnerText;

Here DataTable is the id of your table, i is the row index, and j is the cell index.
