As a continuation of this post, I am trying to parse some data out of an HTML page. Here is the HTML (there is more on the page, but this is the important section):
<table class="integrationteamstats">
<tbody>
<tr>
<td class="right">
<span class="mediumtextBlack">Queue:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
<td class="right">
<span class="mediumtextBlack">Aban:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0%</span>
</td>
<td class="right">
<span class="mediumtextBlack">Staffed:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
</tr>
<tr>
<td class="right">
<span class="mediumtextBlack">Wait:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0:00</span>
</td>
<td class="right">
<span class="mediumtextBlack">Total:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
<td class="right">
<span class="mediumtextBlack">On ACD:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
</tr>
</tbody>
</table>
I need two pieces of information: the value in the td that follows Queue and the value in the td that follows Wait (that is, the queue count and the wait time). The numbers will update frequently.
I have gotten to the point where the HTML is loaded into an HtmlDocument variable, and I've found that an HtmlNodeCollection can gather the nodes that meet a certain criterion. This is basically where I am stuck:
// SelectNodes already returns a collection, so there is no need to construct one first.
HtmlNodeCollection tds = this.html.DocumentNode.SelectNodes("//td");
foreach (HtmlNode td in tds)
{
/* I want to write:
* If the last node's value was 'Queue:', give me the value of this node.
* and
* If the last node's value was 'Wait:', give me the value of this node.
*/
}
I can go through this with a foreach, but I am not certain how to access the value or how to get the next value.
Generally, there's no need to loop with a foreach, as getting the targeted information is pretty easy (with a foreach you'd have to track state across the iterations of the loop, which gets unwieldy).
First, you want to get the table. Filtering on the class attribute is fragile, as multiple elements in an HTML document can have the same class applied to them. If the table had an id attribute, that would be ideal.
That said, if this is the only table with this class, then you can get the body of the table element using:
// Get the body of the table.
HtmlNode tableBody = document.DocumentNode.SelectSingleNode(
"//table[@class='integrationteamstats']/tbody");
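From there, you want to get the individual rows. They are direct children of the tbody element, but the ChildNodes collection also contains the whitespace text nodes between the tags, so indexing into it is unreliable. Select the tr elements by position with a relative XPath instead (XPath positions start at 1):
HtmlNode queueRow = tableBody.SelectSingleNode("tr[1]");
HtmlNode waitRow = tableBody.SelectSingleNode("tr[2]");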
Then you want the second td element in each row. Although a span tag wraps the content, you want all of the text in the td element in its entirety, so use the InnerText property to get the value (trimmed, since it includes the whitespace around the span):
string queueValue = queueRow.SelectSingleNode("td[2]").InnerText.Trim();
string waitValue = waitRow.SelectSingleNode("td[2]").InnerText.Trim();
Note that there's duplication here, so if you find there are a lot of rows that you have to parse like this, you might want to factor out some of the logic into helper methods.
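One possible shape for such a helper is sketched below. It is a minimal sketch, not part of the original answer: the names StatsParser and GetValueAfterLabel are hypothetical, and it looks a cell up by its label text (using the XPath following-sibling axis) rather than by row and column position. It assumes the label contains no single quote, because the label is embedded directly in the XPath string, and the sample markup in Main is a shortened stand-in for the real page.

```csharp
using System;
using HtmlAgilityPack;

static class StatsParser
{
    // Hypothetical helper: returns the trimmed text of the cell that follows
    // the cell whose text equals `label`, or null when no such cell exists.
    public static string GetValueAfterLabel(HtmlDocument document, string label)
    {
        // Assumption: `label` contains no single quote, since it is
        // concatenated into the XPath string literal below.
        string xpath =
            "//table[@class='integrationteamstats']" +
            "//td[normalize-space(.)='" + label + "']" +
            "/following-sibling::td[1]";

        HtmlNode valueCell = document.DocumentNode.SelectSingleNode(xpath);
        if (valueCell == null)
        {
            return null;
        }

        // Decode entities such as &nbsp; and drop the surrounding whitespace.
        return HtmlEntity.DeEntitize(valueCell.InnerText).Trim();
    }
}

static class Program
{
    static void Main()
    {
        // Shortened sample markup with made-up values, for illustration only.
        var document = new HtmlDocument();
        document.LoadHtml(
            "<table class=\"integrationteamstats\"><tbody><tr>" +
            "<td class=\"right\"><span>Queue:</span></td>" +
            "<td class=\"left\"><span>3</span></td>" +
            "</tr><tr>" +
            "<td class=\"right\"><span>Wait:</span></td>" +
            "<td class=\"left\"><span>1:25</span></td>" +
            "</tr></tbody></table>");

        Console.WriteLine(StatsParser.GetValueAfterLabel(document, "Queue:"));
        Console.WriteLine(StatsParser.GetValueAfterLabel(document, "Wait:"));
    }
}
```

Looking cells up by label keeps working if the page reorders its rows or columns, at the cost of breaking if the label text itself changes.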
You could also use CsQuery to do this. Since it uses familiar CSS selector syntax and jQuery-style methods, it can be easier to use than the Html Agility Pack (HAP) for more complex DOM navigation. For example:
// function to get the text from the cell AFTER the one containing 'text'
string getNextCellText(CQ dom, string text) {
// find the target cell
CQ target = dom.Select(".integrationteamstats td:contains(" + text + ")");
// return the text contents of the next cell
return target.Next().Text();
}
void Main() {
var dom = CQ.Create(html);
string queue = getNextCellText(dom,"Queue");
string wait = getNextCellText(dom,"Wait:");
// ... do stuff
}