
How to remove <a> elements from xPath?

I'm making an application in C# with HTMLAgilityPack.

I have the following HTML structure:

<td colspan="3">
    <a href="tournament_detail.asp?EID=3">The North West Junior Champions League 2016</a>
    <br>
    St Bedes Sports Fields,  Manchester. M21 0TT
</td>

I would like to pull out the address, excluding the <a> and the <br />.

I have tried the following:

//div[@class='infobox']/table/tr/td[1][not a]

Here is the site I am trying to pull data from

I am using HTMLAgilityPack, so I don't believe I can use the string() function (or at least, I get an exception when I try). Please do not mark this as a duplicate; I am seeking clarification as to whether I can use it.

How can I pull back just the address?

Adding the predicate [not(a)] makes the XPath return only <td> elements that have no child <a>, which isn't the wanted outcome. Instead, append /text()[normalize-space()], which returns the direct-child text nodes of the selected <td> that are not whitespace-only:

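using System;
using HtmlAgilityPack;

var raw = @"<td colspan='3'>
    <a href='tournament_detail.asp?EID=3'>The North West Junior Champions League 2016</a>
    <br>
    St Bedes Sports Fields,  Manchester. M21 0TT</td>";
var doc = new HtmlDocument();
doc.LoadHtml(raw);
// First direct-child text node of <td> that is not whitespace-only (the address)
var addressNode = doc.DocumentNode.SelectSingleNode("//td/text()[normalize-space()]");
// SelectSingleNode returns null when nothing matches, hence the ?.
Console.WriteLine(addressNode?.InnerText.Trim());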

Output:

St Bedes Sports Fields,  Manchester. M21 0TT
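To see why [not(a)] comes back empty, and why string() throws, here is a minimal sketch that swaps HtmlAgilityPack for the BCL's System.Xml.XPath (XPathDocument / XPathNavigator) so it runs without a NuGet package. It assumes the fragment has been made well-formed XML by writing <br> as <br/>, and uses C# 9 top-level statements. The likely cause of the exception is that string() and normalize-space() return a string rather than a node-set, so Select()/SelectNodes() reject them; string-valued expressions have to go through XPathNavigator.Evaluate() instead.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Assumption: the fragment is made well-formed XML (<br/> instead of <br>) so the
// BCL parser accepts it; HtmlAgilityPack itself does not need this change.
var xml = @"<td colspan='3'>
    <a href='tournament_detail.asp?EID=3'>The North West Junior Champions League 2016</a>
    <br/>
    St Bedes Sports Fields,  Manchester. M21 0TT</td>";

XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

// [not(a)] filters <td> elements; this <td> HAS an <a> child, so nothing matches.
int tdWithoutLink = nav.Select("//td[not(a)]").Count;

// text()[normalize-space()] keeps only the <td>'s direct-child text nodes that
// contain something other than whitespace: here, just the address.
XPathNodeIterator textNodes = nav.Select("//td/text()[normalize-space()]");
int addressNodeCount = textNodes.Count;
string address = "";
if (textNodes.MoveNext() && textNodes.Current != null)
    address = textNodes.Current.Value.Trim();

// normalize-space(...) returns a string, not a node-set, so it needs Evaluate();
// it also collapses the double space after the comma.
string normalized = (string)nav.Evaluate("normalize-space(//td/text()[normalize-space()])");

// Passing a string-valued expression such as string(...) to Select() is rejected
// with an XPathException ("Expression must evaluate to a node-set").
bool selectThrows = false;
try { nav.Select("string(//td)"); }
catch (XPathException) { selectThrows = true; }

Console.WriteLine(tdWithoutLink);   // 0
Console.WriteLine(address);         // St Bedes Sports Fields,  Manchester. M21 0TT
Console.WriteLine(normalized);      // St Bedes Sports Fields, Manchester. M21 0TT
Console.WriteLine(selectThrows);    // True
```

With HtmlAgilityPack itself, HtmlDocument implements IXPathNavigable, so something like doc.CreateNavigator().Evaluate("normalize-space(//td/text()[normalize-space()])") should behave the same way, assuming your build of the library includes XPath support. Whether you prefer Trim() on the text node or normalize-space() depends on whether the internal double space should survive.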
