Trying to get attribute value with Apache Tika and XPath
I have tried many different XPath expressions and just don't understand why I can't retrieve what I want with Apache Tika. I want to retrieve the href attribute value of links on arbitrary webpages. I managed to work out how to extract the content inside the tags, but trying to get the attribute values always returns an empty result. What am I doing wrong? Here is my code. Thanks a lot.
XPathParser xhtmlParser = new XPathParser("xhtml", XHTMLContentHandler.XHTML);
Matcher anchorLinkContentMatcher = xhtmlParser.parse("//xhtml:a/@xhtml:href/text()");
ContentHandler handler = new MatchingContentHandler(
        new ToHTMLContentHandler(), anchorLinkContentMatcher);
HtmlParser parser = new HtmlParser();
ParseContext pcontext = new ParseContext();
try {
    parser.parse(urlContentStream, handler, new Metadata(), pcontext);
    System.out.println(handler);
} catch (Exception e) {
    {....}
}
I have tried these different XPaths:
//xhtml:a/@xhtml:href
//xhtml:a/@href/text()
//xhtml:a/@href
//@xhtml:href/text()
You were almost there. The expression you need is:
//xhtml:a/@href
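To see why the other variants come back empty, here is a minimal sketch that uses the JDK's own javax.xml.xpath rather than Tika (Tika's XPathParser only supports a small XPath subset, so this is a stand-in for illustration, not Tika's implementation). The class name HrefXPathDemo, the select helper, and the sample page are all made up for this example. It shows two general XPath facts: an unprefixed attribute such as href is in no namespace, so @xhtml:href matches nothing, and an attribute node has no child nodes, so appending /text() selects nothing either.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only the JDK, not Tika.
class HrefXPathDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Evaluates an XPath expression against an XHTML string and
    // returns the string value of every matched node.
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the xhtml: prefix to mean anything
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the "xhtml" prefix to the XHTML namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<a href=\"https://example.com/a\">A</a>"
                + "<a href=\"https://example.com/b\">B</a></body></html>";

        // Elements are in the XHTML namespace, the attribute is in none.
        System.out.println(select(page, "//xhtml:a/@href"));        // [https://example.com/a, https://example.com/b]
        // href is not a namespaced attribute, so this matches nothing.
        System.out.println(select(page, "//xhtml:a/@xhtml:href"));  // []
        // Attribute nodes have no text() children.
        System.out.println(select(page, "//xhtml:a/@href/text()")); // []
    }
}
```

The same reasoning should carry over to the expression passed to xhtmlParser.parse(...) in the question: keep the xhtml: prefix on the element step, and leave the attribute step unprefixed with no trailing /text().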