
How to get information from a webpage in Java?

Does anyone know of a quick way to get information from a webpage in Java? For instance, if I'm looking at a page like this: http://www.ncbi.nlm.nih.gov/pubmed/?term=10952317 and I want to extract the list of words beneath the heading "MeSH Terms", how would I go about doing so?

I have something that can read the source, but it is full of HTML tags and such...

Any help is much appreciated!

As has been mentioned on here countless times before, have a look at jsoup, which is an HTML parsing library for Java. Or write your own parser (not recommended).
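To make the jsoup suggestion concrete, here is a minimal sketch of pulling the items listed under a heading. It assumes the jsoup jar is on the classpath. The sample markup (an h4 heading followed directly by a ul) is hypothetical and only stands in for the real page, whose HTML may be structured differently; the class name MeshTermsExample and the helper termsUnderHeading are likewise made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MeshTermsExample {

    // Hypothetical markup standing in for the real page; the actual PubMed HTML may differ.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<h4>Abstract</h4><p>Some abstract text.</p>"
        + "<h4>MeSH Terms</h4>"
        + "<ul><li>Animals</li><li>Mice</li><li>Signal Transduction</li></ul>"
        + "</body></html>";

    // Collects the text of each <li> in the <ul> that directly follows
    // an <h4> whose text matches headingText (case-insensitive).
    static List<String> termsUnderHeading(Document doc, String headingText) {
        List<String> terms = new ArrayList<>();
        for (Element heading : doc.select("h4")) {
            if (!heading.text().trim().equalsIgnoreCase(headingText)) {
                continue;
            }
            Element list = heading.nextElementSibling();
            if (list != null && list.tagName().equals("ul")) {
                for (Element item : list.select("li")) {
                    terms.add(item.text());
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Parse the in-memory sample so the example runs without network access.
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(termsUnderHeading(doc, "MeSH Terms"));
        // prints [Animals, Mice, Signal Transduction]
    }
}
```

For a live page you would probably replace Jsoup.parse(SAMPLE_HTML) with Jsoup.connect(url).get(), then adjust the selectors after inspecting the page's actual markup in a browser.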

TagSoup might work for you.


 