Extract XML tag from text file

My intention is to extract single or nested XML tags from a text file. The input file mixes plain text with XML (in my case HTML). What I want to do is scan the input, discarding everything until an XML tag is reached, then extract that tag in full (with everything nested inside it), and continue this way until the whole file has been processed. Before attempting it on my own, I'd like to know whether there is a Java library I'm not aware of that could help.

Thank you all.

You need to parse the XML and build its corresponding DOM tree. Check out a Java XML DOM parser tutorial.
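A DOM parser expects a well-formed document, so the plain text around the tags has to be stripped first. The sketch below is one possible way to combine both steps, not a definitive implementation: it scans the text with a regular expression, tracks nesting depth to cut out each top-level element, and hands each fragment to the JDK's `DocumentBuilder`. The class name `XmlFragmentExtractor` and its methods are hypothetical names chosen for illustration. It assumes the embedded markup is well-formed (every opening tag is closed, no `>` inside attribute values, no comments or CDATA); real-world HTML with unclosed tags such as `<br>` would likely need a lenient HTML parser such as jsoup instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class XmlFragmentExtractor {

    // Matches an opening, closing, or self-closing tag.
    // Group 1 is "/" for a closing tag, group 2 is the tag name,
    // group 3 is "/" for a self-closing tag.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z_][\\w:.-]*)(?:\\s[^<>]*?)?(/?)>");

    // Returns every top-level element found in the text, with its nested content.
    // Assumes well-formed markup: nesting is tracked by counting, not by tag name.
    public static List<String> extractFragments(String text) {
        List<String> fragments = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        int depth = 0;   // how many elements are currently open
        int start = -1;  // index where the current top-level element begins
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (depth == 0) {
                if (closing) {
                    continue; // stray closing tag in the plain text: skip it
                }
                start = m.start();
                if (selfClosing) {
                    fragments.add(text.substring(start, m.end()));
                } else {
                    depth = 1;
                }
            } else if (closing) {
                depth--;
                if (depth == 0) {
                    fragments.add(text.substring(start, m.end()));
                }
            } else if (!selfClosing) {
                depth++;
            }
        }
        return fragments;
    }

    // Parses one extracted fragment into a DOM tree using the JDK's built-in parser.
    public static Document parse(String fragment) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(fragment)));
    }

    public static void main(String[] args) throws Exception {
        String input = "some text <a><b>x</b></a> more text <c/> end";
        for (String fragment : extractFragments(input)) {
            Document doc = parse(fragment);
            System.out.println(doc.getDocumentElement().getNodeName() + " -> " + fragment);
        }
    }
}
```

Counting depth rather than matching tag names keeps the scanner short, but it also means mismatched tags are not detected during extraction; in that case `parse` throws a `SAXException` for the offending fragment, which can be caught to skip or log it.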
