
How to extract HTML code from an XML file using Groovy
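I have this XML file and I need to extract the HTML code from the "mono" elements, keeping the HTML tags. I need to do this in the Groovy programming language.
Everything inside a "mono" element is HTML markup, including the div elements.
Thanks in advance.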
<dataset>
<chapters>
<chapter id="700" name="Immunology">
<title>Immunology</title>
<monos>
<mono id="382727">
<div>
<h1>blah blah</h1>
</div>
<div>
<p>blah blah</p>
</div>
</mono>
</monos>
</chapter>
<chapter id="701" name="hematology">
<title>Inmuno Hematology</title>
<monos>
<mono id="blah blah">
<div>
<h1>blah blah</h1>
</div>
<div>
<div class="class1">blah blah</div>
</div>
</mono>
</monos>
</chapter>
</chapters>
</dataset>
I have tried:
import javax.xml.parsers.*;
xml = new XmlParser().parse("languages.xml")
println("There are " +xml.chapters.chapter.size() +" Chapters")
for (int i = 0; i < xml.chapters.chapter.size(); i++) {
def chapter = xml.chapters.chapter[i]
def chapterName = chapter.'@name'
println chapterName
println("---- Monos List ----\n\n")
for (int j = 0; j < chapter.monos.mono.size(); j++) {
def mono = chapter.monos.mono[j]
println("Mono Content: " + mono.toString());
}
println("---- End Monos List ----\n\n")
}
But I get the following output:
There are 2 Chapters
Immunology
---- Monos List ----
Mono Content: mono[attributes={id=382727}; value=[div[attributes={}; value=[h1[attributes={}; value=[blah blah]]]], div[attributes={}; value=[p[attributes={}; value=[blah blah]]]]]]
---- End Monos List ----
hematology
---- Monos List ----
Mono Content: mono[attributes={id=blah blah}; value=[div[attributes={}; value=[h1[attributes={}; value=[blah blah]]]], div[attributes={}; value=[div[attributes={class=class1}; value=[blah blah]]]]]]
---- End Monos List ----
import groovy.xml.*
def src="""
<dataset>
<chapters>
<chapter id="700" name="Immunology">
<title>Immunology</title>
<monos>
<mono id="382727">
<div>
<h1>blah blah</h1>
</div>
<div>
<p>blah blah</p>
</div>
</mono>
</monos>
</chapter>
<chapter id="701" name="hematology">
<title>Inmuno Hematology</title>
<monos>
<mono id="blah blah">
<div>
<h1>blah blah</h1>
</div>
<div>
<div class="class1">blah blah</div>
</div>
</mono>
</monos>
</chapter>
</chapters>
</dataset>
"""
def parsed=new XmlSlurper().parseText(src)
parsed.'**'.findAll{it.name()=='mono'}.each{mono->
mono.children().each {htmlElement->
println new StreamingMarkupBuilder().bind{out << htmlElement}.toString()
}
}
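If the markup is needed as a value rather than printed, the same idea can be wrapped in a small helper. This is only a sketch: `innerHtml` and `htmlById` are names made up for illustration, and the inline sample document is a shortened stand-in for the file in the question.

```groovy
import groovy.xml.*

// Hypothetical helper (not part of the answer above): serializes every child
// element of a GPathResult node and joins the pieces with newlines.
def innerHtml(node) {
    node.children().collect { child ->
        new StreamingMarkupBuilder().bind { out << child }.toString()
    }.join('\n')
}

// Shortened stand-in for the XML in the question.
def doc = new XmlSlurper().parseText('''
<monos>
  <mono id="1"><div><h1>title</h1></div><div><p>text</p></div></mono>
</monos>''')

// Map each mono id to the markup found inside that mono.
def htmlById = doc.mono.collectEntries { mono ->
    [(mono.@id.toString()): innerHtml(mono)]
}

println(htmlById['1'])
```

Each map value holds the serialized child elements of one `mono`, so the surrounding `mono` tag itself is not included.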
You can parse the XML content using XmlSlurper or XmlParser.
http://groovy.codehaus.org/Reading+XML+using+Groovy's+XmlSlurper
http://groovy.codehaus.org/Reading+XML+using+Groovy's+XmlParser
import groovy.xml.*
def RECORDS = '''
<dataset>
<chapters>
<chapter id="700" name="Immunology">
<title>Immunology</title>
<monos>
<mono id="382727">
<div>
<h1>blah blah</h1>
</div>
<div>
<p>blah blah</p>
</div>
</mono>
</monos>
</chapter>
<chapter id="701" name="hematology">
<title>Inmuno Hematology</title>
<monos>
<mono id="blah blah">
<div>
<h1>blah blah</h1>
</div>
<div>
<div class="class1">blah blah</div>
</div>
</mono>
</monos>
</chapter>
</chapters>
</dataset>
'''
def records = new XmlSlurper().parseText(RECORDS)
def monos = records.depthFirst().findAll{ it.name().equals('mono') }
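// toString() on a slurped node returns only the concatenated text, without tags
assert monos[0].toString() == "blah blahblah blah"
// XmlUtil.serialize returns the node as markup; print it to see the result
println XmlUtil.serialize(monos[0])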
Output:
<?xml version="1.0" encoding="UTF-8"?><mono id="382727">
<div>
<h1>blah blah</h1>
</div>
<div>
<p>blah blah</p>
</div>
</mono>
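The attempt in the question used `XmlParser`, whose nodes can be written back out with `XmlNodePrinter`. The sketch below prints only the children of each `mono`, which leaves out both the XML declaration and the wrapping `mono` tag seen in the output above. The inline sample document and the `htmlById` name are assumptions made for illustration.

```groovy
import groovy.xml.*

// Shortened stand-in for the XML in the question.
def xml = new XmlParser().parseText('''
<monos>
  <mono id="1"><div><h1>title</h1></div><div class="c">text</div></mono>
</monos>''')

// Map each mono id to the markup of its child elements.
def htmlById = xml.mono.collectEntries { mono ->
    def buffer = new StringWriter()
    def out = new PrintWriter(buffer)
    def printer = new XmlNodePrinter(out)
    printer.preserveWhitespace = true   // keep text inline instead of re-indenting it
    // children() may also contain plain strings for text nodes, so keep elements only
    mono.children().findAll { it instanceof groovy.util.Node }.each { printer.print(it) }
    out.flush()
    [(mono.'@id'): buffer.toString()]
}

htmlById.each { id, html -> println("mono ${id}:\n${html}") }
```

Calling `toString()` on a parsed node, as in the question, yields a debugging representation of the tree, which is why a printer or serializer is needed to get the tags back.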