Jsoup: get contents of JavaScript that has CDATA tags?

I am using Jsoup to parse a webpage. But some of the info that I want to parse is inside a CDATA tag, which prevents the parser from extracting the data inside. How would I go about extracting data from within a CDATA tag? Example:

<script type='text/javascript'><!--// <![CDATA[
    OA_show('300x250');
// ]]> --></script>
         <script type='text/javascript'>alert("Hello");</script>

If I use Jsoup to parse this page and select all the matching elements with "script[type=text/javascript]", I get back the contents of the other scripts on the page that do not have CDATA tags, but not the alert("Hello"); value. How can I get a value inside a CDATA tag with Jsoup?

Thanks!

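One approach: select the script element, read its raw contents, and strip the comment/CDATA markers with String.replace:

import org.jsoup.Jsoup;

public class CdataExample {
    public static void main(String[] args) {
        String page = "<script type='text/javascript'><!--// <![CDATA[\n" +
                "    OA_show('300x250');\n" +
                "// ]]> --></script>\n" +
                "         <script type='text/javascript'>alert(\"Hello\");</script>";

        // Script bodies are kept as raw data, so html() returns the comment/CDATA markers verbatim
        String html = Jsoup.parse(page).select("script").get(0).html();
        // Remove the opening and closing wrapper markers, leaving only the JavaScript
        html = html.replace("<!--// <![CDATA[", "");
        html = html.replace("// ]]> -->", "");

        System.out.println(html.trim());
    }
}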

Result:

OA_show('300x250');
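The replace calls above only work when the markers match those exact strings. A slightly more tolerant sketch, assuming the wrapper markers (`<!--`, `//`, `<![CDATA[`, `]]>`, `-->`) sit on their own lines as in the example, drops every line made up solely of those tokens. The class and method names (`ScriptUnwrap`, `unwrap`) are hypothetical, and the code uses only the Java 11+ standard library, so the input string could come from Jsoup's `Element.data()` or `html()`.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ScriptUnwrap {
    // Hypothetical heuristic: a line consisting only of legacy wrapper tokens
    // ("<!--", "-->", "//", "<![CDATA[", "]]>") separated by optional whitespace.
    private static final Pattern WRAPPER_LINE = Pattern.compile(
            "^\\s*(?:(?:<!--|-->|//|<!\\[CDATA\\[|\\]\\]>)\\s*)+$");

    // Drops wrapper-only lines, rejoins the rest with '\n', and trims surrounding whitespace.
    // Assumes the markers are not on the same line as real code.
    public static String unwrap(String scriptData) {
        return scriptData.lines()
                .filter(line -> !WRAPPER_LINE.matcher(line).matches())
                .collect(Collectors.joining("\n"))
                .strip();
    }

    public static void main(String[] args) {
        // Same script body as in the question, as returned by Jsoup for the first <script>
        String data = "<!--// <![CDATA[\n    OA_show('300x250');\n// ]]> -->";
        System.out.println(unwrap(data)); // prints: OA_show('300x250');
    }
}
```

With Jsoup, the input would typically be `Jsoup.parse(page).select("script").get(0).data()`, since `data()` returns the raw script body. This is only a heuristic: markers placed on the same line as code (for example `<!--// <![CDATA[ OA_show(); // ]]> -->`) are left untouched, and a line containing nothing but `//` is also dropped, which is harmless.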

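Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site or the original URL. For any questions, contact yoyou2525@163.com.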