
Extract JSON as String from JSP

I am working on parsing the website https://massive.ucsd.edu/ProteoSAFe/datasets.jsp (looking at its view-source). I want to parse the .jsp page and extract the JSON object from it.

I am using Jsoup to extract the data:

Document doc = Jsoup.connect("https://massive.ucsd.edu/ProteoSAFe/datasets.jsp").maxBodySize(0).get();

Then I use a Java regex Pattern to extract the JSON as a string:

Pattern p = Pattern.compile(String.format("\"%s\":\\s*(.*),", "dataset","\"%s\":\\s*(.*),", "datasetNum","\"%s\":\\s*(.*),", "title","\"%s\":\\s*(.*),", "user","\"%s\":\\s*(.*),", "site","\"%s\":\\s*(.*),", "flowname","\"%s\":\\s*(.*),", "createdMillis","\"%s\":\\s*(.*),", "created","\"%s\":\\s*(.*),", "fileCount","\"%s\":\\s*(.*),", "fileSizeKB","\"%s\":\\s*(.*),", "psms","\"%s\":\\s*(.*),", "peptides","\"%s\":\\s*(.*),", "variants","\"%s\":\\s*(.*),", "proteins","\"%s\":\\s*(.*),", "species","\"%s\":\\s*(.*),", "instrument","\"%s\":\\s*(.*),", "modification","\"%s\":\\s*(.*),", "pi","\"%s\":\\s*(.*),", "complete","\"%s\":\\s*(.*),", "status","\"%s\":\\s*(.*),", "private","\"%s\":\\s*(.*),", "hash","\"%s\":\\s*(.*),", "px","\"%s\":\\s*(.*),", "task","\"%s\":\\s*(.*),", "id"));

Matcher m = p.matcher(script.html());
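A minimal sketch of what the String.format call above actually produces, reusing only its first few arguments. java.util.Formatter silently ignores arguments beyond the number of format specifiers, so with a single %s the resulting pattern only ever looks for the "dataset" key; none of the other keys reach the regex. The class and method names are hypothetical.

```java
public class FormatArgsDemo {
    // Rebuilds the question's pattern string with its first few arguments.
    // The format string has one %s, so only "dataset" is substituted and
    // every later argument is ignored by String.format.
    static String questionRegex() {
        return String.format("\"%s\":\\s*(.*),",
                "dataset", "\"%s\":\\s*(.*),", "datasetNum",
                "\"%s\":\\s*(.*),", "title");
    }

    public static void main(String[] args) {
        // Prints: "dataset":\s*(.*),
        System.out.println(questionRegex());
    }
}
```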

While doing so, I get an error: the last line is not parsed correctly. It is cut off at the end, so I get this error:

"A JSONObject text must end with '}' at character 577"
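One plausible contributor to the truncation, assuming several objects sit on the same line (this may not match the real page markup): the (.*) group is greedy, and without the DOTALL flag '.' does not cross a line break, so the match runs to the last comma on the line and whatever follows it, including the final closing brace, is never captured. The sample string below is hypothetical, not real MassIVE data, and the names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyCaptureDemo {
    // Hypothetical one-line sample shaped like the page's dataset array.
    static final String SAMPLE =
            "{\"dataset\":\"MSV1\",\"id\":\"a\"},{\"dataset\":\"MSV2\",\"id\":\"b\"}";

    // Applies the question's effective pattern and returns group 1 (or null).
    static String captured(String text) {
        Matcher m = Pattern.compile("\"dataset\":\\s*(.*),").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String group = captured(SAMPLE);
        System.out.println(group);
        // The greedy (.*) backtracks only as far as the LAST comma, so the
        // trailing "id":"b"} and its closing brace are left out.
        System.out.println(group.endsWith("}"));
    }
}
```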

Can anyone suggest a better way to parse this page to get the data?

Parsing any HTML with regex is generally a bad idea, but this works for me:

Pattern.compile("(?s)var datasets = (\\[.*?\\]);")

(Tested via Python, since that's all I have available.)
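The answer's pattern, written out as a small self-contained Java sketch. (?s) turns on DOTALL so the array may span several lines, and the lazy .*? stops at the first "];" after the opening bracket. The class and method names and the sample script text are hypothetical; on the real page you would pass in the text of the script element that contains var datasets. One caveat: if a string value inside the array ever contained "];", the lazy match would stop there too early.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DatasetsArrayExtractor {
    // (?s) = DOTALL: '.' also matches line breaks, so the array may span lines.
    // .*? is lazy: the match ends at the first "]" that is followed by ";".
    private static final Pattern DATASETS =
            Pattern.compile("(?s)var datasets = (\\[.*?\\]);");

    // Returns the JSON array text assigned to var datasets, or null if absent.
    static String extract(String scriptText) {
        Matcher m = DATASETS.matcher(scriptText);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical script body; the real page's script may differ.
        String script = "var x = 1;\nvar datasets = [\n"
                + "  {\"dataset\":\"MSV1\",\"id\":\"a\"},\n"
                + "  {\"dataset\":\"MSV2\",\"id\":\"b\"}\n"
                + "];\nrender(datasets);";
        System.out.println(extract(script));
    }
}
```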

Note that the captured text is a JSON array, so parse it as a JSONArray, not a JSONObject.
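If you are not sure which type the captured text holds, checking its first non-whitespace character tells you which org.json class to use: '[' calls for new JSONArray(text), '{' for new JSONObject(text). The JSONObject constructor rejects text that does not begin with '{'. A minimal stdlib-only sketch; the helper name kindOf is hypothetical.

```java
public class JsonKind {
    // Reports the top-level JSON type from the first non-whitespace character:
    // '[' means array, '{' means object, anything else is neither.
    static String kindOf(String json) {
        String t = json.trim();
        if (t.startsWith("[")) return "array";
        if (t.startsWith("{")) return "object";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(kindOf("[{\"id\":\"a\"}]"));  // array
        System.out.println(kindOf("  {\"id\":\"a\"}"));  // object
    }
}
```

Alternatively, org.json's JSONTokener.nextValue() returns a JSONObject or a JSONArray depending on the opening character, which avoids the manual check.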


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM