
Jsoup: scraping values from an embedded JavaScript block (Java)

I am using Jsoup to scrape some data. In my document, I have something like:

  <script type="text/javascript">
ta.store('mapsv2.geoName', 'Marseille');
ta.store('mapsv2.map_addressnotfound', 'Address not found');
ta.store('mapsv2.map_addressnotfound3', 'We couldn\'t find that location near {0}.  Please try another search.');
  </script>
  <script type="text/javascript">
window.mapDivId = 'map0Div';
window.map0Div = {
lat: 43.295246,
lng: 5.364188,
zoom: null,
locId: 5039388,
geoId: 187253,

My code:

   Document attractionDoc = Jsoup.connect(url).timeout(100000).get();
   System.out.println("attractionDoc "+attractionDoc.toString());

But I don't know how to extract the numbers that follow lat: and lng:.

Thanks for your help!

Jsoup does not parse embedded JavaScript, so it offers no easy way to read the lat and lng members of the window.map0Div object.

But as indicated by @Ceiling Gecko, you can parse the contents of the script tag with other techniques, such as regular expressions.

Assuming you have the script contents as a String called content, you may use something like:

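import java.util.regex.Matcher;
import java.util.regex.Pattern;

// DOTALL lets '.' match line breaks, since the object literal spans several lines;
// the lazy '.*?' stops at the first lat:/lng: and '-?' allows negative coordinates
Pattern p = Pattern.compile(
        "window\\.map0Div\\s*=\\s*\\{.*?lat:\\s*(-?[0-9.]+),.*?lng:\\s*(-?[0-9.]+),",
        Pattern.DOTALL);
Matcher m = p.matcher(content);
if (m.find()) {
    String lat = m.group(1);
    String lng = m.group(2);
    // do whatever you need to do with the info
}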

Here is a fiddle with the regex: http://fiddle.re/1p0yd6
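
For anyone who wants to try the idea in isolation, here is a minimal, self-contained sketch that uses only the standard library. The class name LatLngExtractor and the method extract are hypothetical names chosen for illustration, and the sample text is copied from the question's second script block. It assumes the coordinates are plain decimal numbers, each followed by a comma.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatLngExtractor {

    // Hypothetical helper, not part of Jsoup. Matches the lat and lng members
    // of the window.map0Div object literal; DOTALL lets '.' span line breaks.
    private static final Pattern MAP_DIV = Pattern.compile(
            "window\\.map0Div\\s*=\\s*\\{.*?lat:\\s*(-?\\d+(?:\\.\\d+)?),"
                    + ".*?lng:\\s*(-?\\d+(?:\\.\\d+)?),",
            Pattern.DOTALL);

    /** Returns {lat, lng} if the pattern is found in the script text. */
    public static Optional<double[]> extract(String content) {
        Matcher m = MAP_DIV.matcher(content);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new double[] {
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2))
        });
    }

    public static void main(String[] args) {
        // Sample script text taken from the question.
        String content = String.join("\n",
                "window.mapDivId = 'map0Div';",
                "window.map0Div = {",
                "lat: 43.295246,",
                "lng: 5.364188,",
                "zoom: null,");
        extract(content).ifPresent(c ->
                System.out.println("lat=" + c[0] + " lng=" + c[1]));
    }
}
```

To obtain content from the Jsoup document, note that the text inside a script element is exposed through Element.data() rather than text(), so one option is to loop over attractionDoc.select("script") and pass each element's data() to extract until a match is found.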

