
Web scraping a page, but not picking up its changes

I'm trying to monitor changes on this page: at5.nl/zoek/pijp ("pijp" is the search keyword here). It shows a list of articles with the latest on top.

When I fetch this page with curl or wget (example attached), the resulting file never changes over time, not even when I use a different keyword. Looking at the file, there is (obviously) nothing related to the content I see in my browser; it is mostly JavaScript. My first goal is simply to detect, from a script, whether the content shown in the browser has changed. The script would check every 5 minutes and send an email when something changes.
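The check-and-notify plan above boils down to comparing each fetch with the one from the previous run. Below is a minimal Bash sketch of that comparison, assuming only bash and coreutils; the function name `content_changed` and the state-file path are hypothetical. It can only detect changes in the text it is given, so the input must actually reflect the article list: the raw HTML returned by curl does not, whereas the JSON returned by the API in the answer below does.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumption: bash + coreutils). Compares freshly fetched
# content, read from stdin, with the snapshot saved by the previous run.
set -euo pipefail

# content_changed STATE_FILE < NEW_CONTENT
# Exit status 0 if the new content differs from STATE_FILE, 1 if identical.
# The first run only records a baseline and reports "no change".
# STATE_FILE is overwritten with the new content whenever it changed.
content_changed() {
    local state="$1" tmp
    tmp="$(mktemp)"
    cat > "$tmp"                      # capture the new content
    if [[ ! -f "$state" ]]; then
        mv -- "$tmp" "$state"         # first run: store the baseline
        return 1
    fi
    if cmp -s -- "$tmp" "$state"; then
        rm -f -- "$tmp"               # identical: nothing to report
        return 1
    fi
    mv -- "$tmp" "$state"             # changed: remember the new version
    return 0
}
```

A cron entry such as `*/5 * * * *` could then run something like `fetch_cmd | content_changed "$HOME/.at5-pijp.last" && mail -s "at5.nl changed" you@example.com < "$HOME/.at5-pijp.last"`, where `fetch_cmd`, the state path and the address are placeholders, and a working `mail` command is assumed.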

As you might have guessed, I am definitely not a web developer. Any pointers on how I could detect the changes I'm after? (I'm fairly proficient with Bash.)

Here's a link to the file I get with cURL:

https://drive.google.com/file/d/1-QzoTgbqc_m96YOx6qBh1eIBDyD5HfW_/view?usp=sharing

As @James pointed out, you could use the API URL and parse the resulting JSON to your liking. Xidel, a command-line tool with a built-in JSON parser, can help you out:

$ xidel -s \
  -d '{{"searchTerm":"pijp"}}' \
  "https://ditisdesupercooleappapi.at5.nl/api/search" \
  -e '$json/(articles)()[created gt (current-dateTime() - dateTime("1970-01-01T00:05:00Z")) div dayTimeDuration("PT1S")]'

"pijp" is sent as a value in a JSON object (a POST request) to the API URL, after which the resulting JSON is filtered so that only articles whose created attribute (a Unix epoch timestamp) is less than 5 minutes old are returned.
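The 5-minute window is what lets a scheduled check stay simple: the expression `(current-dateTime() - dateTime("1970-01-01T00:05:00Z")) div dayTimeDuration("PT1S")` evaluates to the current Unix time minus 300 seconds, so if the query runs every 5 minutes, any output means an article appeared since the previous run. Below is a minimal run-once wrapper sketch around the same xidel query; it assumes xidel and a working `mail` command are installed, and `MAIL_TO`, `MAILER` and the `--run` flag are hypothetical additions, not part of the answer.

```shell
#!/usr/bin/env bash
# Minimal run-once sketch around the xidel query above. Assumptions: xidel
# and a working `mail` command are installed; MAIL_TO and MAILER are
# hypothetical settings, not part of the original answer.
set -euo pipefail

MAIL_TO="${MAIL_TO:-you@example.com}"   # placeholder recipient
MAILER="${MAILER:-mail}"                # command used to send the message

# Print the articles created in the last 5 minutes (same query as above).
fetch_new_articles() {
    xidel -s \
        -d '{{"searchTerm":"pijp"}}' \
        "https://ditisdesupercooleappapi.at5.nl/api/search" \
        -e '$json/(articles)()[created gt (current-dateTime() - dateTime("1970-01-01T00:05:00Z")) div dayTimeDuration("PT1S")]'
}

# notify_if_nonempty SUBJECT TEXT
# Sends TEXT via $MAILER only if it is non-empty.
# Exit status: 0 if a message was sent, 1 if there was nothing to send.
notify_if_nonempty() {
    local subject="$1" text="$2"
    [[ -n "$text" ]] || return 1
    printf '%s\n' "$text" | "$MAILER" -s "$subject" "$MAIL_TO"
}

# Only hit the network when explicitly asked to, e.g. from cron:
#   */5 * * * * /path/to/at5-watch.sh --run
if [[ "${1:-}" == "--run" ]]; then
    notify_if_nonempty "at5.nl: new articles for pijp" "$(fetch_new_articles)" || true
fi
```

Because this relies on the run interval matching the 5-minute window, a delayed or skipped run can miss an article, an early run can report one twice, and clock skew between your machine and the server shifts the window. Remembering the newest created value seen in a state file would avoid that, at the cost of a little more code.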

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM