
Scraping information from multiple URLs that differ in structure

I would like to scrape multiple URLs, but they differ in nature, for example different company websites with different HTML structure. Is there a way to do it without writing customised code for each URL?

I understand that I can put multiple URLs into a list and loop over them.

I fear not, but I am not an expert :-)

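I imagine it depends on how complex the structures are. If you want to find the text "Test" on every website, I would expect soup.body.findAll(text=re.compile('Test')) to return every text node containing "Test" on the page. Note that the plain form soup.body.findAll(text='Test') only matches text nodes that are exactly "Test", and that in current BeautifulSoup versions the same call is spelled soup.body.find_all(string=...).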
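As a minimal sketch of that idea, assuming BeautifulSoup 4 (the bs4 package, 4.4.0 or later) is installed: the helper name find_text and the sample HTML are made up for illustration, and the search is made case-insensitive, which the original suggestion did not specify.

```python
import re

from bs4 import BeautifulSoup


def find_text(html, needle):
    """Return every text node that contains `needle`, ignoring case.

    `find_text` is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Fall back to the whole document if the page has no <body> tag.
    root = soup.body or soup
    # A compiled regex matches substrings; a plain string would only
    # match text nodes that are exactly equal to it.
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return [str(node).strip() for node in root.find_all(string=pattern)]


# Made-up sample page, standing in for a downloaded company website.
sample = (
    "<html><body><h1>Test page</h1>"
    "<p>Another test here.</p><p>Nothing</p></body></html>"
)
print(find_text(sample, "Test"))
```

Because the function only looks at text nodes, it does not care which tags or classes a site uses, which is why it can work across differently structured pages. It will not help once you need structured fields (prices, addresses, names), where each site usually needs its own selectors.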

I assume you already know how to loop through a list, so you would loop through the list of URLs and, for each one, check whether the string you are searching for occurs (maybe you are looking for something else, e.g. an "apply" button or "login"?).
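The loop could look roughly like this. It is a sketch that uses only the Python standard library instead of BeautifulSoup, so it runs without extra installs. The names page_text, fetch and search_sites, the User-Agent header and the 10-second timeout are all assumptions for illustration, and the example.com style URLs in the usage note are placeholders.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextCollector(HTMLParser):
    """Collects page text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def page_text(html):
    """Return the text of an HTML page with whitespace collapsed."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def fetch(url, timeout=10):
    """Download a page and decode it (assumed defaults, adjust as needed)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def search_sites(urls, needle, fetcher=fetch):
    """Map each URL to True/False (needle found or not) or None on failure."""
    results = {}
    for url in urls:
        try:
            text = page_text(fetcher(url))
        except Exception:
            # One unreachable or broken site should not stop the loop.
            results[url] = None
            continue
        results[url] = needle.lower() in text.lower()
    return results
```

Usage would be something like search_sites(["https://example.com", "https://example.org"], "login"). The fetcher argument exists so the loop can be tried against saved HTML without touching the network.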

all the best,

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 