I need to loop over URLs stored in a CSV file and extract phone numbers and ZIP codes from each of the listed pages.
If you can help me, I'd appreciate it!
```python
import re

from bs4 import BeautifulSoup
from scrapy import Request

# read csv with just url per line
with open('urls.csv') as file:
    start_urls = [line.strip() for line in file]

def start_request(self):
    request = Request(url=self.start_urls, callback=self.parse)
    yield request

def parse(self, response):
    html = response.body
    soup = BeautifulSoup(html, 'lxml')
    text = soup.get_text()
    phone = re.findall(r'\d{3}-\d{3}-\d{4}', html, re.MULTILINE)
    zipcode = re.findall(r'(?<=, [A-Z]{2} )\d{5}', html, re.MULTILINE)
    phn_1 = []
    zipcode_1 = []
```
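The extraction step in `parse()` can be checked on its own, outside Scrapy. One thing to watch: `response.body` is `bytes`, and in Python 3 calling `re.findall()` with a `str` pattern on `bytes` raises `TypeError`, so the patterns need to run on decoded text instead (in Scrapy, `response.text`, or the `text` already produced by `soup.get_text()`). The sketch below keeps the question's two patterns but drops `re.MULTILINE`, which has no effect when a pattern contains no `^` or `$`. To stay dependency-free it swaps in the standard library's `html.parser` for BeautifulSoup and assumes UTF-8 pages; `TextExtractor` and `extract_contacts` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# The question's patterns; re.MULTILINE is omitted because neither uses ^ or $.
PHONE_RE = re.compile(r'\d{3}-\d{3}-\d{4}')
ZIP_RE = re.compile(r'(?<=, [A-Z]{2} )\d{5}')


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document (script/style text included)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def extract_contacts(body: bytes, encoding: str = "utf-8") -> Tuple[List[str], List[str]]:
    """Return (phones, zipcodes) found in the text of a raw HTML body."""
    # Decode first: str patterns cannot be applied to bytes.
    html = body.decode(encoding, errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with "" to mirror BeautifulSoup's get_text() default separator.
    text = "".join(parser.parts)
    return PHONE_RE.findall(text), ZIP_RE.findall(text)
```

Calling `extract_contacts(b"<p>Springfield, IL 62704</p>")` returns `([], ['62704'])`.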
You described your goal but didn't mention what part is currently not working.
You wrote this:
```python
def start_request(self):
    request = Request(url=self.start_urls, callback=self.parse)
    yield request
```
It isn't obvious that this is what you want. In particular, I would expect Request() to accept a single URL rather than a list. Also, using a callback is fine but perhaps fancier than needed. Try this simplified approach:
```python
for url in start_urls:
    self.parse(Request(url=url))
```
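Two Scrapy details matter for this loop. `parse()` is meant to receive a downloaded `Response`, so handing it a `Request` object directly never fetches the page. Also, the method Scrapy actually calls is `start_requests()` (plural), and its default implementation already requests each entry of `start_urls` with `parse` as the callback. If you would rather skip the callback machinery entirely, a plain loop can be written with the standard library. This is a minimal sketch, assuming the pages are static HTML that `urllib` can fetch directly; `fetch` and `crawl` are hypothetical helper names, and the `get` parameter is injectable so the loop can be tested without network access.

```python
from typing import Callable, Iterable, Iterator, Tuple
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download one page and return its raw body (bytes, like response.body)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl(urls: Iterable[str],
          get: Callable[[str], bytes] = fetch) -> Iterator[Tuple[str, bytes]]:
    """Fetch each URL in turn and yield (url, body); failed URLs are skipped."""
    for url in urls:
        try:
            body = get(url)
        except (OSError, ValueError) as exc:
            # URLError, HTTPError and timeouts are OSError subclasses;
            # ValueError covers malformed URLs such as a missing scheme.
            print(f"skipping {url}: {exc}")
            continue
        yield url, body
```

Each `(url, body)` pair can then go through the phone/ZIP extraction, e.g. `for url, body in crawl(start_urls): ...`.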
I'm sure this expression works fine for you: `[line.strip() for line in file]`. To emphasize that it is all about dealing with newlines, it would be clearer to use `line.rstrip()` instead of `line.strip()`.
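Because the input is a CSV file, the `csv` module can take care of line endings (and quoting, if any), so no manual newline stripping is needed; the remaining `strip()` only removes stray spaces around a URL. A sketch, assuming the URL sits in the first column, there is no header row, and blank rows should be skipped; `read_urls` is a hypothetical helper.

```python
import csv
from typing import List, TextIO


def read_urls(f: TextIO) -> List[str]:
    """Return the first column of each non-blank row, trimmed of spaces."""
    urls = []
    for row in csv.reader(f):
        # csv.reader yields [] for an empty line; skip those and blank cells.
        if row and row[0].strip():
            urls.append(row[0].strip())
    return urls


# Usage with the real file (the csv docs recommend newline='' for open()):
# with open('urls.csv', newline='') as f:
#     start_urls = read_urls(f)
```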
Thanks for the answer! I can loop over the URLs now, but I'm not able to get the phone numbers and ZIP codes while looping, so that I can save them to a CSV file afterwards. Any help would be appreciated!
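To end up with a CSV of results, one option is to run the patterns over each page's text and write one row per URL with the `csv` module. A minimal sketch, assuming each page has already been decoded to `str` (for example via Scrapy's `response.text` or BeautifulSoup's `get_text()`); `write_contacts`, `unique` and the column names are hypothetical. Several matches on one page are joined with `"; "` and duplicates are dropped.

```python
import csv
import re
from typing import Iterable, List, TextIO, Tuple

# The question's patterns, applied to decoded page text (str, not bytes).
PHONE_RE = re.compile(r'\d{3}-\d{3}-\d{4}')
ZIP_RE = re.compile(r'(?<=, [A-Z]{2} )\d{5}')


def unique(items: Iterable[str]) -> List[str]:
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def write_contacts(pages: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write a header plus one row per (url, text) page; return the data-row count."""
    writer = csv.writer(out)
    writer.writerow(["url", "phones", "zipcodes"])
    rows = 0
    for url, text in pages:
        phones = unique(PHONE_RE.findall(text))
        zipcodes = unique(ZIP_RE.findall(text))
        # Multiple matches on one page go into a single cell.
        writer.writerow([url, "; ".join(phones), "; ".join(zipcodes)])
        rows += 1
    return rows


# Usage with a real file (newline='' as the csv docs recommend):
# with open('results.csv', 'w', newline='') as f:
#     write_contacts(pages, f)
```

Inside a Scrapy spider, an alternative is to `yield` a dict such as `{'url': response.url, 'phones': phones, 'zipcodes': zipcodes}` from `parse()` and run the spider with `scrapy crawl <spider_name> -o results.csv`, letting Scrapy's feed exports write the file.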