
Python code writes only one row to the CSV file

I recently tried to write a scraper for yp.com (Yellow Pages) listings, but I can't figure out why the code writes only one row to the .csv file.

The URLs in yp_urls.txt (one per line) are:

https://www.yellowpages.com/search-map?search_terms=restaurant&geo_location_terms=Boston
https://www.yellowpages.com/search-map?search_terms=restaurant&geo_location_terms=Boston&page=2

Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
with open('yp_urls.txt', 'r') as f:
    for url in f:
        print(url)        
        uClient = urlopen(url)
        page_html = uClient.read()        
        uClient.close()
        page_soup = soup(page_html, "html.parser")
        containers = page_soup.findAll("div",{"class":"v-card"})
        #container= containers[0]
        out_filename = "yp_listing.csv"
        headers = "URL \n"
        f = open(out_filename, "w")
f.write(headers)
for container in containers:
            business = container.a["href"].title()
print("business:" + business + "\n" )
f.write(business + "," + "\n")
f.close()  # Close the file

Issues:

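  1. The bodies of your for loops weren't properly indented, so the print and write calls placed after them run only once, after the loops have finished.

  2. Open the output file handle once, outside the for loop. Opening it with "w" inside the loop truncates the file on every iteration.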
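A minimal, self-contained sketch of the second issue. It assumes a temporary file stands in for yp_listing.csv and that the hypothetical `pages` list stands in for the links scraped from each page. Reopening the file in "w" mode on every pass empties it, so only the last pass survives.

```python
import os
import tempfile

# Hypothetical stand-in for the links scraped from each results page.
pages = [["biz-1", "biz-2"], ["biz-3", "biz-4"]]

path = os.path.join(tempfile.mkdtemp(), "yp_listing.csv")

# Buggy pattern: the file is reopened with "w" inside the loop,
# which truncates it at the start of every iteration.
for links in pages:
    with open(path, "w") as fout:
        fout.write("URL\n")
        for link in links:
            fout.write(link + "\n")

with open(path) as fin:
    reopened_each_time = fin.read().splitlines()

# Fixed pattern: open the file once, before the loop.
with open(path, "w") as fout:
    fout.write("URL\n")
    for links in pages:
        for link in links:
            fout.write(link + "\n")

with open(path) as fin:
    opened_once = fin.read().splitlines()

print(reopened_each_time)  # ['URL', 'biz-3', 'biz-4']
print(opened_once)         # ['URL', 'biz-1', 'biz-2', 'biz-3', 'biz-4']
```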

Try:

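import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

out_filename = "yp_listing.csv"
# Open the URL list and the output file once, before any loop runs,
# so the output file is not truncated on every iteration.
with open('yp_urls.txt', 'r') as url_file, open(out_filename, "w", newline="") as fout:
    writer = csv.writer(fout)
    writer.writerow(["URL"])  # header row, written once

    for line in url_file:
        url = line.strip()  # drop the trailing newline
        if not url:
            continue  # skip blank lines
        print(url)
        with urlopen(url) as uClient:
            page_html = uClient.read()
        page_soup = soup(page_html, "html.parser")
        containers = page_soup.find_all("div", {"class": "v-card"})
        # Indented inside both loops, so this runs once per listing.
        for container in containers:
            link = container.a  # first <a> inside the card, or None
            if link is None or not link.get("href"):
                continue  # skip cards without a usable link
            business = link["href"]  # used as-is; .title() would change the URL's case
            print("business:", business)
            writer.writerow([business])
# No explicit close() is needed: the with statement closes both files.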
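Because the fixed script needs network access, here is a hedged sketch of just its writing logic that can be checked offline. It assumes the scraped hrefs have already been collected per page; the `write_listing` function and the sample data are hypothetical, and an in-memory `io.StringIO` replaces yp_listing.csv.

```python
import csv
import io

def write_listing(pages, fout):
    """Write a header once, then one CSV row per business link.

    pages: iterable of lists of href strings, one list per results page
    (a hypothetical stand-in for the links pulled from each page's v-card divs).
    """
    writer = csv.writer(fout)
    writer.writerow(["URL"])  # header: written once, outside every loop
    count = 0
    for links in pages:              # one iteration per results page
        for href in links:           # one iteration per listing on that page
            writer.writerow([href])  # indented inside both loops
            count += 1
    return count

# Hypothetical data for two pages of results.
sample_pages = [
    ["/boston-ma/mip/a-1", "/boston-ma/mip/b-2"],
    ["/boston-ma/mip/c-3"],
]
buf = io.StringIO()
rows_written = write_listing(sample_pages, buf)
print(rows_written)  # 3
print(buf.getvalue().splitlines())
```

Using csv.writer rather than string concatenation also means an href that happens to contain a comma is quoted correctly instead of splitting into two columns.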

It appears that the f.write calls are outside your loops, so they run only once, after the loops have completed.

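For example, the code loops through the URLs, exits that loop and executes f.write(headers); it then loops through containers (which by now holds only the last page's results), exits that loop and executes f.write(business + "," + "\n") a single time, so only the last business is written.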
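The control flow described above can be reproduced without any scraping. In this sketch the `pages` list is hypothetical data; the statement placed after the loops runs exactly once and sees only the value left over from the last iteration.

```python
# Hypothetical businesses found on each of two results pages.
pages = [["biz-1", "biz-2"], ["biz-3", "biz-4"]]

outside = []
for containers in pages:
    for business in containers:
        pass
outside.append(business)  # dedented: runs once, after both loops end

inside = []
for containers in pages:
    for business in containers:
        inside.append(business)  # indented: runs once per business

print(outside)  # ['biz-4']
print(inside)   # ['biz-1', 'biz-2', 'biz-3', 'biz-4']
```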

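You may also want to check that the output file is opened in the right mode: 'w' (write/overwrite) versus 'a' (append). Also consider giving the two handles different names so they are not both called f; rebinding f to the output file inside the loop that is iterating over yp_urls.txt is easy to misread.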
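To see the difference between the two modes, here is a small sketch that simulates running the script twice, with a temporary file standing in for yp_listing.csv. The `run_once` helper and its data are hypothetical. With 'w' each run replaces the file; with 'a' the second run appends a duplicate header and duplicate rows.

```python
import os
import tempfile

def run_once(path, mode):
    """Hypothetical stand-in for one full run of the scraper."""
    with open(path, mode) as fout:
        fout.write("URL\n")
        for link in ["biz-1", "biz-2"]:
            fout.write(link + "\n")

def lines_after_two_runs(mode):
    # Fresh temporary file for each experiment.
    path = os.path.join(tempfile.mkdtemp(), "yp_listing.csv")
    run_once(path, mode)
    run_once(path, mode)
    with open(path) as fin:
        return fin.read().splitlines()

print(lines_after_two_runs("w"))  # ['URL', 'biz-1', 'biz-2']
print(lines_after_two_runs("a"))  # ['URL', 'biz-1', 'biz-2', 'URL', 'biz-1', 'biz-2']
```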
