
Python BeautifulSoup accounting for missing data on website when writing to csv

I am practicing my web scraping skills on the following website: http://web.californiacraftbeer.com/Brewery-Member

The code I have so far is below. I'm able to grab the fields that I want and write the information to CSV, but the information in each row does not match the actual company details. For example, Company A has the contact name for Company D and the phone number for Company E in the same row.

Since some data does not exist for certain companies, how can I account for this when writing one row per company to CSV? What is the best way to make sure that I am grabbing the correct information for each company when writing to CSV?

"""
Grabs brewery name, contact person, phone number, website address, and email address 
for each brewery listed.
"""    

import requests, csv
from bs4 import BeautifulSoup    

url = "http://web.californiacraftbeer.com/Brewery-Member"
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
company_name = soup.find_all(itemprop="name")
contact_name = soup.find_all("div", {"class": "ListingResults_Level3_MAINCONTACT"})
phone_number = soup.find_all("div", {"class": "ListingResults_Level3_PHONE1"})
website = soup.find_all("span", {"class": "ListingResults_Level3_VISITSITE"})    

def scraper():
    """Grabs information and writes to CSV"""
    print("Running...")
    results = []
    count = 0
    for company, name, number, site in zip(company_name, contact_name, phone_number, website):
        print("Grabbing {0} ({1})...".format(company.text, count))
        count += 1
        newrow = []
        try:
            newrow.append(company.text)
            newrow.append(name.text)
            newrow.append(number.text)
            newrow.append(site.find('a')['href'])
        except Exception as e: 
            error_msg = "Error on {0}-{1}".format(number.text,e) 
            newrow.append(error_msg)
        results.append(newrow)
    print("Done")
    outFile = open("brewery.csv","w")
    out = csv.writer(outFile, delimiter=',',quoting=csv.QUOTE_ALL, lineterminator='\n')
    out.writerows(results)
    outFile.close()

def main():
    """Runs web scraper"""
    scraper()    

if __name__ == '__main__':
    main()

Any help is very much appreciated!

You need to use zip to iterate over all of these arrays at the same time:

for company, name, number, site in zip(company_name, contact_name, phone_number, website):
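
Note, though, that zip() pairs elements purely by position: if a field is missing for one company, that list is simply shorter, and every later pairing shifts by one. A toy illustration (the data here is made up, not taken from the site):

companies = ["A", "B", "C"]
phones = ["111-1111", "333-3333"]  # suppose company B has no phone listed

for company, phone in zip(companies, phones):
    print(company, phone)
# A 111-1111
# B 333-3333   <- B is paired with C's phone, and C is dropped entirely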

Thanks for the help.

I realized that since the company details for each company are contained in the div class "ListingResults_All_CONTAINER ListingResults_Level3_CONTAINER", I could write a nested for-loop that iterates through each of these divs and then grabs the information I want from within each one.
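
A minimal sketch of that per-container approach, assuming each container div holds the same field elements used in the original selectors (verify the exact class names against the page's markup):

import csv
import requests
from bs4 import BeautifulSoup

url = "http://web.californiacraftbeer.com/Brewery-Member"
soup = BeautifulSoup(requests.get(url).content, "lxml")

# One container div per company; BeautifulSoup's class_ filter matches a tag
# that has this class even when the full attribute also includes
# "ListingResults_All_CONTAINER".
containers = soup.find_all("div", class_="ListingResults_Level3_CONTAINER")

rows = []
for box in containers:
    name = box.find(itemprop="name")
    contact = box.find("div", class_="ListingResults_Level3_MAINCONTACT")
    phone = box.find("div", class_="ListingResults_Level3_PHONE1")
    site = box.find("span", class_="ListingResults_Level3_VISITSITE")
    link = site.find("a") if site else None
    rows.append([
        name.get_text(strip=True) if name else "",
        contact.get_text(strip=True) if contact else "",
        phone.get_text(strip=True) if phone else "",
        link["href"] if link and link.has_attr("href") else "",
    ])

with open("brewery.csv", "w", newline="") as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)

Because every lookup happens inside a single company's container, a missing field produces an empty cell for that company instead of shifting the remaining columns onto the wrong rows.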
