Exporting Selenium-scraped data to a CSV file?

In the following script, I scrape the coronavirus data from a table on worldometers.info/coronavirus with Selenium.

from time import sleep
from selenium import webdriver

class CoronaBot():
    def __init__(self):
        self.driver = webdriver.Chrome()

    def scraper(self):
        self.driver.get('https://worldometers.info/coronavirus/')
        main_table = self.driver.find_element_by_xpath('//*[@id="main_table_countries_today"]')
        country = main_table.find_element_by_xpath("//td[contains(., 'Austria')]")
        row = country.find_element_by_xpath("./..")
        data = row.text.split(" ")
        total_cases = data[0]
        new_cases = data[1]
        total_deaths = data[2]
        new_deaths = data[3]
        active_cases = data[4]
        total_recovered = data[5]
        serious_critical = data[6]

The code works fine, and I can print the results like this:

    print("COVID-19 updates in: " + country.text)
    print("Total Cases: " + total_cases)
    ...

However, I want to write the scraped results to a new CSV file (the file needs to be created when the script is run).

I tried something stupid like this with pandas, but it obviously did not work. Any suggestions?

def create_csv(self):

    collected_data = []

    collected_data.append(output)

    df = pd.DataFrame(collected_data, columns=['total_cases', 'new_cases', 'total_deaths', 
    'new_deaths', 'active_cases', 'total_recovered','serious_critical'])
    df.to_csv('scraped_corona.csv')

Pandas is a great solution, and you were close. In your example, you can have the scraper function put the data into the data frame right away.

First, I'd create a self.df attribute to store the data frame:

import pandas as pd  # needed for pd.DataFrame below

class CoronaBot():
    def __init__(self):
        self.driver = webdriver.Chrome()
        column_names = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths',
                        'active_cases', 'total_recovered', 'serious_critical']
        self.df = pd.DataFrame(columns=column_names)

Then, after you collect the data, store it in self.df:

...
print("Total recovered: " + total_recovered)
print("Serious, critical cases: " + serious_critical)

self.df = self.df.append(
    {'total_cases': total_cases,
     'new_cases': new_cases,
     'total_deaths': total_deaths,
     'new_deaths': new_deaths,
     'active_cases': active_cases,
     'total_recovered': total_recovered,
     'serious_critical': serious_critical}, ignore_index=True)
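
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above only works on older versions. On newer versions, one option is to collect each row as a dict in a plain list and build the data frame once at the end (pd.concat is the more direct replacement for append). Here is a minimal sketch of that idea; the RowCollector class, its method names, and the sample values are hypothetical and only stand in for the scraped data:

```python
import pandas as pd

COLUMN_NAMES = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths',
                'active_cases', 'total_recovered', 'serious_critical']


class RowCollector:
    """Hypothetical helper: accumulates rows, builds the DataFrame on demand."""

    def __init__(self):
        self.rows = []  # one dict per scraped row

    def add_row(self, values):
        # Pair each value with its column name, in the order of COLUMN_NAMES.
        self.rows.append(dict(zip(COLUMN_NAMES, values)))

    def to_frame(self):
        # Building the frame once avoids copying it on every added row.
        return pd.DataFrame(self.rows, columns=COLUMN_NAMES)


# Placeholder strings, not real scraped figures.
collector = RowCollector()
collector.add_row(['100', '+5', '3', '+1', '60', '37', '2'])
df = collector.to_frame()
```

In the bot, the same pattern would mean keeping a self.rows list in __init__, appending a dict in scraper, and creating the data frame just before exporting.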

And add a function for exporting:

    def export_to_csv(self):
        self.df.to_csv('scraped_corona.csv')

Now, when I run

c = CoronaBot()
c.scraper()
c.export_to_csv()

I get the .csv file. Hope it helps, good luck!
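
If pandas is not needed for anything else, the standard library csv module can write the same kind of file. This is only a sketch under assumptions: the write_rows function, the temporary output path, and the sample values are hypothetical, and the column names are taken from the example above.

```python
import csv
import os
import tempfile

COLUMN_NAMES = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths',
                'active_cases', 'total_recovered', 'serious_critical']


def write_rows(path, rows):
    """Write a list of dicts to a CSV file, header row first."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMN_NAMES)
        writer.writeheader()
        writer.writerows(rows)


# Placeholder strings, not real scraped figures.
sample = dict(zip(COLUMN_NAMES, ['100', '+5', '3', '+1', '60', '37', '2']))

# Written to the temp directory here; in the bot this could be 'scraped_corona.csv'.
out_path = os.path.join(tempfile.gettempdir(), 'scraped_corona.csv')
write_rows(out_path, [sample])
```

Opening the file in 'w' mode creates it if it does not exist, which covers the requirement that the file be created when the script runs.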
