
Why am I getting only the last output in the CSV?

I have a bunch of URLs in a CSV file, and I need to extract data from those URLs into another CSV file. I extracted the data from those URLs into a DataFrame using my code below, but when I save the extracted data to the output CSV, it contains only the last extracted record (i.e., if I have 10 URLs in demo.csv, only the data extracted from the 10th URL appears in the output CSV, not the data for all URLs).

import csv
import time
import requests
from bs4 import BeautifulSoup
import pandas as pd

with open('demo.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        url = row[0]
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36/8mqHiSuL-56"}
        response = requests.get(url, headers= header)
        print(url)
        soup = BeautifulSoup(response.content, "html.parser")
        website= soup.find('div', class_="arrange__373c0__UHqhV gutter-2__373c0__3Zpeq vertical-align-middle__373c0__2TQsQ border-color--default__373c0__2oFDT")
        if website is None:
            website = '-'
        else:
            website = website.text.replace('Business website','')
            print(website)
        time.sleep(2)

    dict = {'url': [url], 'website': [website]}
    df = pd.DataFrame(dict)
    df.to_csv('export_dataframe.csv', index= False)

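The problem appears to be the indentation of the lines where you add the data to the dictionary. They sit outside the loop, so they run only once, after the loop has finished, and only the last URL's data is added. I have marked this with a comment in the code below.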

import csv
import time
import requests
from bs4 import BeautifulSoup
import pandas as pd

data = []
with open('demo.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        url = row[0]
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36/8mqHiSuL-56"}
        response = requests.get(url, headers= header)
        print(url)
        soup = BeautifulSoup(response.content, "html.parser")
        website= soup.find('div', class_="arrange__373c0__UHqhV gutter-2__373c0__3Zpeq vertical-align-middle__373c0__2TQsQ border-color--default__373c0__2oFDT")
        if website is None:
            website = '-'
        else:
            website = website.text.replace('Business website','')
            print(website)
        time.sleep(2)

        # In your code this step is outside the loop, so it runs only once, after the last URL.
        # A list is used here for simplicity; a dict works as well.
        data.append([url, website])

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame(data, columns=['url', 'website'])
df.to_csv('export_dataframe.csv', index=False)
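
The comment above mentions that a dict can still be used. A minimal sketch of that variant follows, under some loudly stated assumptions: `collect_rows`, `write_rows`, `extract`, and `fake_sites` are hypothetical names that do not appear in the original code, the scraping step is replaced by a fake lookup so the example runs without network access, and the standard-library `csv.DictWriter` is swapped in for pandas to keep the sketch dependency-free.

```python
import csv
import io


def collect_rows(urls, extract):
    """Build one dict per URL.

    `extract` is a hypothetical stand-in for the requests/BeautifulSoup step;
    it returns the website text, or None when nothing was found.
    """
    rows = []
    for url in urls:
        website = extract(url)
        if website is None:
            website = '-'
        # The append is inside the loop, so every URL contributes a row.
        rows.append({'url': url, 'website': website})
    return rows


def write_rows(rows, out_file):
    """Write the collected dicts as CSV with a header row."""
    writer = csv.DictWriter(out_file, fieldnames=['url', 'website'])
    writer.writeheader()
    writer.writerows(rows)


# Fake data in place of real HTTP requests (assumption for illustration only).
fake_sites = {
    'https://example.com/a': 'a.example',
    'https://example.com/b': None,
}
rows = collect_rows(fake_sites, fake_sites.get)

buffer = io.StringIO()  # stands in for open('export_dataframe.csv', 'w', newline='')
write_rows(rows, buffer)
print(buffer.getvalue())
```

The same list of dicts can also be passed directly to `pd.DataFrame(rows)` and saved with `to_csv`, as in the answer; the important part is that the append happens once per URL and the file is written once, after the loop.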


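Note: Technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.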

 