Why am I getting only the last output in my CSV?
I have a bunch of URLs in a CSV file, and I need to extract data from those URLs into another CSV file. I extracted the data from those URLs into a DataFrame using my code below, but when I save the extracted data to the output CSV, it only contains the last extracted record. For example, if demo.csv has 10 URLs, the output CSV shows only the data extracted from the 10th URL, not the data for all of them.
import csv
import time
import requests
from bs4 import BeautifulSoup
import pandas as pd
with open('demo.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        url = row[0]
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36/8mqHiSuL-56"}
        response = requests.get(url, headers=header)
        print(url)
        soup = BeautifulSoup(response.content, "html.parser")
        website = soup.find('div', class_="arrange__373c0__UHqhV gutter-2__373c0__3Zpeq vertical-align-middle__373c0__2TQsQ border-color--default__373c0__2oFDT")
        if website is None:
            website = '-'
        else:
            website = website.text.replace('Business website', '')
        print(website)
        time.sleep(2)
dict = {'url': [url], 'website': [website]}
df = pd.DataFrame(dict)
df.to_csv('export_dataframe.csv', index=False)
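A stripped-down, network-free reproduction of the behaviour described above may help show what is going on. It assumes a hypothetical three-line stand-in for demo.csv and replaces the scraping with a placeholder value. Because `url` and `website` are reassigned on every iteration, code that runs once after the loop only ever sees the final values:

```python
import csv
import io

# Hypothetical stand-in for demo.csv: three URLs, one per row.
demo_csv = io.StringIO("https://a.example\nhttps://b.example\nhttps://c.example\n")

for row in csv.reader(demo_csv):
    url = row[0]            # overwritten on every iteration
    website = url.upper()   # placeholder for the scraped value

# This runs once, after the loop, so it only sees the last iteration's values.
last_only = {'url': [url], 'website': [website]}
print(last_only)  # {'url': ['https://c.example'], 'website': ['HTTPS://C.EXAMPLE']}
```

Note that simply indenting the three DataFrame lines into the loop would not fix it either: `DataFrame.to_csv` opens the file in write mode (`'w'`) by default, so each iteration would overwrite the file written by the previous one.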
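The problem seems to be the indentation of the line where you add the data to the dictionary. It is outside the loop, so only the last URL's data gets added. I've pointed this out with a comment in the code below.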
import csv
import time
import requests
from bs4 import BeautifulSoup
import pandas as pd
data = []
with open('demo.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        url = row[0]
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36/8mqHiSuL-56"}
        response = requests.get(url, headers=header)
        print(url)
        soup = BeautifulSoup(response.content, "html.parser")
        website = soup.find('div', class_="arrange__373c0__UHqhV gutter-2__373c0__3Zpeq vertical-align-middle__373c0__2TQsQ border-color--default__373c0__2oFDT")
        if website is None:
            website = '-'
        else:
            website = website.text.replace('Business website', '')
        print(website)
        time.sleep(2)
        data.append([url, website])  # in your code this step is outside the loop; a list is used here for simplicity (a dict works too)
df = pd.DataFrame(data, columns=['url', 'website'])
df.to_csv('export_dataframe.csv', index=False)
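If pandas is only used to write the file, an alternative is to write each row with the standard `csv` module as soon as it is scraped. The sketch below is an assumption-laden illustration, not the answer's method: `scrape_to_csv` and `get_website` are hypothetical names, and `get_website` stands in for the `requests.get` + `BeautifulSoup` lookup from the loop above, so the row-writing logic can be exercised with a fake lookup and no network access:

```python
import csv
import os
import tempfile
from typing import Callable, Iterable


def scrape_to_csv(urls: Iterable[str], get_website: Callable[[str], str], out_path: str) -> int:
    """Write one (url, website) row per URL and return the number of data rows."""
    count = 0
    # Open the output once, before the loop, so rows are appended rather than overwritten.
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['url', 'website'])  # header row, written once
        for url in urls:
            writer.writerow([url, get_website(url)])  # one row per iteration
            count += 1
    return count


# Usage with a fake lookup (hypothetical; the real one would fetch and parse the page).
out_file = os.path.join(tempfile.mkdtemp(), 'export_dataframe.csv')
written = scrape_to_csv(['https://a.example', 'https://b.example'],
                        lambda u: 'site for ' + u, out_file)

with open(out_file, newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```

With the real input, the URLs could be passed as `(row[0] for row in csv.reader(f))` inside the existing `with open('demo.csv', newline='') as f:` block. Writing rows as you go also means a crash halfway through keeps the rows scraped so far.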