
When scraping data from a website and writing to a csv file, only the last row was written to the file

I am using Python and Beautiful Soup to extract data from a web page, and it works. The problem is that it's not inserting all the values into the csv file. If I extract 10 data values, only the 10th value goes to the csv file; the 9th one doesn't. All 10 data values show up on the terminal, but not in the csv file.

# import libraries

import csv

import urllib.request
from bs4 import BeautifulSoup



# specify the url
quote_page = "https://www.cardekho.com/Hyundai/Gurgaon/cardealers"
#quote_page = input("Enter Data Source Here : ")
page = urllib.request.urlopen(quote_page)



# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, "lxml")


# Take out the <div> of name and get its value
delrname = soup.find_all('div', class_='deleadres')
for name in delrname:
    dname = name.find('div', class_="delrname").text # name
    print(dname)
for address in delrname:
    dadres = address.find('p').text
    print(dadres)
for mobile in delrname:
    dmobile = mobile.find('div', class_="clearfix").text
    print(dmobile)
for email in delrname:
    demail = email.find('div', class_="mobno").text
    print(demail)





# exporting data into csv file....
with open('result.csv',newline='') as f:
    r = csv.reader(f)
    data = [line for line in r]
with open('result.csv','w',newline='') as f:
    w = csv.writer(f)
    w.writerow(['NAME','ADDRES','MOBILE','EMAIL'])
    w.writerow([dname,dadres,dmobile,demail])

When you assign values in a for-loop, each assignment replaces the former value, so after the loop you are left with only the final value.

for number in 1, 2, 3:
    print(number) # prints 1, then 2, then 3
print(number) # prints only 3, since that was the final value.

In your script, use a single for-loop to both extract the values and write the data rows to the csv:

with open('result.csv','w',newline='') as f:
    w = csv.writer(f)
    w.writerow(['NAME','ADDRESS','MOBILE','EMAIL']) # write header once
    entries = soup.find_all('div', class_='deleadres')
    for entry in entries: # loop over all `.deleadres` elements
        dname = entry.find('div', class_="delrname").text
        dadres = entry.find('p').text
        dmobile = entry.find('div', class_="clearfix").text
        demail = entry.find('div', class_="mobno").text
        w.writerow([dname,dadres,dmobile,demail]) # write data rows for each entry

Your mistake is that you are only saving the last value from each loop, which is why you are not getting all the values.

Another way to do it:

1) Append the values from each loop to a list

2) Write the values from the lists to the CSV

page = urllib.request.urlopen(quote_page)
# CREATE NEW LISTS
dname_list = list()
dadres_list = list()
dmobile_list = list()
demail_list = list()


# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, "lxml")

# APPEND TO THE LIST
# Take out the <div> of name and get its value
delrname = soup.find_all('div', class_='deleadres')
for name in delrname:
    dname = name.find('div', class_="delrname").text # name
    print(dname)
    dname_list.append(dname)
for address in delrname:
    dadres = address.find('p').text
    print(dadres)
    dadres_list.append(dadres)
for mobile in delrname:
    dmobile = mobile.find('div', class_="clearfix").text
    print(dmobile)
    dmobile_list.append(dmobile)
for email in delrname:
    demail = email.find('div', class_="mobno").text
    print(demail)
    demail_list.append(demail)


# exporting data into csv file....
with open('result.csv','w',newline='') as f:
    w = csv.writer(f)
    w.writerow(['NAME','ADDRESS','MOBILE','EMAIL'])
    # TRAVERSE THROUGH THE LIST
    for i in range(len(dname_list)):
        try:
            w.writerow([dname_list[i],dadres_list[i],dmobile_list[i],demail_list[i]])
        except IndexError:
            print('')
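The index-based loop above can also be written with `zip`, which pairs the four lists element by element and stops at the shortest one, so no index bookkeeping or `IndexError` handling is needed. A minimal sketch, using hypothetical sample data in place of the scraped lists and an in-memory buffer in place of `result.csv`:

```python
import csv
import io

# Hypothetical sample data standing in for the four scraped lists
dname_list = ["Dealer A", "Dealer B"]
dadres_list = ["Address A", "Address B"]
dmobile_list = ["111", "222"]
demail_list = ["a@example.com", "b@example.com"]

buf = io.StringIO()  # stands in for open('result.csv', 'w', newline='')
w = csv.writer(buf)
w.writerow(['NAME', 'ADDRESS', 'MOBILE', 'EMAIL'])
# zip yields one (name, address, mobile, email) tuple per dealer
for row in zip(dname_list, dadres_list, dmobile_list, demail_list):
    w.writerow(row)

print(buf.getvalue())
```

This produces one csv row per dealer, exactly like the indexed loop, but without the `try`/`except` block.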

PS: Haken's answer is a better way of doing it. I just thought I'd let you know another way to do it.

