How to output a list of strings into a .csv file with several columns
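I'm trying to build a scraper that puts all Swedish members of parliament into a .csv file with several columns.
I have managed to get a list of names, as shown below. I'm having trouble splitting the strings into last name, first name and party, and then writing them to a .csv file with those three columns. How can I do that?
Code: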
import urllib.request
import bs4 as bs

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")
names = soup.find_all("span", {"class": "fellow-name"})
for span in names:
    cleanednames = span.text.strip()
    print(cleanednames)
Output:
Acketoft, Tina (L)
Adaktusson, Lars (KD)
Ahlberg, Ann-Christin (S)
Akhondi, Alireza (C)
Ali-Elmi, Leila (MP)
Alm Ericson, Janine (MP)
...
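Here is a snippet that writes the csv with the pandas library. From each fellow-name span we extract the last name, first name and party, and append those three strings as a list to an outer list. We then turn that list of lists into a pandas DataFrame and write it to csv.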
import urllib.request
import bs4 as bs
import pandas as pd

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")
names = soup.find_all("span", {"class": "fellow-name"})

list_of_mps = []
for span in names:
    cleanednames = span.text.strip()
    # "Alm Ericson, Janine (MP)" -> ["Alm Ericson", " Janine (MP)"]
    split_name = cleanednames.split(',')
    last_name = split_name[0]
    first_name_and_party = split_name[1].strip()
    # the last space-separated token is the party; everything before it is the first name
    first_name = ' '.join(first_name_and_party.split(' ')[:-1])
    party = first_name_and_party.split(' ')[-1]
    list_of_mps.append([last_name, first_name, party])

# index=False stops pandas from writing its row numbers as an extra column
pd.DataFrame(list_of_mps, columns=['last_name', 'first_name', 'party']).to_csv('names_parties.csv', index=False)
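The answer above splits on ',' and ' ', which keeps the parentheses around the party (for example "(L)"). As an alternative technique, not the one used in the answer, a regular expression can describe the whole "Last, First (Party)" pattern at once and drop the parentheses. This is a minimal sketch assuming every name follows the format shown in the question's output; `parse_fellow_name` and `FELLOW_NAME` are hypothetical names.

```python
import re

# Assumed format: "Last name(s), First name(s) (PARTY)", as in the question's output.
# FELLOW_NAME and parse_fellow_name are hypothetical names used only for illustration.
FELLOW_NAME = re.compile(r"(?P<last>[^,]+),\s*(?P<first>.+?)\s*\((?P<party>[^()]+)\)")


def parse_fellow_name(text):
    """Split 'Acketoft, Tina (L)' into ('Acketoft', 'Tina', 'L')."""
    match = FELLOW_NAME.fullmatch(text.strip())
    if match is None:
        # Fail loudly instead of silently writing a malformed row.
        raise ValueError(f"unexpected name format: {text!r}")
    return match.group("last"), match.group("first"), match.group("party")


samples = [
    "Acketoft, Tina (L)",
    "Ahlberg, Ann-Christin (S)",
    "Alm Ericson, Janine (MP)",
]
rows = [parse_fellow_name(s) for s in samples]
print(rows)
```

The resulting tuples could be passed to `pd.DataFrame(rows, columns=["last_name", "first_name", "party"])` or to `csv.writer(...).writerows(rows)` in place of the manual splitting.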
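With the output you have shown, you can write the names to a csv file in a loop.
Start with an empty list and append the fields to it instead of printing them, as in the following example.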
data = []
for span in soup.find_all("span", {"class": "fellow-name"}):
    cleanednames = span.text.strip()
    data.append(cleanednames)  # append each name to the list instead of printing it
Now you can extract last_name, first_name and party from the list and write them to a csv file. See the following example for writing the csv.
import csv

with open("result.csv", "w", newline="", encoding="utf-8") as stream:
    fieldnames = ["Last_Name", "First_Name", "Party"]
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for item in data:
        # split at the comma first: "Alm Ericson, Janine (MP)" -> "Alm Ericson", "Janine (MP)"
        last_name, rest = item.split(", ", 1)
        # the party is the last space-separated token; everything before it is the first name
        first_name, party = rest.rsplit(" ", 1)
        party = party.strip("()")  # remove the parentheses around the party
        writer.writerow({"Last_Name": last_name, "First_Name": first_name, "Party": party})
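To see what the DictWriter loop produces without hitting the website, the writer can be pointed at an in-memory `io.StringIO` buffer instead of a file. This sketch assumes `data` holds strings in the format shown in the question (two of them are copied here as sample input). A plain `item.split()` would give four pieces for "Alm Ericson, Janine (MP)" and fail to unpack into three names, which is why this version splits on ", " first and then on the last space.

```python
import csv
import io

# Sample strings copied from the question's output; in the real script `data`
# is filled from the scraped <span class="fellow-name"> elements.
data = ["Acketoft, Tina (L)", "Alm Ericson, Janine (MP)"]

buffer = io.StringIO()  # stands in for the file opened with open("result.csv", ...)
writer = csv.DictWriter(buffer, fieldnames=["Last_Name", "First_Name", "Party"])
writer.writeheader()
for item in data:
    last_name, rest = item.split(", ", 1)       # last name is everything before ", "
    first_name, party = rest.rsplit(" ", 1)     # party is the final token
    writer.writerow({"Last_Name": last_name, "First_Name": first_name, "Party": party.strip("()")})

print(buffer.getvalue())
```

The csv module ends each row with "\r\n" by default, and "Alm Ericson" is written unquoted because a space is not a delimiter.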
As mentioned in the earlier comments, pandas is overkill for this. Using the csv module instead, we have:
import csv
import urllib.request
import bs4 as bs

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")
names = soup.find_all("span", {"class": "fellow-name"})

# newline="" is what the csv module expects (avoids blank rows on Windows);
# utf-8 keeps Swedish letters such as å, ä and ö intact
with open("csv-name.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    for span in names:
        cleanednames = span.text.strip()
        lname, rest = cleanednames.split(", ", 1)
        rest = rest.split(" ")
        party = rest[-1]
        fname = " ".join(rest[:-1])
        writer.writerow([lname, fname, party])
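What the code does: we first split on the comma; everything before the comma is the last name. We then split the rest on spaces, knowing that the last piece will be the party. Finally, whatever remains is the first name.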
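The three steps above can be checked offline by wrapping them in a small function and running it on a few of the names from the question. `split_name` is a hypothetical helper name; its body mirrors the loop in the answer, with `maxsplit=1` on the comma split as an assumption so that an unexpected second comma stays in `rest` instead of breaking the unpacking. Note that the party keeps its parentheses, e.g. "(MP)".

```python
def split_name(cleanednames):
    """Hypothetical helper mirroring the answer's loop body."""
    # Step 1: everything before the comma is the last name.
    lname, rest = cleanednames.split(", ", 1)
    # Step 2: split the remainder on spaces; the last piece is the party.
    rest = rest.split(" ")
    party = rest[-1]
    # Step 3: whatever is left over is the first name.
    fname = " ".join(rest[:-1])
    return [lname, fname, party]


for s in ["Acketoft, Tina (L)", "Alm Ericson, Janine (MP)"]:
    print(split_name(s))
```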