
How to output a list of strings into a .csv file with several columns

I'm trying to build a scraper that puts all Swedish members of parliament into a .csv file with several columns.

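I managed to get a list of names, as shown below. I'm having trouble splitting each string into last name, first name and party, and then writing those three columns to a .csv file. How can I do this?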

Code:

import urllib.request
import bs4 as bs

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")

names = soup.find_all("span", {"class": "fellow-name"})

for span in names:
    cleanednames = span.text.strip()
    print(cleanednames)

Output:

Acketoft, Tina (L)
Adaktusson, Lars (KD)
Ahlberg, Ann-Christin (S)
Akhondi, Alireza (C)
Ali-Elmi, Leila (MP)
Alm Ericson, Janine (MP)
...
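Each line has the shape "Last, First (PARTY)", but the surname can contain a space (Alm Ericson) and the party code sits inside parentheses. One way to handle both in a single step is a regular expression. This is a swapped-in technique rather than what the answers below use, and the pattern and the parse_fellow_name helper are hypothetical: a sketch that assumes every scraped line follows exactly that shape, run here on the sample lines above without touching the network.

```python
import re

# Hypothetical pattern for "<last name>, <first name> (<party>)":
# - the last name is everything before the first comma (it may contain spaces)
# - the first name is the (lazily matched) text between the comma and the parentheses
# - the party code is the text inside the trailing parentheses
FELLOW_NAME = re.compile(r"^(?P<last>[^,]+),\s*(?P<first>.+?)\s*\((?P<party>[^()]+)\)$")


def parse_fellow_name(text):
    """Split one 'Last, First (PARTY)' string into (last, first, party)."""
    match = FELLOW_NAME.match(text.strip())
    if match is None:
        # Fail loudly instead of writing shifted columns to the CSV.
        raise ValueError(f"unexpected name format: {text!r}")
    return match.group("last").strip(), match.group("first"), match.group("party")


# Sample lines copied from the output shown in the question.
sample = [
    "Acketoft, Tina (L)",
    "Adaktusson, Lars (KD)",
    "Ahlberg, Ann-Christin (S)",
    "Akhondi, Alireza (C)",
    "Ali-Elmi, Leila (MP)",
    "Alm Ericson, Janine (MP)",
]

for line in sample:
    print(parse_fellow_name(line))
```

For these lines the helper returns tuples such as ('Alm Ericson', 'Janine', 'MP'); a line in any other shape raises ValueError rather than silently producing misaligned columns.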

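Here is a snippet that writes the csv with the pandas library. From each fellow-name span we extract the last name, first name and party, and append those three strings as a list to an outer list. We then turn that list of lists into a pandas DataFrame and write it to csv.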

import urllib.request
import bs4 as bs
import pandas as pd

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")

names = soup.find_all("span", {"class": "fellow-name"})

list_of_mps = []

for span in names:
    cleanednames = span.text.strip()
    # Everything before the first comma is the last name.
    last_name, first_name_and_party = cleanednames.split(',', 1)
    first_name_and_party = first_name_and_party.strip()
    # The last space-separated token is the party; the rest is the first name.
    first_name = ' '.join(first_name_and_party.split(' ')[:-1])
    party = first_name_and_party.split(' ')[-1]
    list_of_mps.append([last_name, first_name, party])

# index=False keeps pandas from writing its row index as an extra first column.
pd.DataFrame(list_of_mps, columns=['last_name', 'first_name', 'party']).to_csv('names_parties.csv', index=False)

Given the output shown, you can write the names to a csv file in a loop.

Start with an empty list and append each name to it instead of printing it. See the example below.

data = []

for span in soup.find_all("span", {"class": "fellow-name"}):
    cleanednames = span.text.strip()
    data.append(cleanednames)  # append each name to the list instead of printing it

Now, from that list, you can extract last_name, first_name and party and write them to a csv file. See the example below for writing the csv.

import csv

with open("result.csv", "w", newline="", encoding="utf-8") as stream:
    fieldnames = ["Last_Name", "First_Name", "Party"]
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for item in data:
        # Split once on the comma: the last name may itself contain spaces (e.g. "Alm Ericson").
        last_name, rest = item.split(",", 1)
        # The party is the last space-separated token; everything before it is the first name.
        first_name, party = rest.strip().rsplit(" ", 1)
        party = party.strip("()")  # remove the parentheses around the party
        writer.writerow({"Last_Name": last_name.strip(), "First_Name": first_name, "Party": party})  # write one CSV row
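To confirm that the file really ends up with the three columns, it can be read back with csv.DictReader. This sketch is hedged in two ways: it writes to an in-memory io.StringIO instead of result.csv, and the two rows are written by hand (values copied from the sample output) rather than produced by the scraping loop.

```python
import csv
import io

fieldnames = ["Last_Name", "First_Name", "Party"]
# Hand-written rows in the shape the loop above writes (values taken from the sample output).
rows = [
    {"Last_Name": "Acketoft", "First_Name": "Tina", "Party": "L"},
    {"Last_Name": "Alm Ericson", "First_Name": "Janine", "Party": "MP"},
]

# io.StringIO stands in for the real result.csv file so the output can be inspected.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)

print(repr(buffer.getvalue()))
# 'Last_Name,First_Name,Party\r\nAcketoft,Tina,L\r\nAlm Ericson,Janine,MP\r\n'

buffer.seek(0)  # rewind so the same buffer can be read from the start
reader = csv.DictReader(buffer)
for record in reader:
    # Each record maps the header names back to that row's values.
    print(record["Last_Name"], "|", record["First_Name"], "|", record["Party"])
```

The \r\n row endings are the csv writer's default line terminator; opening the real file with newline="" keeps Python from translating them a second time, which is what otherwise shows up as blank lines between rows on Windows.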

As mentioned in an earlier comment, pandas is overkill here. Using the csv module instead:

import urllib.request
import bs4 as bs
import csv

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")

names = soup.find_all("span", {"class": "fellow-name"})
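# newline="" is what the csv docs require (avoids blank lines between rows on Windows);
# utf-8 keeps Swedish characters such as å, ä and ö intact.
with open("csv-name.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    for span in names: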
        cleanednames = span.text.strip()
        lname, rest = cleanednames.split(", ")
        rest = rest.split(" ")
        party = rest[-1]
        fname = " ".join(rest[:-1])
        writer.writerow([lname, fname, party])

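What happens in the code: we first split on the comma; everything before the comma is the last name. Then we split the remainder on spaces, knowing that the last piece is the party. Whatever is left is the first name.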
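Traced on one sample line whose surname contains a space, the three steps look like this. The input string is copied from the question's output; the final line, which strips the parentheses from the party, is an optional extra that the answer's code does not do.

```python
# One line from the scraped output, used as a fixed input for the trace.
cleanednames = "Alm Ericson, Janine (MP)"

# Step 1: split on ", "; everything before the comma is the last name.
lname, rest = cleanednames.split(", ")
print(lname)  # Alm Ericson
print(rest)   # Janine (MP)

# Step 2: split the remainder on spaces; the last piece is the party.
rest = rest.split(" ")
party = rest[-1]
print(party)  # (MP)

# Step 3: whatever is left, joined back with spaces, is the first name.
fname = " ".join(rest[:-1])
print(fname)  # Janine

# Optional extra (not in the answer): drop the parentheses around the party.
print(party.strip("()"))  # MP
```

Because the surname is taken from before the comma rather than from a fixed word position, "Alm Ericson" stays in one column.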


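Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions contact: yoyou2525@163.com.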

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM