
Python csv.writerows() Writes to Many Columns on One Row and Not to Many Rows and One Column as Desired/Expected

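The question is: why does csv.writerows() write to many columns on one row, instead of to many rows in one column as desired and expected?

Details follow:

I need to collect a large number of e-mail addresses from various website pages, and I do not have time to copy and paste each one.

So I developed an HTML web page e-mail scraper using some of Python's standard-library modules plus the third-party library Beautiful Soup 4.

The script connects to a web page or, in this case, to a local file on my computer.

The script successfully scrapes the HTML file, collects all of the HTML anchor tags ( <a></a> ), and compiles them into a list of anchor tags.

It then uses a regular expression to extract the e-mail addresses and lowercases both instances of each address found in an anchor tag, so that I can combine them into a set of unique e-mail addresses.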
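The extraction step described above can be sketched in isolation as follows. This is only an illustration: the anchor tag and the address in it are made-up sample data, and the names EMAIL_PATTERN, anchor_html, matches, and unique_emails are not part of the script below. The pattern is the same one the script uses, written as a raw string.

```python
import re

# Same pattern as in the script, written as a raw string so that
# the backslash in "\." is passed to the regex engine unchanged.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

# Hypothetical anchor tag: the address appears once in the href
# and once in the link text, with different capitalization.
anchor_html = '<a href="mailto:Jane.Doe@Example.com">Jane.Doe@example.com</a>'

# findall() returns every non-overlapping match, in document order.
matches = EMAIL_PATTERN.findall(anchor_html)

# Lowercasing before building the set collapses the two spellings into one.
unique_emails = {match.lower() for match in matches}

print(matches)
print(unique_emails)
```

Without the lowercasing step, the set would keep both spellings as separate entries.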

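I then convert this set of unique e-mail addresses to a list and sort it alphabetically with the list object's sort() method.

I then convert this alphabetized list of e-mails into a tuple of alphabetized e-mails.

I then append this tuple to a list so that the list contains only that one item (I did this because, in testing, writing to the CSV file split each e-mail string into multiple columns).

I then write the list containing the tuple to a CSV file, but the writerows() method writes all of the e-mails across one row instead of onto many rows.

I just want each e-mail address string written on its own row, in a single column.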
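The behavior described above can be reproduced without any scraping. The sketch below is a minimal illustration, assuming three made-up addresses and an in-memory io.StringIO buffer in place of the real CSV file; the names emails, rows, buffer, and output are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for the scraped addresses.
emails = ("alice@example.com", "bob@example.com", "carol@example.com")

# Same shape as ListWithOneTuple: an outer list holding a single tuple.
rows = [emails]

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO(newline="")
csv.writer(buffer, delimiter=";").writerows(rows)
output = buffer.getvalue()

# writerows() treats each item of the outer list as one row, so the
# single tuple becomes a single row with one column per address.
print(repr(output))
```

Because the outer list has exactly one item, the writer emits exactly one row.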

Thank you for your help.

## IMPORT MODULES
## IMPORT MODULES
## IMPORT MODULES

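import urllib.request  ## urllib.request MUST BE IMPORTED EXPLICITLY; "import urllib" ALONE IS NOT GUARANTEED TO LOAD IT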
import bs4
import re
import pprint
import csv


## DECLARE VARIABLES
## DECLARE VARIABLES
## DECLARE VARIABLES

## EMPTY LIST FOR SCRAPED E-MAILS
ListOfEmails = []

# EMPTY SET FOR SCRAPED E-MAILS 
SetOfEmails = set()

## HEADERS FOR OUTPUT TO CSV FILE
##headers = ['emails'] 

## ROWS FOR E-MAILS FOR OUTPUT TO CSV FILE
ListWithOneTuple = []


## BEGIN MAIN PROGRAM
## BEGIN MAIN PROGRAM
## BEGIN MAIN PROGRAM

## OPEN LOCAL HTML FILE; READ THE HTML DOCUMENT
file = urllib.request.urlopen("file:///c://Python372/local_venv/index.html")
##print(file)
##print(type(file))
##print("\n")

## PARSE THE HTML; MAKE BEAUTIFUL SOUP
soup = bs4.BeautifulSoup(file, features="html.parser")
##print(soup)
##print(type(soup))
##print("\n")

## FIND ALL <a> ANCHOR TAGS; MAKE LIST OF ANCHOR TAGS
ListOfAnchors = soup.find_all("a")
##pprint.pprint(ListOfAnchors)
##print("\n")
##print("Number of Anchor Tags = ", len(ListOfAnchors))
##print("\n")

## FOR EACH ELEMENT IN LIST OF ANCHORS...
for each in ListOfAnchors:
    ##print(each)

    ## CONVERT EACH BEAUTIFUL SOUP OBJECT INTO STRING
    each = str(each)
    ##print(type(each))

    ## REGEX TO EXTRACT E-MAILS TO LIST
    ListOfMatches = re.findall(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", each)
    ##print("ListOfMatches = ", type(ListOfMatches))

    ## FOR EACH ELEMENT IN LIST, MAKE E-MAILS LOWERCASE
    for email in ListOfMatches:

        ## CONVERT E-MAILS TO LOWERCASE
        EmailLowercase = email.lower()
        ##print(EmailLowercase, type(EmailLowercase))
        ##print("\n")

        ## APPEND E-MAILS TO LIST OF E-MAILS
        ListOfEmails.append(EmailLowercase)

## TEST PRINT LIST OF E-MAILS
##print("\n")    
##print("ListOfEmails = ", ListOfEmails)
##print(type(ListOfEmails), len(ListOfEmails))

## CONVERT LIST OF E-MAILS TO SET OF E-MAILS
SetOfEmails = set(ListOfEmails)

## TEST PRINT SET OF E-MAILS
##print("\n") 
##print("SetOfEmails = ", SetOfEmails)
##print(type(SetOfEmails), len(SetOfEmails))

## CONVERT SET OF E-MAILS BACK TO LIST OF E-MAILS FOR NEXT STEP ALPHABETIC SORTING
ListOfEmailsAlphabetic = list(SetOfEmails)

## ALPHABETIZE LIST OF E-MAILS
ListOfEmailsAlphabetic.sort()

## TEST PRINT ALPHABETIC LIST OF E-MAILS
print("\n") 
print(ListOfEmailsAlphabetic, type(ListOfEmailsAlphabetic), len(ListOfEmailsAlphabetic))

## CONVERT ALPHABETIC LIST OF E-MAILS TO TUPLE OF ALPHABETIC E-MAILS    
TupleOfEmailsAlphabetic = tuple(ListOfEmailsAlphabetic)    
print(TupleOfEmailsAlphabetic, type(TupleOfEmailsAlphabetic), len(TupleOfEmailsAlphabetic))

## APPEND TUPLE OF ALPHABETIC E-MAILS TO LIST TO MAKE LIST OF ONE TUPLE ITEM
ListWithOneTuple.append(TupleOfEmailsAlphabetic)

## TEST PRINT ROWS FOR CSV OUTPUT
print("\n")
print(ListWithOneTuple, type(ListWithOneTuple), len(ListWithOneTuple)) 

## OPEN CSV FILE TO OUTPUT LIST OF E-MAILS
with open('CSVofEmails.csv','w', newline='') as CSVFile:
    FileCSV = csv.writer(CSVFile, delimiter=';')
    ##FileCSV.writerow(headers)
    FileCSV.writerows(ListWithOneTuple)



## END MAIN PROGRAM
## END MAIN PROGRAM
## END MAIN PROGRAM

## GAME OVER
## GAME OVER
## GAME OVER

This should work.

Can you change the last part of the code like this?

content = [[i] for i in ListWithOneTuple[0]]

# OPEN CSV FILE TO OUTPUT LIST OF E-MAILS
with open('CSVofEmails.csv', 'w', newline='') as CSVFile:
    FileCSV = csv.writer(CSVFile, delimiter=';')
    # FileCSV.writerow(headers)
    FileCSV.writerows(content)

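This works. The writerows() method of csv.writer actually accepts a list shaped like [[column, column], [column, column]], where the outer list holds the rows and each inner list holds the columns of one row.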
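To make the row and column shape concrete, here is a small sketch that compares three inputs to writerows(). The helper to_csv_text and the sample data are hypothetical and exist only for this illustration; the output goes to an in-memory buffer rather than to CSVofEmails.csv.

```python
import csv
import io

# Hypothetical sample data standing in for the sorted tuple of e-mails.
emails = ("alice@example.com", "bob@example.com")


def to_csv_text(rows):
    """Write rows with csv.writer (';' delimiter) and return the text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter=";").writerows(rows)
    return buffer.getvalue()


# One single-item list per e-mail: many rows, one column.
one_per_row = to_csv_text([email] for email in emails)

# Bare strings as rows: each string is iterated character by character,
# so every character lands in its own column.
split_by_char = to_csv_text(["ab", "cd"])

# The general shape: outer list = rows, inner lists = columns.
two_columns = to_csv_text([["a", "b"], ["c", "d"]])

print(repr(one_per_row))
print(repr(split_by_char))
print(repr(two_columns))
```

The second case is the likely cause of the column splitting seen in testing, and the first case is equivalent to the [[i] for i in ...] list comprehension suggested above.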
