CSV Writer only writing first line in file
I have patent data that I want to extract from XML and store in a CSV file. My code iterates over the invention name, date, country, and patent number without trouble, but something goes wrong when I try to write the results to the CSV file.
The XML data looks like this (one section of many):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<us-patent-grant lang="EN" dtd-version="v4.2 2006-08-23" file="USD0584026-20090106.XML" status="PRODUCTION" id="us-patent-grant" country="US" date-produced="20081222" date-publ="20090106">
<us-bibliographic-data-grant>
<publication-reference>
<document-id>
<country>US</country>
<doc-number>D0584026</doc-number>
<kind>S1</kind>
<date>20090106</date>
</document-id>
</publication-reference>
My code for running through and writing these lines one by one is:
for xml_string in separated_xml(infile): # Calls the output of the separated and read file to parse the data
    soup = BeautifulSoup(xml_string, "lxml") # BeautifulSoup parses the data strings where the XML is converted to Unicode
    pub_ref = soup.findAll("publication-reference") # Beginning parsing at every instance of a publication
    lst = [] # Creating empty list to append into
    for info in pub_ref: # Looping over all instances of publication
        # The final loop finds every instance of invention name, patent number, date, and country to print and append into
        with open('./output.csv', 'wb') as f:
            writer = csv.writer(f, dialect = 'excel')
            for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
                #print(inv_name.text, pat_num.text, date_num.text, country.text)
                #lst.append((inv_name.text, pat_num.text, date_num.text, country.text))
                writer.writerow([inv_name.text, pat_num.text, date_num.text, country.text])
And lastly, the output in my .csv file is this:
"Content addressable information encapsulation, representation, and transfer",07475432,20090106,US
I'm not sure where the issue lies. I know I'm still quite new to Python, but can anyone find the problem?
The problem lies in this line:
    with open('./output.csv', 'wb') as f:
If you want to write all rows into a single file, use append mode (a, or ab to stay in binary mode like the original wb). Using wb truncates the file every time it is opened, so you end up with only the last line.
Read more about file modes here: https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
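To make the difference between the two modes concrete, here is a minimal sketch. It is written for Python 3, where text mode with newline="" replaces the Python 2 'wb'/'ab' modes used in the question. The helper names (write_rows_reopening, read_rows), the temporary file names, and the sample rows are made up for illustration and are not part of the original code.

```python
import csv
import os
import tempfile


def write_rows_reopening(path, rows, mode):
    """Reopen the file for every row, mimicking an open() call inside a loop."""
    for row in rows:
        with open(path, mode, newline="") as f:
            csv.writer(f, dialect="excel").writerow(row)


def read_rows(path):
    """Read the CSV file back as a list of rows."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


# Hypothetical sample rows standing in for the parsed patent data.
rows = [
    ["Title A", "0001", "20090106", "US"],
    ["Title B", "0002", "20090106", "US"],
]

tmp_dir = tempfile.mkdtemp()
overwrite_path = os.path.join(tmp_dir, "overwrite.csv")
append_path = os.path.join(tmp_dir, "append.csv")

write_rows_reopening(overwrite_path, rows, "w")  # truncates on every open
write_rows_reopening(append_path, rows, "a")     # adds to the end on every open

overwritten = read_rows(overwrite_path)  # only the last row survives
appended = read_rows(append_path)        # both rows are kept
```

One caveat with append mode: the file also keeps rows from earlier runs of the script, so it may need to be deleted or truncated once before the loop starts.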
You open the file in overwrite mode ('wb') inside a loop. On each iteration you erase whatever was written before. The correct way is to open the file outside the loop:
...
with open('./output.csv', 'wb') as f:
    writer = csv.writer(f, dialect = 'excel')
    for info in pub_ref: # Looping over all instances of publication
        # The final loop finds every instance of invention name, patent number, date, and country to print and append into
        for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
            ...
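As a rough, self-contained sketch of the same idea, the writer below is created once and reused for every document. This assumes the file should be opened before the outermost loop as well (the one over separated_xml(infile) in the question), since opening it inside that loop with 'wb' would still truncate it once per XML document. The function write_patents, the documents list, and the second record are hypothetical stand-ins for the BeautifulSoup results; the code targets Python 3.

```python
import csv
import io


def write_patents(records_per_document, out_file):
    """Write every record from every document through a single csv.writer."""
    writer = csv.writer(out_file, dialect="excel")  # created once, before any loop
    for records in records_per_document:            # stands in for the loop over XML strings
        for inv_name, pat_num, date_num, country in records:
            writer.writerow([inv_name, pat_num, date_num, country])


# The first record comes from the question's output; the second title is made up.
documents = [
    [("Content addressable information encapsulation, representation, and transfer",
      "07475432", "20090106", "US")],
    [("Example title", "D0584026", "20090106", "US")],
]

# An in-memory buffer keeps the sketch runnable without touching the disk.
buffer = io.StringIO(newline="")
write_patents(documents, buffer)
lines = buffer.getvalue().splitlines()  # one CSV line per record
```

With a real file, the call would sit inside a single with block, for example open('./output.csv', 'w', newline='') in Python 3 or open('./output.csv', 'wb') in Python 2, wrapped around the whole parsing loop.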