csv writer output in one column

I have parsed some .txt files and obtained the following list:

price = ['S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388', 'price to be between $and $per ', 'S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761', 'S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761', 'S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015', 'S-1', '20140512', '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI', '\t\t0000759944', 'initial public offering and no public market exists for our shares. We anticipate that the initial public offering price will be between $and', 'S-1', '20110523', '\t\t\t\tCeres, Inc.', '\t\t0000767884', '    aggregate capital expenditures will be between $0.3&#160;million', 'S-1', '20171023', '\t\t\t\tBLUEGREEN VACATIONS CORP', '\t\t0000778946', '        <div style="margin-top:14pt; text-align:justify; line-height:12pt;">This is the initial public offering of Bluegreen Vacations Corporation. We are offering &#8194;&#8194; shares of our common stock and the selling shareholder identified in this prospectus is offering &#8194;&#8194; shares of our common stock. We will not receive any of the proceeds from the sale of shares by the selling shareholder. We anticipate that the initial public offering price of our common stock will be between $&#8199;&#8199; and $&#8199;&#8199; per ', 'S-1', '20020813', '\t\t\t\tVISTACARE INC', '\t\t0000787030']

My desired output is a CSV file in which each row starts with an "S-1" document (each one corresponding to a different company). So I wrote a second list that splits the list above into sublists, each starting at an 'S-1':

import re

price2 = [s.strip('|').split('|') for s in re.split(r'(?=S-1)', '|'.join(price)) if s]
print(price2)
[['S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388', 'price to be between $and $per '], ['S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015'], ['S-1', '20140512', '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI', '\t\t0000759944', 'initial public offering and no public market exists for our shares. We anticipate that the initial public offering price will be between $and'], ['S-1', '20110523', '\t\t\t\tCeres, Inc.', '\t\t0000767884', '    aggregate capital expenditures will be between $0.3&#160;million'], ['S-1', '20171023', '\t\t\t\tBLUEGREEN VACATIONS CORP', '\t\t0000778946', '        <div style="margin-top:14pt; text-align:justify; line-height:12pt;">This is the initial public offering of Bluegreen Vacations Corporation. We are offering &#8194;&#8194; shares of our common stock and the selling shareholder identified in this prospectus is offering &#8194;&#8194; shares of our common stock. We will not receive any of the proceeds from the sale of shares by the selling shareholder. We anticipate that the initial public offering price of our common stock will be between $&#8199;&#8199; and $&#8199;&#8199; per '], ['S-1', '20020813', '\t\t\t\tVISTACARE INC', '\t\t0000787030']]
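For comparison, here is a minimal sketch that does the same grouping directly on the list, assuming every record starts with an element exactly equal to 'S-1'. Unlike the join/split approach, it doesn't break if an element happens to contain '|' or the substring 'S-1'. The function name group_records is hypothetical.

```python
def group_records(items, marker="S-1"):
    """Split a flat list into sublists, each starting at `marker`."""
    groups = []
    for item in items:
        if item == marker or not groups:
            # Start a new record at each marker (or at the very first item).
            groups.append([item])
        else:
            # Otherwise, append to the record currently being built.
            groups[-1].append(item)
    return groups


sample = ['S-1', '20040319', 'DIGIRAD CORP', 'S-1', '20040408', 'BUCYRUS INTERNATIONAL INC']
print(group_records(sample))
# [['S-1', '20040319', 'DIGIRAD CORP'], ['S-1', '20040408', 'BUCYRUS INTERNATIONAL INC']]
```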

I then write it to a CSV file:

import csv

with open('pricerange.csv', 'w', newline='') as out_file:
    wr = csv.writer(out_file)
    wr.writerow(["file_form", "filedate", "coname", "cik", "price_range"])  # header row
    wr.writerows(price2)
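writerows() writes one line per sublist, so records without a price-range sentence end up with only four fields; CSV doesn't require every row to be the same length. A small sketch using io.StringIO in place of the real file, with shortened sample data:

```python
import csv
import io

rows = [
    ['S-1', '20040319', 'DIGIRAD CORP', '0000707388', 'price to be between $and $per '],
    ['S-1', '20040408', 'BUCYRUS INTERNATIONAL INC', '0000740761'],
]

buf = io.StringIO()  # stands in for the open file
wr = csv.writer(buf)
wr.writerow(["file_form", "filedate", "coname", "cik", "price_range"])
wr.writerows(rows)  # one CSV line per sublist

# Read the text back to count the fields on each line.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
print([len(r) for r in parsed])  # [5, 5, 4]
```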

The output looks fine: each sublist is placed in its own row (i.e., each row starts with the 'S-1' element).

To clean the list further, I also want to remove the special characters (e.g. '&#8194;') and HTML tags. So I create a new list, price3:

price3 = re.sub('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});', '', str(price2)) #remove special characters or html tags in original .txt files
print(price3)
[['S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388', 'price to be between $and $per '], ['S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015'], ['S-1', '20140512', '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI', '\t\t0000759944', 'initial public offering and no public market exists for our shares. We anticipate that the initial public offering price will be between $and'], ['S-1', '20110523', '\t\t\t\tCeres, Inc.', '\t\t0000767884', '    aggregate capital expenditures will be between $0.3million'], ['S-1', '20171023', '\t\t\t\tBLUEGREEN VACATIONS CORP', '\t\t0000778946', '        This is the initial public offering of Bluegreen Vacations Corporation. We are offering  shares of our common stock and the selling shareholder identified in this prospectus is offering  shares of our common stock. We will not receive any of the proceeds from the sale of shares by the selling shareholder. We anticipate that the initial public offering price of our common stock will be between $ and $ per '], ['S-1', '20020813', '\t\t\t\tVISTACARE INC', '\t\t0000787030']]

To my surprise, when I use the same code to write price3 to a CSV file, all the elements end up in the first column. See the output:

[Screenshot: the resulting CSV, with everything in the first column]

Any suggestions? I can't see where the bug is... Thank you so much!

There's no bug. Excel takes its CSV separator from the system's list-separator setting, which in many locales is ';' rather than ',', so in your example it puts all the values in the first column. To view the CSV correctly, either change Excel's separator setting from ';' to ',', or save your CSV file with ';' as the delimiter, as follows:

import csv

with open('pricerange.csv', 'w', newline='') as out_file:
    wr = csv.writer(out_file, delimiter=";")
    wr.writerow(["file_form", "filedate", "coname", "cik", "price_range"])  # header row
    wr.writerows(price2)
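With delimiter=";" the writer separates fields with semicolons, and a comma inside a field (as in "Ceres, Inc.") no longer needs quoting. A sketch writing to an in-memory buffer instead of pricerange.csv, with a shortened sample row:

```python
import csv
import io

buf = io.StringIO()  # stands in for pricerange.csv
wr = csv.writer(buf, delimiter=";")
wr.writerow(["file_form", "filedate", "coname", "cik", "price_range"])
wr.writerow(["S-1", "20110523", "Ceres, Inc.", "0000767884"])
print(buf.getvalue())
# file_form;filedate;coname;cik;price_range
# S-1;20110523;Ceres, Inc.;0000767884
```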

There is no bug; the problem is that type(price2) is list while type(price3) is str. When writing to the file, the string is treated as a sequence of characters, so the code writes one character per row, which produces the output in your screenshot:

list(price3)

['[',
 '[',
 "'",
 'S',
 '-',
 '1',
 "'",
 ',',
 ' ',
...
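The same effect can be reproduced with a short string. writerows() iterates over its argument, so each character becomes a row, and writerow() then iterates over that one-character string, producing a single field in the first column:

```python
import csv
import io

buf = io.StringIO()
csv.writer(buf).writerows("[['S-1'")  # a str, not a list of rows
print(buf.getvalue().splitlines())
# ['[', '[', "'", 'S', '-', '1', "'"]
```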

So before writing the CSV file, you must convert the string price3 back into the corresponding list. To do this, you can use this trick:

import ast
import re

price3_str = re.sub('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});', '', str(price2))  # remove special characters or HTML tags from the original .txt files
price3 = ast.literal_eval(price3_str)  # parse the string back into a list of lists
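A minimal round-trip check on a shortened sample. ast.literal_eval() only parses Python literals, so this works only as long as the cleaning regex leaves the string a valid list literal. If a removed tag or entity had swallowed a quote or bracket, literal_eval would raise an exception instead.

```python
import ast
import re

pattern = r'<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});'
price2 = [['S-1', '20110523', '    between $0.3&#160;million'],
          ['S-1', '20171023', '<div style="margin-top:14pt;">This is the IPO']]

price3 = ast.literal_eval(re.sub(pattern, '', str(price2)))
print(price3)
# [['S-1', '20110523', '    between $0.3million'], ['S-1', '20171023', 'This is the IPO']]
```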

Now you can create the CSV:

import csv

with open('pricerange3.csv', 'w', newline='') as out_file:
    wr = csv.writer(out_file, delimiter=";")
    wr.writerow(["file_form", "filedate", "coname", "cik", "price_range"])  # header row
    wr.writerows(price3)

You have a problem with price3 because you converted price2 to a string in order to use re.sub(), and later writerows() can't write it properly: it expects a list of rows but gets a single string. It treats the string as a list of characters and puts every character in a separate row.
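A quick check on a shortened sample confirms this: str() turns the nested list into one string, and re.sub() returns a string too, even though print() makes it look like a list.

```python
import re

pattern = r'<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});'
price2 = [['S-1', '20110523', '    between $0.3&#160;million']]

price3 = re.sub(pattern, '', str(price2))  # str(price2) is one big string
print(type(price2).__name__, type(price3).__name__)  # list str
print(price3[:8])  # [['S-1',
```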

You should use a list comprehension to run re.sub() on every element of the list separately.

EDIT: As Massifox noticed in a comment, the original version didn't work correctly, so I added an inner for loop and now it works correctly.

price3 = [[re.sub('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});', '', item) for item in row] for row in price2]
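Putting it together, a minimal sketch of the per-element approach, writing to an in-memory buffer instead of a real file and using two shortened rows from the question. The function name clean_rows is hypothetical; the regex is the same one used above, precompiled with re.compile().

```python
import csv
import io
import re

# Matches HTML tags and HTML entities such as &#160; or &amp;
TAG_OR_ENTITY = re.compile(r'<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')


def clean_rows(rows):
    """Remove tags/entities from every field, keeping the list-of-lists shape."""
    return [[TAG_OR_ENTITY.sub('', item) for item in row] for row in rows]


price2 = [
    ['S-1', '20110523', '\t\t\t\tCeres, Inc.', '\t\t0000767884',
     '    aggregate capital expenditures will be between $0.3&#160;million'],
    ['S-1', '20020813', '\t\t\t\tVISTACARE INC', '\t\t0000787030'],
]
price3 = clean_rows(price2)  # still a list of lists, not a string

buf = io.StringIO()  # stands in for the output file
wr = csv.writer(buf)
wr.writerow(["file_form", "filedate", "coname", "cik", "price_range"])
wr.writerows(price3)  # one row per record, one column per field
print(buf.getvalue())
```

Because the cleaning happens field by field, there's no string round-trip and no need for ast.literal_eval().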
