How to transform a Python tuple to a .csv file?
I would like to transform a Python tuple to a .csv file. Let's say I have a retrive() function, and when I print its result with pprint it looks like this:
test = tuple(retrive(directory))
pprint(test, width=1)
Then:
("opinion_1.txt, I am an amateur photographer and own three DSLR c.... purchase",
"opinion_2.txt, This my second Sony Digital Came.... good camera for a good price!',
'opinion_3.txt, \'I ordered this camera with high hopes after couldn\\\'t find.\'')
So I tried this with the csv module:
with open('/Users/user/Downloads/output.csv','w') as out:
    csv_out=csv.writer(out)
    csv_out.writerow(['id','content'])
    for row in test:
        csv_out.writerow(row)
The problem is that I get a weird output which looks like this:
id,content
o,p,i,n,i,o,n,_,1,.,t,x,t,",", ,I, ,a,m, ,a,n, ,a,m,a,t,e,u,r, ,p,h,o,t,o,g,r,a,p,h,e,r, ,a,n,d, ,o,w,n, ,t,h,r,e,e, ,D,S,L,R, ,c,a,m,e,r,a,s, ,w,i,t,h, ,a, ,s,e,l,e,c,t,i,o,n, ,o,f, ,l,e,n,s,e,s,., ,H,o,w,e,v,e,r, ,t,h,a,t, ,c,o,l,l,e,c,t,i,o,n,
How can I get something like this?
opinion_1.txt,I am an amateur photographer and own three DSLR c.... purchase
opinion_2.txt,This my second Sony Digital Came.... good camera for a good price!
opinion_3.txt,I ordered this camera with high hopes after couldn\\\'t find.
The csv writer iterates over whatever you pass to writerow(). Here you pass a string from the tuple, so every character becomes its own field. Change your code to:
for row in test:
    csv_out.writerow(row.split(', ', 1))
This splits each string in the tuple at the first occurrence of ', '. It produces two elements for each row, which is what the csv writer needs.
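A minimal Python 3 sketch of the fix above, assuming the tuple holds strings of the form 'name, text'. The sample strings and the in-memory buffer (io.StringIO, used here in place of a real file path) are illustrative assumptions, not part of the original code.

```python
import csv
import io

# Hypothetical sample data shaped like the tuple in the question.
test = (
    "opinion_1.txt, I am an amateur photographer",
    "opinion_2.txt, This is my second camera, and it is good",
)

# An in-memory buffer stands in for open(path, 'w', newline='').
buffer = io.StringIO(newline="")
csv_out = csv.writer(buffer)
csv_out.writerow(["id", "content"])
for row in test:
    # writerow() expects a sequence of fields; a bare string would be
    # treated as a sequence of characters, one field per character.
    csv_out.writerow(row.split(", ", 1))

output = buffer.getvalue()
print(output)
```

When writing to a real file in Python 3, the csv documentation recommends opening it with newline='' so that the writer controls the line endings.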
If you need a Pandas solution, use the DataFrame constructor and to_csv:
import pandas as pd
df = pd.DataFrame([ x.split(',') for x in test ])
df.columns = ["id","content"]
print(df)
# id content
#0 opinion_1.txt I am an amateur photographer and own three DS...
#1 opinion_2.txt This my second Sony Digital Came.... good cam...
#2 opinion_3.txt 'I ordered this camera with high hopes after ...
# for testing
# print(df.to_csv(index=False))
df.to_csv("/Users/user/Downloads/output.csv", index=False)
#id,content
#opinion_1.txt, I am an amateur photographer and own three DSLR c.... purchase
#opinion_2.txt, This my second Sony Digital Came.... good camera for a good price!
#opinion_3.txt, 'I ordered this camera with hig
If a string contains multiple commas, you can split on the first occurrence of , only:
import pandas as pd
test = ("opinion_1.txt,a","opinion_2.txt,b","opinion_3.txt,c", "opinion_3.txt,b,c,k")
print(test)
print([ x.split(',', 1) for x in test ])
[['opinion_1.txt', 'a'],
['opinion_2.txt', 'b'],
['opinion_3.txt', 'c'],
['opinion_3.txt', 'b,c,k']]
df = pd.DataFrame([ x.split(',', 1) for x in test ])
df.columns = ["id","content"]
print(df)
id content
0 opinion_1.txt a
1 opinion_2.txt b
2 opinion_3.txt c
3 opinion_3.txt b,c,k
print(df.to_csv(index=False))
id,content
opinion_1.txt,a
opinion_2.txt,b
opinion_3.txt,c
opinion_3.txt,"b,c,k"
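The quotes around "b,c,k" in the last row are ordinary CSV quoting: a field that contains the delimiter is wrapped in quotes so that it reads back as a single value. The sketch below reproduces this with only the standard-library csv module, assuming the same four sample strings. The in-memory buffer is an illustrative stand-in for a file.

```python
import csv
import io

test = ("opinion_1.txt,a", "opinion_2.txt,b", "opinion_3.txt,c", "opinion_3.txt,b,c,k")

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["id", "content"])
# Split only on the first comma so the content stays in one field.
writer.writerows(x.split(",", 1) for x in test)

written = buffer.getvalue()

# Reading the text back shows that the quoted field survives intact.
read_back = list(csv.reader(io.StringIO(written, newline="")))
print(read_back[-1])
```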
Your parsing breaks if one of your sentences contains multiple commas, like this:
>>> s = "opinion_4.txt, Oh my, what happens with really, really long sentences?"
>>> s.split(", ")
['opinion_4.txt',
'Oh my',
'what happens with really',
'really long sentences?']
A better approach is to find the first ', ' and then split the line at that position using slicing:
for line in test:
    comma_idx = line.find(', ')
    # writerow() takes a single sequence of fields, so pass a list
    csv_out.writerow([line[:comma_idx], line[comma_idx + 2:]])
For the sentence above, it would result in this:
('opinion_4.txt', 'Oh my, what happens with really, really long sentences?')
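As an alternative sketch of the same idea, str.partition splits on the first occurrence of a separator and also copes with lines that contain no separator at all. The helper name split_first and the second sample string are hypothetical, introduced only for illustration.

```python
def split_first(line, sep=", "):
    """Split line at the first occurrence of sep into (id, content)."""
    head, found, tail = line.partition(sep)
    # partition() returns an empty separator when sep is absent;
    # in that case the whole line is kept as the id and content is empty.
    return (head, tail) if found else (line, "")


s = "opinion_4.txt, Oh my, what happens with really, really long sentences?"
print(split_first(s))
print(split_first("no_separator_here"))
```

This matters because str.find returns -1 when the separator is missing, and slicing with that index would silently cut the line in the wrong place.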