With Python, how do I split the string in each row of a CSV file?

This is what the CSV file looks like:

I have this banking dataset in which all the variable names and values are stored in the same cell of column A. With Python, how do I split them properly on ";" and place each piece in its own column of the CSV file, following column A?

For example, all the variable names are stored in A1:

age;"job";"marital";"education";"default";"housing";"loan";"contact";"month";"day_of_week";"duration";"campaign";"pdays";"previous";"poutcome";"emp.var.rate";"cons.price.idx";"cons.conf.idx";"euribor3m";"nr.employed";"y"

and one row of data is in B1:

56;"housemaid";"married";"basic.4y";"no";"no";"no";"telephone";"may";"mon";261;1;999;0;"nonexistent";1.1;93.994;-36.4;4.857;5191;"no"
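Splitting on ";" with a plain str.split would keep the double quotes attached to the text fields. A minimal sketch, using the standard-library `csv` module on the header and data line quoted above, shows how a reader configured with `delimiter=';'` returns clean values. The variable names `header_line` and `data_line` are just for illustration.

```python
import csv
import io

# The header (A1) and one data row from the question, as plain strings
header_line = 'age;"job";"marital";"education";"default";"housing";"loan";"contact";"month";"day_of_week";"duration";"campaign";"pdays";"previous";"poutcome";"emp.var.rate";"cons.price.idx";"cons.conf.idx";"euribor3m";"nr.employed";"y"'
data_line = '56;"housemaid";"married";"basic.4y";"no";"no";"no";"telephone";"may";"mon";261;1;999;0;"nonexistent";1.1;93.994;-36.4;4.857;5191;"no"'

# Naive split: the quote characters stay attached to the text fields
naive = data_line.split(';')

# csv.reader understands the quoting and strips it
reader = csv.reader(io.StringIO(header_line + '\n' + data_line), delimiter=';')
header, row = list(reader)

print(naive[1])                       # "housemaid"  (quotes still attached)
print(row[1])                         # housemaid
print(dict(zip(header, row))['job'])  # housemaid
```

Pairing the header with each row through `zip` is what makes every value addressable by its column name.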

The same goes for the data in A2, A3, A4, and so on.

Instead, I would like to find a way to split all of them on ";" and place them in separate cells B1, C1, D1, ..., so that they look like this:

    |  A  |     B     |    C    |
  1 | Age | job       | marital | ...
  2 | 56  | housemaid | married | ...

(I hope to do the same for all the rows.)

I want to modify the file with Python so that, when I load it with pandas' read_csv, each value sits in its own cell and I can analyze the data. I think I did something similar before in R.
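If the end goal is analysis in pandas, rewriting the file may not be necessary: `pandas.read_csv` takes a `sep` argument, so `pandas.read_csv(path, sep=';')` should split the columns on load. The sketch below shows the same idea with only the standard library, using `csv.DictReader` so each value can be looked up by column name. The helper name `read_semicolon_csv` is hypothetical, and the in-memory sample is a three-column excerpt of the rows quoted above; with the real file you would pass `open(path, newline='')` instead.

```python
import csv
import io


def read_semicolon_csv(f):
    """Parse a ';'-delimited file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(f, delimiter=';'))


# Three-column excerpt of the header and first data row from the question
sample = 'age;"job";"marital"\n56;"housemaid";"married"\n'
rows = read_semicolon_csv(io.StringIO(sample))

print(rows[0]['job'])       # housemaid
print(int(rows[0]['age']))  # 56 (csv returns strings; convert numbers yourself)
```

Note that the `csv` module never converts types, which is one practical reason to prefer pandas for the analysis step.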

First of all, you should try it yourself and then ask a question that includes a code sample.

Second, please accept answers that solve your question (as far as I can see, the previous one wasn't accepted).

Third, here is my attempt at the code.

For example, suppose you have a dataset like this (I simplified mine, but it follows the same pattern as yours):

"cat";"dog";"moose"
"moose";"cat";"dog"

And here is the code:

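import csv

csv_rows = []

# Read the semicolon-delimited file.
# Python 3: open in text mode with newline='' ('rb' only works in Python 2).
with open('animals.csv', newline='') as csvfile:
    orig_csv = csv.reader(csvfile, delimiter=';')
    for row in orig_csv:
        csv_rows.append(row)

# Write the same rows back out, comma-delimited.
with open('animals_1.csv', 'w', newline='') as csvfile:
    w = csv.writer(csvfile, delimiter=',')
    for row in csv_rows:
        w.writerow(row)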

Make sure you are using the correct delimiter for reading and writing the CSV! The dataset you have would look fine on my PC.
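To see why the delimiter matters, here is a small sketch on the simplified animals data above, held in memory with `io.StringIO` instead of a file. With the default `,` delimiter, `csv.reader` finds no separator, so each whole line comes back as a single field, which is analogous to everything ending up in column A; with `delimiter=';'` each value gets its own field.

```python
import csv
import io

# The simplified animals data from above, as an in-memory file
data = '"cat";"dog";"moose"\n"moose";"cat";"dog"\n'

# Wrong delimiter: with the default ',' each line is parsed as one single field
wrong = list(csv.reader(io.StringIO(data)))

# Right delimiter: each value lands in its own field, quotes removed
right = list(csv.reader(io.StringIO(data), delimiter=';'))

print(len(wrong[0]))  # 1
print(right[0])       # ['cat', 'dog', 'moose']
```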

Edit: changed the sample code a bit.

Based on my understanding, the raw format of your data is like this:

[root@ES01 ~]# cat /tmp/test.txt 
c1;c2;c3;c4;c5
v1;v2;v3;v4;v5

and you want to change it to:

c1,c2,c3,c4,c5
v1,v2,v3,v4,v5

I think you can do this:

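# Python 3; the with-block closes the file when done
with open('/tmp/test.txt') as f:
    for line in f:
        # Swap every semicolon for a comma; end='' avoids printing a second newline
        print(line.replace(';', ','), end='')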
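The str.replace approach is the simplest, but it changes every semicolon, including any that appear inside a quoted value. Letting the `csv` module parse and re-write the rows avoids that. A minimal sketch, assuming the output should go to a new file; the function name `semicolon_to_comma` and the sample value containing a semicolon are hypothetical.

```python
import csv
import io


def semicolon_to_comma(src, dst):
    """Copy rows from a ';'-delimited file object to a ','-delimited one."""
    writer = csv.writer(dst)  # default delimiter is ','
    for row in csv.reader(src, delimiter=';'):
        writer.writerow(row)


# Hypothetical input where one quoted value itself contains a semicolon
src = io.StringIO('c1;c2\n"a;b";v2\n')
dst = io.StringIO()
semicolon_to_comma(src, dst)
print(dst.getvalue())  # the value a;b is kept intact; rows end with '\r\n' by default
```

For real files, open both ends with `newline=''`, for example `open('/tmp/test.txt', newline='')` for reading and a second `open(..., 'w', newline='')` for the output path of your choice.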

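Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.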

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM