I need to split one column in a CSV file into two columns using Python
Hello everyone. I am new to Python and still learning. I have a CSV file with a column, programme, whose values can hold two codes separated by a semicolon.
I want to split the programme column at that semicolon into two columns, for example:
program 1: H2020-EU.3.1.
program 2: H2020-EU.3.1.7.
This is what I wrote initially:
import csv
import os
with open('IMI.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    with open('new_IMI.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file, delimiter='\t')
        #for line in csv_reader:
        #    csv_writer.writerow(line)
Please note that after splitting the column I need to write the file out again as a CSV and save it to my computer.
Please guide me.
Using .loc to iterate through each row of a DataFrame is somewhat inefficient. It is better to split the entire column at once, passing expand=True so the result can be assigned to the new columns. Also, as already stated, pandas makes this easy:
Code:
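import pandas as pd
df = pd.read_csv('IMI.csv')
# Split the whole column at ';' in one step; expand=True returns one column per part.
df[['programme1','programme2']] = df['programme'].str.split(';', expand=True)
# Remove the original, unsplit column.
df.drop(['programme'], axis=1, inplace=True)
# Note: this overwrites IMI.csv; pass a different name to keep the original file.
df.to_csv('IMI.csv', index=False)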
Example of output:
Before:
print(df)
       id     acronym  status                    programme           topics
0  945358  BIGPICTURE  SIGNED  H2020-EU.3.1.;H2020-EU3.1.7  IMI2-2019-18-01
1  821362      EBiSC2  SIGNED  H2020-EU.3.1.;H2020-EU3.1.7  IMI2-2017-13-06
2  116026     HARMONY  SIGNED                H2020-EU.3.1.  IMI2-2015-06-04
After:
print(df)
       id     acronym  status           topics     programme1     programme2
0  945358  BIGPICTURE  SIGNED  IMI2-2019-18-01  H2020-EU.3.1.  H2020-EU3.1.7
1  821362      EBiSC2  SIGNED  IMI2-2017-13-06  H2020-EU.3.1.  H2020-EU3.1.7
2  116026     HARMONY  SIGNED  IMI2-2015-06-04  H2020-EU.3.1.           None
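One thing the two-column assignment relies on is that the split really yields two parts. The following is a minimal sketch, not part of the original answer, of one way to make that explicit. It assumes small in-memory sample data (hypothetical, standing in for IMI.csv) and uses n=1 plus reindex so that exactly two columns come out, even if no row contains a semicolon:

```python
import io
import pandas as pd

# Hypothetical sample data mirroring the example table (not the real IMI.csv).
sample = io.StringIO(
    "id,acronym,programme\n"
    "945358,BIGPICTURE,H2020-EU.3.1.;H2020-EU3.1.7\n"
    "116026,HARMONY,H2020-EU.3.1.\n"
)
df = pd.read_csv(sample)

# n=1 splits at the first semicolon only, so at most two parts are produced.
parts = df["programme"].str.split(";", n=1, expand=True)

# reindex guarantees columns 0 and 1 exist even if no row had a semicolon.
parts = parts.reindex(columns=[0, 1])
parts.columns = ["programme1", "programme2"]

# Replace the original column with the two new ones.
result = pd.concat([df.drop(columns="programme"), parts], axis=1)
print(result)
```

Rows without a semicolon end up with a missing value in programme2, the same as in the output shown above.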
You can use the pandas library instead of csv.
import pandas as pd
df = pd.read_csv('IMI.csv')
p1 = {}
p2 = {}
# Build id -> value lookups for the two halves of each programme entry.
for i in range(len(df)):
    value = df['programme'].loc[i]
    if ';' in value:
        p1[df['id'].loc[i]] = value.split(';')[0]
        p2[df['id'].loc[i]] = value.split(';')[1]
    else:
        # No semicolon: keep the single value in the first new column.
        p1[df['id'].loc[i]] = value
df['programme1'] = df['id'].map(p1)
df['programme2'] = df['id'].map(p2)
And if you want to delete the programme column (drop returns a new DataFrame, so assign the result back):
df = df.drop('programme', axis=1)
To save the new CSV file (to_csv has no inplace argument; index=False leaves out the row index):
df.to_csv('new_file.csv', index=False)
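Since the question started from the standard csv module, the same split can also be sketched without pandas. This is only an illustration under stated assumptions: the helper name split_programme, its parameters, and the in-memory sample data are hypothetical and do not come from the original posts.

```python
import csv
import io

def split_programme(src, dst, column="programme", sep=";"):
    """Copy CSV rows from src to dst, splitting `column` into two columns."""
    reader = csv.DictReader(src)
    # Keep every original column except the one being split, then add the new ones.
    fieldnames = [name for name in reader.fieldnames if name != column]
    fieldnames += [column + "1", column + "2"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        value = row.pop(column) or ""
        # partition leaves the second part empty when there is no separator.
        first, _, second = value.partition(sep)
        row[column + "1"] = first
        row[column + "2"] = second
        writer.writerow(row)

# Hypothetical in-memory data standing in for IMI.csv.
src = io.StringIO(
    "id,programme,topics\n"
    "945358,H2020-EU.3.1.;H2020-EU3.1.7,IMI2-2019-18-01\n"
    "116026,H2020-EU.3.1.,IMI2-2015-06-04\n"
)
dst = io.StringIO()
split_programme(src, dst)
output = dst.getvalue()
print(output)
```

With real files, the two arguments would be file objects opened with newline='' (as the csv documentation recommends), for example open('IMI.csv', newline='') for reading and open('new_IMI.csv', 'w', newline='') for writing.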