I need to read a CSV with an unknown number of columns and then write the data to a CSV with a set number of columns
So I have a file that looks like this:
name,number,email,job1,job2,job3,job4
I need to convert it into this:
name,number,email,job1
name,number,email,job2
name,number,email,job3
name,number,email,job4
How would I do this in Python?
As mentioned in the comments, you can use pandas to read, write, and manipulate CSV files.
Here is an example of how to solve the problem with pandas in Python:
import pandas as pd
# df = pd.read_csv("filename.csv") # read csv file from disk
# comment out the line below when reading from disk
df = pd.DataFrame([['ss','0152','ss@','student','others']],columns=['name','number','email','job1','job2'])
print(df)
This prints:
name number email job1 job2
0 ss 0152 ss@ student others
Now we need to know how many columns there are:
x = len(df.columns)
print(x)
x now holds the number of columns, so this prints:
5
Now let's create an empty DataFrame with the columns name, number, email and job:
c = pd.DataFrame(columns=['name','number','email','job'])
print(c)
Output:
Empty DataFrame
Columns: [name, number, email, job]
Index: []
Now we loop from column 3 to the last column and concatenate each resulting DataFrame onto our empty DataFrame:
for i in range(3, x):
    df1 = df.iloc[:, 0:3].copy()  # take the first 3 columns
    df1['job'] = df.iloc[:, i]    # add the ith column as 'job'
    c = pd.concat([c, df1])       # append df1 after the rows already in c
print(c)
Output:
name number email job
0 ss 0152 ss@ student
0 ss 0152 ss@ others
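DataFrame c now holds the output you want. You can write it to a file with the following (index=False keeps the pandas index from being written as an extra column):
c.to_csv('output.csv', index=False)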
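If you would rather avoid the explicit loop, pandas' `melt` can reshape all the job columns in one call. This is a sketch that is not part of the original answer; it assumes the fixed columns are exactly `name`, `number` and `email` and that every other column is a job column.

```python
import pandas as pd

# Sample data matching the question's layout (values are illustrative)
df = pd.DataFrame(
    [['ss', '0152', 'ss@', 'student', 'others']],
    columns=['name', 'number', 'email', 'job1', 'job2'],
)

id_cols = ['name', 'number', 'email']  # assumed fixed columns

# melt turns every non-id column into its own row; the 'variable'
# column records which original column each value came from
long_df = (
    df.melt(id_vars=id_cols, value_name='job')
      .drop(columns='variable')
)

print(long_df)
```

With several input rows, `melt` emits all `job1` values first, then all `job2` values, so sort the result afterwards if each person's rows need to stay together.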
The following does it without pandas:
with open('input.csv') as f_in:
    lines = [l.strip() for l in f_in.readlines()]
with open('output.csv', 'w') as f_out:
    for idx, line in enumerate(lines):
        if idx > 0:  # skip the header row
            fields = line.split(',')
            for j in range(3, len(fields)):
                f_out.write(','.join(fields[:3]) + ',' + fields[j] + '\n')
input.csv
header row
name,number,email,job1,job2,job3,job4
name1,number1,email1,job11,job21,job31,job41
output.csv
name,number,email,job1
name,number,email,job2
name,number,email,job3
name,number,email,job4
name1,number1,email1,job11
name1,number1,email1,job21
name1,number1,email1,job31
name1,number1,email1,job41
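One caveat with `line.split(',')`: it does not understand CSV quoting, so a quoted field that contains a comma gets split in two. The short comparison below uses a made-up data line to contrast it with the standard `csv` module.

```python
import csv
import io

# A hypothetical data line whose name field contains a quoted comma
line = '"Smith, John",0152,js@example.com,student,tutor'

naive = line.split(',')                        # splits inside the quotes
parsed = next(csv.reader(io.StringIO(line)))   # respects the quotes

print(naive)   # the name is broken into two fields
print(parsed)  # the quoted name stays one field
```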
Suppose this is the DataFrame:
import pandas as pd
# DataFrame.append was removed in pandas 2.0, so build the one-row frame directly
df = pd.DataFrame([{'name': 'jon', 'number': 123, 'email': 'smth@smth.smth',
                    'job1': 'a', 'job2': 'b', 'job3': 'c', 'job4': 'd'}])
We define a new DataFrame:
new_df = pd.DataFrame(columns=['name','number','email','job'])
Now we iterate over the old one to split it by job. I'm assuming you have 4 jobs to split:
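# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
# and build the new frame once at the end
rows = []
for i, row in df.iterrows():
    for job in range(1, 5):
        job_col = "job" + str(job)
        rows.append({'name': row['name'], 'number': row['number'], 'email': row['email'], 'job': row[job_col]})
new_df = pd.DataFrame(rows, columns=new_df.columns)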
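Since the question says the number of columns is unknown, hardcoding `range(1, 5)` may not fit. A sketch that discovers the job columns instead, under the assumption (not stated in the question) that every job column name starts with `job`:

```python
import pandas as pd

# Sample frame with the question's columns (values are illustrative)
df = pd.DataFrame([{'name': 'jon', 'number': 123, 'email': 'smth@smth.smth',
                    'job1': 'a', 'job2': 'b', 'job3': 'c', 'job4': 'd'}])

# Pick up however many job columns exist (assumes a "job" name prefix)
job_cols = [col for col in df.columns if col.startswith('job')]

rows = []
for _, row in df.iterrows():
    for job_col in job_cols:
        rows.append({'name': row['name'], 'number': row['number'],
                     'email': row['email'], 'job': row[job_col]})

new_df = pd.DataFrame(rows, columns=['name', 'number', 'email', 'job'])
print(new_df)
```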
You can use the csv module and Python's unpacking syntax to read the data from the input file and write it to the output file.
import csv
with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    # Skip header row, if necessary
    next(reader)
    # Use sequence unpacking to get the fixed fields and
    # an arbitrary number of "jobs".
    for name, number, email, *jobs in reader:
        for job in jobs:
            writer.writerow([name, number, email, job])
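The same idea can be wrapped in a function and checked in memory with `io.StringIO`, which also shows that rows with different numbers of jobs are handled. The function name `split_jobs`, its parameters, the output header row, and skipping empty trailing cells are all illustrative choices, not part of the answer above.

```python
import csv
import io

def split_jobs(infile, outfile, n_fixed=3, header=('name', 'number', 'email', 'job')):
    """Write one output row per job, repeating the first n_fixed columns.

    Assumes the input's first row is a header to skip.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    next(reader, None)        # skip the input header (no error on an empty file)
    writer.writerow(header)   # header for the fixed-width output
    for row in reader:
        fixed, jobs = row[:n_fixed], row[n_fixed:]
        for job in jobs:
            if job:           # ignore empty trailing cells
                writer.writerow(fixed + [job])

# In-memory demo: the rows carry different numbers of jobs
src = io.StringIO(
    'name,number,email,job1,job2,job3\n'
    'ann,1,ann@x,cook,\n'
    'bob,2,bob@x,pilot,chef,baker\n'
)
dst = io.StringIO()
split_jobs(src, dst)
print(dst.getvalue())
```

For real files, open both with `newline=''` as in the answer above and pass the file objects in place of the `StringIO` buffers.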