
I need to read a CSV with an unknown number of columns and then write the data to a CSV with a set number of columns

So I have a file that looks like this:

name,number,email,job1,job2,job3,job4

I need to convert it so that it looks like this:

name,number,email,job1
name,number,email,job2
name,number,email,job3
name,number,email,job4

How would I do this in Python?

As mentioned in the comments, you can use pandas to read, write, and manipulate CSV files.

Here is an example of how to solve the problem with pandas in Python:

import pandas as pd


# df = pd.read_csv("filename.csv")  # read the CSV file from disk

# Comment out the line below when reading from disk
df = pd.DataFrame([['ss','0152','ss@','student','others']],columns=['name','number','email','job1','job2'])

print(df)

This prints:

  name number email     job1    job2
0   ss   0152   ss@  student  others

Now we need to know how many columns there are:

x = len(df.columns)
print(x)

This stores the number of columns in x:

5

Now let's create an empty DataFrame with the columns [name, number, email, job]:

c = pd.DataFrame(columns=['name','number','email','job'])
print(c)

output:

Empty DataFrame
Columns: [name, number, email, job]
Index: []

Now we loop from index 3 to the last column and append each slice to the initially empty DataFrame:

for i in range(3, x):
    df1 = df.iloc[:, 0:3].copy()  # take the first three columns
    df1['job'] = df.iloc[:, i]    # add the i-th column as 'job'
    c = pd.concat([c, df1])       # append df1 to c, keeping the job order
print(c)

output:

  name number email      job
0   ss   0152   ss@  student
0   ss   0152   ss@   others

DataFrame c now holds the output you want. You can save it with:

c.to_csv('output.csv', index=False)  # index=False keeps the row index out of the file
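
The column-by-column loop above can also be expressed with pandas' `DataFrame.melt`, which reshapes wide data into long form. The following is only a sketch under a few assumptions: the helper name `explode_jobs`, its `n_fixed` parameter, and the temporary `source_column` name are hypothetical, and every column after the first three is treated as a job column.

```python
import pandas as pd


def explode_jobs(df: pd.DataFrame, n_fixed: int = 3) -> pd.DataFrame:
    """Turn every column after the first n_fixed into its own row."""
    id_cols = list(df.columns[:n_fixed])
    # melt keeps id_cols and stacks all remaining columns into one 'job' column
    long_df = df.melt(id_vars=id_cols, var_name="source_column", value_name="job")
    # Drop the helper column and any rows whose job cell was missing
    long_df = long_df.drop(columns="source_column").dropna(subset=["job"])
    return long_df.reset_index(drop=True)


df = pd.DataFrame(
    [["ss", "0152", "ss@", "student", "others"]],
    columns=["name", "number", "email", "job1", "job2"],
)
result = explode_jobs(df)
print(result)
```

Note that with several input rows, `melt` groups the result by job column rather than by person, so a sort may be needed if the row order matters.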

The following works as well:

with open('input.csv') as f_in, open('output.csv', 'w') as f_out:
    next(f_in, None)  # skip the first line of the file
    for line in f_in:
        fields = line.strip().split(',')
        # write one output line per job column (index 3 onward)
        for job in fields[3:]:
            f_out.write(','.join(fields[:3] + [job]) + '\n')

input.csv

header row
name,number,email,job1,job2,job3,job4
name1,number1,email1,job11,job21,job31,job41

output.csv

name,number,email,job1
name,number,email,job2
name,number,email,job3
name,number,email,job4
name1,number1,email1,job11
name1,number1,email1,job21
name1,number1,email1,job31
name1,number1,email1,job41

Suppose this is the DataFrame:

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so the row is passed to the constructor directly
df = pd.DataFrame([{'name':'jon', 'number':123, 'email':'smth@smth.smth', 'job1':'a','job2':'b','job3':'c','job4':'d'}])

We define a new DataFrame:

new_df = pd.DataFrame(columns=['name','number','email','job'])

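Now we iterate over the old DataFrame and split each row by job. I assume there are 4 job columns to split:

rows = []
for i, row in df.iterrows():
    for job in range(1, 5):
        job_col = "job" + str(job)
        rows.append({'name': row['name'], 'number': row['number'], 'email': row['email'], 'job': row[job_col]})
# DataFrame.append was removed in pandas 2.0, so the result is built from the collected rows
new_df = pd.DataFrame(rows, columns=new_df.columns)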
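
Since the question mentions an unknown number of columns, the fixed count of 4 jobs could be replaced by deriving the job columns from the DataFrame itself. This is a minimal sketch, assuming the three fixed columns are always named name, number, and email; the variable names `fixed_cols`, `job_cols`, and `long_df` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {"name": "jon", "number": 123, "email": "smth@smth.smth",
         "job1": "a", "job2": "b", "job3": "c", "job4": "d"},
    ]
)

fixed_cols = ["name", "number", "email"]
# Every column that is not one of the fixed ones is treated as a job column
job_cols = [col for col in df.columns if col not in fixed_cols]

rows = []
for _, row in df.iterrows():
    for job_col in job_cols:
        # Copy the fixed fields and attach a single job value
        rows.append({**row[fixed_cols].to_dict(), "job": row[job_col]})

long_df = pd.DataFrame(rows, columns=fixed_cols + ["job"])
print(long_df)
```

This way the loop keeps working whether the file has two job columns or twenty.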

You can use the csv module and Python's unpacking syntax to take the data from the input file and write it to the output file.

import csv

with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    # Skip header row, if necessary
    next(reader)
    # Use sequence unpacking to get the fixed variables and
    # an arbitrary number of "jobs".
    for name, number, email, *jobs in reader:
        for job in jobs:
            writer.writerow([name, number, email, job])
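
If rows are padded with empty trailing cells, or fields contain quoted commas, the same csv-based idea might be wrapped in a small function like the one below. It is a sketch rather than a definitive implementation: the function name `split_jobs`, its parameters, and the sample data are all hypothetical, and `io.StringIO` stands in for real files so the example can run anywhere.

```python
import csv
import io


def split_jobs(infile, outfile, n_fixed=3, skip_header=True):
    """Write one output row per non-empty job cell."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    if skip_header:
        next(reader, None)  # tolerate an empty input
    for row in reader:
        fixed, jobs = row[:n_fixed], row[n_fixed:]
        for job in jobs:
            if job.strip():  # ignore padding cells left empty
                writer.writerow(fixed + [job])


sample = io.StringIO(
    "name,number,email,job1,job2,job3\n"
    "ann,1,ann@example.com,cook,driver,\n"
    '"Lee, Bo",2,bo@example.com,clerk\n'
)
result = io.StringIO()
split_jobs(sample, result)
print(result.getvalue())
```

Because the csv module handles the parsing, the quoted name containing a comma stays in one field, which a plain `line.split(',')` would not guarantee.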

