Combine two rows into one in a csv file with Python

I am trying to combine multiple rows in a csv file into one. I could easily do it in Excel, but I want to do this for hundreds of files, so I need to do it in code. I have tried storing the rows in arrays, but that doesn't seem to work. I am using Python.

So let's say I have a csv file:

1,2,3
4,5,6
7,8,9

All I want is a csv file like this:

1,2,3,4,5,6,7,8,9

The code I have tried is this:

fin = open("C:\\1.csv", 'r+')
fout = open("C:\\2.csv",'w')
for line in fin.xreadlines():
  new = line.replace(',', ' ', 1)
  fout.write (new)
fin.close()
fout.close()

Could you please help?

You should use the csv module for this, because splitting CSV manually on commas is very error-prone: a single column can contain a string with commas, and you would incorrectly split it into multiple columns. The csv module represents each row as a list of values.
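To make the quoted-comma problem concrete, here is a small sketch comparing a naive `str.split(',')` with `csv.reader` on a single line. The sample line is made up for illustration and is not from the question's data.

```python
import csv
import io

# Hypothetical sample line: the second column is a quoted string containing a comma.
line = '1,"Smith, John",3\n'

# Naive splitting treats the comma inside the quotes as a separator.
naive = line.strip().split(',')

# csv.reader honours the quoting and keeps the column in one piece.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields four pieces, while `csv.reader` returns the three fields the line actually contains.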

import csv

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

data1 = return_contents('csv1.csv')
data2 = return_contents('csv2.csv')

print(data1)
print(data2)

combined = []
for row in data1:
    combined.extend(row)

for row in data2:
    combined.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined)

That code gives you the basis of the approach, but it would be ugly to extend it to hundreds of files. Instead, you probably want os.listdir to list all the files in a single directory and add them to your output one by one. This is why I packed the reading code into the return_contents function: we can repeat the same process on any number of files with only one piece of code doing the actual reading. Something like this:

import csv
import os


def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

all_files = os.listdir('my_csvs')

combined_output = []

for file in all_files:
    data = return_contents('my_csvs/{}'.format(file))
    for row in data:
        combined_output.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined_output)
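If the goal is one flattened row per input file, rather than a single row for the whole folder, a variation along these lines might fit. It is only a sketch: `flatten_csv` and `flatten_folder` are hypothetical helper names, and it assumes the inputs end in `.csv`. It uses `glob` with `sorted()` because `os.listdir` returns entries in arbitrary order and also lists non-CSV files.

```python
import csv
import glob
import os


def flatten_csv(path):
    """Return every value in the file at `path` as one flat list."""
    with open(path, newline='') as infile:
        return [value for row in csv.reader(infile) for value in row]


def flatten_folder(folder, out_path):
    """Write one flattened row per .csv file found in `folder`."""
    # sorted() gives a stable order; the pattern skips non-CSV files.
    paths = sorted(glob.glob(os.path.join(folder, '*.csv')))
    with open(out_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for path in paths:
            writer.writerow(flatten_csv(path))
    return len(paths)
```

Calling `flatten_folder('my_csvs', 'csv_out.csv')` would then write as many rows as there are input files. Keeping the output file outside the input folder avoids it being picked up on a later run.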

If you are dealing specifically with the csv file format, I recommend using the csv package for the file operations. If you also use the with ... as statement, you don't need to worry about closing the file. You just need to define PATH, and the program will iterate over all the .csv files in it. Here is what you can do:

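import csv
import os

PATH = "your folder path"

def order_list():
    data_list = []
    for filename in os.listdir(PATH):
        if filename.endswith(".csv"):
            # Open the file found in PATH rather than a hard-coded "data.csv"
            with open(os.path.join(PATH, filename), newline='') as csvfile:
                # QUOTE_NONNUMERIC converts unquoted fields to float
                read_csv = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
                for row in read_csv:
                    data_list.extend(row)

    print(data_list)

if __name__ == '__main__':
    order_list()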
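One thing to be aware of with `quoting=csv.QUOTE_NONNUMERIC` in the reader above: unquoted fields are converted to `float`, so the values no longer look like the original text. The sketch below shows the difference on made-up data shaped like the question's file.

```python
import csv
import io

# Made-up data matching the layout in the question.
sample = io.StringIO('1,2,3\n4,5,6\n')

# Default quoting: every field comes back as a string.
as_text = [v for row in csv.reader(sample) for v in row]

sample.seek(0)

# QUOTE_NONNUMERIC: unquoted fields are converted to float.
as_float = [v for row in csv.reader(sample, quoting=csv.QUOTE_NONNUMERIC) for v in row]

print(as_text)
print(as_float)
```

If the files may contain unquoted text, the default quoting is probably the safer choice, since an unquoted non-numeric field cannot be converted to a float.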

Store your data in a pandas DataFrame:

import pandas as pd    
df = pd.read_csv('file.csv')

Store the modified DataFrame in a new one:

df_2 = df.groupby('Column_Name').agg(lambda x: ' '.join(x)).reset_index() ## Write Name of your column

Write the DataFrame to a new csv:

df_2.to_csv("file_modified.csv")

You could also do it like this:

fIn = open("test.csv", "r")
fOut = open("output.csv", "w")

fOut.write(",".join([line for line in fIn]).replace("\n",""))

fIn.close()
fOut.close()

If you now want to run it on multiple files, you can run it as a script with arguments:

import sys
fIn = open(sys.argv[1], "r")
fOut = open(sys.argv[2], "w")

fOut.write(",".join([line for line in fIn]).replace("\n",""))

fIn.close()
fOut.close()

Assuming you are on a Linux system and the script is called csvOnliner.py, you could call it with:

for i in *.csv; do python csvOnliner.py "$i" "changed_$i"; done

On Windows you could do it like this:

FOR %i IN (*.csv) DO csvOnliner.py %i changed_%i
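If you would rather not depend on the shell, the same loop can be written in Python itself so it behaves the same on Linux and Windows. This is a rough sketch: `one_line` and `convert_all` are hypothetical names, the `changed_` prefix mirrors the commands above, and unlike the script above it ends the output with a newline. Like that script, it joins raw lines and does not handle quoted fields that span several lines.

```python
from pathlib import Path


def one_line(text):
    """Join the non-empty lines of `text` with commas."""
    return ",".join(line for line in text.splitlines() if line.strip())


def convert_all(folder):
    """Write changed_<name> next to each .csv in `folder`; return the new paths."""
    written = []
    # sorted() builds the full listing first, so new files are not picked up mid-loop.
    for src in sorted(Path(folder).glob("*.csv")):
        if src.name.startswith("changed_"):
            continue  # skip output left over from an earlier run
        dst = src.with_name("changed_" + src.name)
        dst.write_text(one_line(src.read_text()) + "\n")
        written.append(dst)
    return written
```

Running `convert_all(".")` in the folder that holds the files would produce one `changed_*.csv` per input.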
