Merging single-column CSV files into one CSV file

Question

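I have seen some answers to this kind of question here, but not enough to really help me. I split a 9-column .csv file and wrote the columns into vectors so I could do other work on them in C++. Afterwards they are written back to a folder as single-column .csv files.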

Now I want to merge all 9 simple CSV files back into 1 file, so that they are stacked horizontally next to each other, like this in the new file:

date,value,etc...     
20171012,2501593,etc..
20171011,2176309,etc..
20171010,3484064,etc..
20171009,1785852,etc..
20171006,1785852,etc..
20171005,16476641,etc..
20171004,1235406,etc..

I hope that is clear. My code is as follows:

import csv
data = [] # Buffer list
files = ['./CalculatedOutput/quote_date.csv', './CalculatedOutput/paper.csv', './CalculatedOutput/exch.csv', './CalculatedOutput/open.csv', './CalculatedOutput/high.csv', './CalculatedOutput/low.csv', './CalculatedOutput/close.csv', './CalculatedOutput/volume.csv', './CalculatedOutput/value.csv']

for filename in files:
    with open(filename, 'r') as csvfile:
        stocks = csv.reader(csvfile)
        for row in stocks:
            new_row = [row[0]]
            data.append(new_row)
        with open("CalculatedOutput/Opera.csv", "w+") as to_file:
            writer = csv.writer(to_file , delimiter=",")
            for new_row in data:
                writer.writerow(new_row)

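This code does move all rows of the columns into 1 new file, but it simply stacks them one below another. How would I write the columns next to each other, comma-separated? I have tried extensively with pandas, numpy and the csv library, based on concat, merge and other methods, but I could not find the right way. I don't think I am that far off, but unfortunately my Python is not the best!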

Answer 1

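You can use a contextlib.ExitStack context manager to open all the files (in Python 3), then apply zip to the iterables built from the files and write the result to the output file: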

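import csv
from contextlib import ExitStack

# `files` is the list of input paths defined in the question
outfile = "CalculatedOutput/Opera.csv"
with ExitStack() as stack, open(outfile, "w", newline="") as to_file:
    # open all files; ExitStack closes every one of them on exit
    fs = [stack.enter_context(open(fname, newline="")) for fname in files]
    readers = map(csv.reader, fs)
    # zip yields the n-th row of every file; flatten those rows into one output row
    csv.writer(to_file).writerows(
        [field for row in rows for field in row] for rows in zip(*readers)
    )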
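To try the zip-based idea without touching the disk, here is a minimal sketch that uses in-memory io.StringIO objects as stand-ins for the single-column files. The sample values and the names col_date, col_value and merged are made up for illustration.

```python
import csv
import io

# Hypothetical stand-ins for two single-column CSV files.
col_date = io.StringIO("date\n20171012\n20171011\n")
col_value = io.StringIO("value\n2501593\n2176309\n")

readers = [csv.reader(f) for f in (col_date, col_value)]

merged = io.StringIO()
writer = csv.writer(merged, lineterminator="\n")
# zip pairs up the n-th row of every reader; each row is a list of fields,
# so the fields are flattened into one combined output row.
for rows in zip(*readers):
    writer.writerow([field for row in rows for field in row])

print(merged.getvalue())
```

Note that zip stops at the shortest input, so rows beyond the end of the shortest file are dropped. If the files may differ in length, itertools.zip_longest is one possible replacement.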

Update:

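If the files contain characters that cannot be decoded with the default encoding of open (UTF-8 on many systems, but platform-dependent), you can use surrogate escapes when reading. On writing, these surrogates are converted back to the original bytes: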

with ExitStack() as stack, open(outfile, "w+", errors='surrogateescape') as to_file :
    fs = [stack.enter_context(open(fname, errors='surrogateescape')) for fname in files]
    ...
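The round trip can be checked in isolation. The following is a small sketch, assuming UTF-8 as the encoding and using a made-up byte string and temporary file names; it shows that a byte which is invalid in UTF-8 survives reading and writing when errors='surrogateescape' is used on both sides.

```python
import os
import tempfile

# Bytes that are not valid UTF-8 (0xE9 is "é" in Latin-1); made up for illustration.
raw = b"name\ncaf\xe9\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.csv")
    dst = os.path.join(tmp, "out.csv")
    with open(src, "wb") as f:
        f.write(raw)

    # Reading: the undecodable byte becomes a lone surrogate (U+DCE9).
    with open(src, encoding="utf-8", errors="surrogateescape", newline="") as f:
        text = f.read()

    # Writing: the surrogate is turned back into the original byte.
    with open(dst, "w", encoding="utf-8", errors="surrogateescape", newline="") as f:
        f.write(text)

    with open(dst, "rb") as f:
        round_tripped = f.read()

print(round_tripped == raw)
```

The text held in memory is not clean Unicode in this case, so this approach is best suited to passing data through unchanged rather than processing it.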

Answer 2

I see that you have tried pandas; what went wrong there? With pandas we can simply use pd.concat([df1, df2, ...]). So let's read the files and concatenate them:

import pandas as pd

df = pd.concat((pd.read_csv(f) for f in files),axis=1) # axis1 for horizontal
df.to_csv("CalculatedOutput/Opera.csv",index=False)

Example:

First, create two dummy files:

file1 = """date
20171012
20171011
20171010
20171009
20171006
20171005
20171004"""

file2 = """number
1
2
3
4
5
6
7"""

import io
import pandas as pd

files = [io.StringIO(f) for f in [file1, file2]]

df = pd.concat([pd.read_csv(f) for f in files],axis=1)

print(df)

       date  number
0  20171012       1
1  20171011       2
2  20171010       3
3  20171009       4
4  20171006       5
5  20171005       6
6  20171004       7
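One thing worth checking with this approach is what happens when the files do not all have the same number of rows. The sketch below uses made-up in-memory data and the hypothetical names long_col and short_col; pd.concat with axis=1 aligns on the row index, so the shorter column is padded with missing values.

```python
import io
import pandas as pd

# Hypothetical inputs: one column has three rows, the other only two.
long_col = pd.read_csv(io.StringIO("date\n20171012\n20171011\n20171010\n"))
short_col = pd.read_csv(io.StringIO("number\n1\n2\n"))

# Rows are matched by index position 0, 1, 2; the missing cell becomes NaN.
df = pd.concat([long_col, short_col], axis=1)
print(df)
```

Because of the NaN, the padded integer column is converted to floating point. Passing dtype=str to pd.read_csv is one way to keep the values as text if they must be written back unchanged.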

Answer 3

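Use os and glob to find all files in a directory that match input*.csv, then use pd.concat to stack the data row-wise, aligned automatically by the column names in the CSV files. The encoding is assumed to be iso-8859-1 in case the files are not Unicode.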

  import glob
  import os
  import pandas as pd

  path = 'C:\\Users\\your_username\\stacked_csv'

  # get all csv files in input directory
  csv_files = glob.glob(os.path.join(path, 'input*.csv'))

  # read all csv files
  df_list = []
  for csv_file in csv_files:
      df = pd.read_csv(csv_file,encoding='iso-8859-1')
      df_list.append(df.dropna())

  # stack all csv files
  df_stacked = pd.concat(df_list, axis=0)

  output_file=path+"\\result_all.csv"
  # write stacked csv file
  df_stacked.to_csv(output_file, index=False)

  print(pd.read_csv(output_file))
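Note that axis=0 stacks rows, whereas the question asks for columns placed side by side. A minimal sketch of the difference, using made-up in-memory data and the hypothetical names a, b, stacked and side_by_side:

```python
import io
import pandas as pd

# Two hypothetical single-column inputs with different column names.
a = pd.read_csv(io.StringIO("date\n20171012\n20171011\n"))
b = pd.read_csv(io.StringIO("value\n2501593\n2176309\n"))

# axis=0: rows of b are placed below rows of a; cells with no data become NaN.
stacked = pd.concat([a, b], axis=0)
# axis=1: columns of b are placed next to columns of a.
side_by_side = pd.concat([a, b], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 2)
```

When every file shares the same column names, axis=0 appends the files one after another. With one distinct column per file, as in the question, axis=1 gives the horizontal layout.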


Answer 1
3 2017-10-18 13:12:54

Answer 2
1 accepted 2017-10-18 13:25:20

Answer 3
0 2021-09-08 15:07:01
