简体   繁体   English

如何根据需要将多个 csv 文件与 Python 合并?

[英]How can I merge multiple csv files with Python as I want?

I have a few csv files for my homework.我有一些 csv 文件用于我的作业。 I want to combine them as in the example below.我想将它们组合起来,如下例所示。 But I didn't know how to do it.但我不知道该怎么做。

Exp1.csv exp1.csv

"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"
"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"

Exp2.csv exp2.csv

"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"
"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"

I want to merge 2 files like this.我想像这样合并 2 个文件。 I just want to get the VOL% column.我只想得到 VOL% 列。


"DATE","Exp1","Exp2"
"01.09.2019","-7,04%","0,73%"
"25.08.2019","4,07%","1,98%"

Thank you everyone:) I found a solution like this and applied it.谢谢大家:)我找到了这样的解决方案并应用了它。

import glob
import os
import pandas.io

path =r'/Users/baris/Documents/Files/'
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pandas.read_csv(f) for f in all_files)
concatenated_df = pandas.concat(df_from_each_file, axis=1)
concatenated_df_clean = (concatenated_df.drop('DATE',1).drop('NOW',1).drop('OPEN',1).drop('HIGH.',1).drop('Low',1).drop('Hac.',1)

df_dates_file = pandas.read_csv('/Users/baris/Documents/Files/Exp1.csv')
df_date_export = concatenated_df.iloc[:, 0]

final_result = pandas.concat([df_date_export,concatenated_df_clean], axis=1)
print(final_result)


import csv

with open('Exp1.csv', 'r') as f1:
    csv_reader = csv.reader(f1, delimiter=',')
    lines1 = [row for row in csv_reader]

with open('Exp2.csv', 'r') as f2:
    csv_reader = csv.reader(f2, delimiter=',')
    lines2 = [row for row in csv_reader]

del lines1[0]
del lines2[0]
with open('output.csv', 'w+') as output_file:
    output_file.write('"DATE","Exp1","Exp2"\n')
    for index, _ in enumerate(lines1):
        date = lines1[index][0]
        vol1 = lines1[index][6]
        vol2 = lines2[index][6]
        output_file.write(f'"{date}","{vol1}","{vol2}"\n')

This assumes the following:这假设如下:

  • VOL % is always going to be in the 7th column (like in your example) VOL %总是会在第 7 列(就像在你的例子中)
  • DATE is always going to be in the 1st column (like in your example) DATE总是在第一列(就像你的例子一样)
  • there will always be the same number of lines in Exp1.csv and Exp2.csv Exp1.csvExp2.csv的行数总是相同的
  • the "DATE" will always be the same in Exp1.csv and Exp2.csv Exp1.csvExp2.csv"DATE"将始终相同

Read more about the CSV module: https://docs.python.org/3/library/csv.html阅读有关 CSV 模块的更多信息: https : //docs.python.org/3/library/csv.html

You can use the pandas package to read and save csv.您可以使用 pandas 包来读取和保存 csv。 However you cannot remove columns while merging the csv files, but you can save the columns that you need Have a look at my code below.但是,在合并 csv 文件时无法删除列,但可以保存所需的列,请查看下面的代码。 Replace the csv filename and column name with yours.将 csv 文件名和列名替换为您的。

import pandas as pd

# create list of files you want to merge
all_filenames = ['test.csv','test1.csv']

# use pandas concat function to merge csv's
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])

# export the csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig',columns=['test1'])

try something like this:尝试这样的事情:

df = pd.read_csv('Exp1.csv')

df1 = pd.read_csv('Exp2.csv')

df['DATE'] = pd.to_datetime(df['DATE'])
df1['DATE'] = pd.to_datetime(df['DATE'])

final_df = pd.merge(df[['DATE', 'VOL %']], df1[['DATE', 'VOL %']], on='DATE')

print(final_df)
      DATE VOL %_x VOL %_y
2019-01-09  -7,04%   0,73%
2019-08-25   4,07%   1,98%

Use the csv module.使用 csv 模块。

https://docs.python.org/3/library/csv.html https://docs.python.org/3/library/csv.html

read this tutorial:阅读本教程:

https://realpython.com/python-csv/ https://realpython.com/python-csv/

something like this will do the work: (educational code)像这样的事情将完成工作:(教育代码)

import io
import csv

target = {}

file_one_string =\
""""DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"
"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"
"""
file_two_string = \
""""DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"
"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"
"""


with io.StringIO(file_one_string) as file_one:
    csv_reader = csv.DictReader(file_one,delimiter=',',quotechar='"')
    for row in csv_reader:
        if 'VOL %' in row:
            target[row['DATE']] ={'Exp1': row['VOL %']}

with io.StringIO(file_two_string) as file_two:
    csv_reader = csv.DictReader(file_two,dialect="excel")
    for row in csv_reader:
        if row['DATE'] in target:
            target[row['DATE']]['Exp2'] = row['VOL %']
        else:
            print('Missing DATE {} in file_two'.format(row['DATE']))
    lines2 = [row for row in csv_reader]


with io.StringIO() as output_file:
    fieldnames = ['DATE', 'Exp1', 'Exp2']
    csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    for key, value in target.items():
        csv_writer.writerow({
            'DATE': key,
            'Exp1': value['Exp1'],
            'Exp2': value['Exp2']
        })

    print(output_file.getvalue())

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM