[英]How can I merge multiple csv files with Python as I want?
I have a few csv files for my homework.我有一些 csv 文件用于我的作业。 I want to combine them as in the example below.
我想将它们组合起来,如下例所示。 But I didn't know how to do it.
但我不知道该怎么做。
Exp1.csv exp1.csv
"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"
"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"
Exp2.csv exp2.csv
"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"
"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"
I want to merge 2 files like this.我想像这样合并 2 个文件。 I just want to get the VOL% column.
我只想得到 VOL% 列。
"DATE","Exp1","Exp2"
"01.09.2019","-7,04%","0,73%"
"25.08.2019","4,07%","1,98%"
Thank you everyone:) I found a solution like this and applied it.谢谢大家:)我找到了这样的解决方案并应用了它。
import glob
import os
import pandas.io
path =r'/Users/baris/Documents/Files/'
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pandas.read_csv(f) for f in all_files)
concatenated_df = pandas.concat(df_from_each_file, axis=1)
concatenated_df_clean = (concatenated_df.drop('DATE',1).drop('NOW',1).drop('OPEN',1).drop('HIGH.',1).drop('Low',1).drop('Hac.',1)
df_dates_file = pandas.read_csv('/Users/baris/Documents/Files/Exp1.csv')
df_date_export = concatenated_df.iloc[:, 0]
final_result = pandas.concat([df_date_export,concatenated_df_clean], axis=1)
print(final_result)
import csv
with open('Exp1.csv', 'r') as f1:
csv_reader = csv.reader(f1, delimiter=',')
lines1 = [row for row in csv_reader]
with open('Exp2.csv', 'r') as f2:
csv_reader = csv.reader(f2, delimiter=',')
lines2 = [row for row in csv_reader]
del lines1[0]
del lines2[0]
with open('output.csv', 'w+') as output_file:
output_file.write('"DATE","Exp1","Exp2"\n')
for index, _ in enumerate(lines1):
date = lines1[index][0]
vol1 = lines1[index][6]
vol2 = lines2[index][6]
output_file.write(f'"{date}","{vol1}","{vol2}"\n')
This assumes the following:这假设如下:
VOL %
is always going to be in the 7th column (like in your example) VOL %
总是会在第 7 列(就像在你的例子中)DATE
is always going to be in the 1st column (like in your example) DATE
总是在第一列(就像你的例子一样)Exp1.csv
and Exp2.csv
Exp1.csv
和Exp2.csv
的行数总是相同的"DATE"
will always be the same in Exp1.csv
and Exp2.csv
Exp1.csv
和Exp2.csv
的"DATE"
将始终相同Read more about the CSV module: https://docs.python.org/3/library/csv.html阅读有关 CSV 模块的更多信息: https : //docs.python.org/3/library/csv.html
You can use the pandas package to read and save csv.您可以使用 pandas 包来读取和保存 csv。 However you cannot remove columns while merging the csv files, but you can save the columns that you need Have a look at my code below.
但是,在合并 csv 文件时无法删除列,但可以保存所需的列,请查看下面的代码。 Replace the csv filename and column name with yours.
将 csv 文件名和列名替换为您的。
import pandas as pd
# create list of files you want to merge
all_filenames = ['test.csv','test1.csv']
# use pandas concat function to merge csv's
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
# export the csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig',columns=['test1'])
try something like this:尝试这样的事情:
df = pd.read_csv('Exp1.csv')
df1 = pd.read_csv('Exp2.csv')
df['DATE'] = pd.to_datetime(df['DATE'])
df1['DATE'] = pd.to_datetime(df['DATE'])
final_df = pd.merge(df[['DATE', 'VOL %']], df1[['DATE', 'VOL %']], on='DATE')
print(final_df)
DATE VOL %_x VOL %_y
2019-01-09 -7,04% 0,73%
2019-08-25 4,07% 1,98%
Use the csv module.使用 csv 模块。
https://docs.python.org/3/library/csv.html https://docs.python.org/3/library/csv.html
read this tutorial:阅读本教程:
https://realpython.com/python-csv/ https://realpython.com/python-csv/
something like this will do the work: (educational code)像这样的事情将完成工作:(教育代码)
import io
import csv
target = {}
file_one_string =\
""""DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"
"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"
"""
file_two_string = \
""""DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"
"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"
"""
with io.StringIO(file_one_string) as file_one:
csv_reader = csv.DictReader(file_one,delimiter=',',quotechar='"')
for row in csv_reader:
if 'VOL %' in row:
target[row['DATE']] ={'Exp1': row['VOL %']}
with io.StringIO(file_two_string) as file_two:
csv_reader = csv.DictReader(file_two,dialect="excel")
for row in csv_reader:
if row['DATE'] in target:
target[row['DATE']]['Exp2'] = row['VOL %']
else:
print('Missing DATE {} in file_two'.format(row['DATE']))
lines2 = [row for row in csv_reader]
with io.StringIO() as output_file:
fieldnames = ['DATE', 'Exp1', 'Exp2']
csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
csv_writer.writeheader()
for key, value in target.items():
csv_writer.writerow({
'DATE': key,
'Exp1': value['Exp1'],
'Exp2': value['Exp2']
})
print(output_file.getvalue())
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.