Add column from another csv file
I have some CSV files. Say I have 3 files in a single folder, each with three columns.
1.csv          2.csv          3.csv
A   B   C      A   B   C      A   B   C
5   23  56     5   43  23     5   65  08
10  31  77     10  76  66     10  34  72
20  33  98     20  39  28     20  23  64
30  18  26     30  27  39     30  73  92
I want to make a new CSV file with the A column and add only the B columns from the other CSV files by looping, like below:
Desired result:
new.csv
A   B   B   B
5   23  43  65
10  31  76  34
20  33  39  23
30  18  27  73
But I have failed. This is my current code:
import pandas as pd
import numpy as np
import csv
import glob
import os

path = "C:/Users/SYIFAAZRA/Documents/belajar_wradlib/learning/"
os.chdir(path)
file = glob.glob("*.csv")
one = { 'A' : ['5','10','20','30'] }
i = 1
for f in file:
    i = i+1
    col_names = ['B', 'C']
    df = pd.read_csv(f, delimiter=',',usecols=[1, 2], names=col_names)
    df = pd.DataFrame(one)
    df['B'] = pd.Series(df)
    print(df)
You're going to want to merge your dataframes on the key 'A', since it exists in all of your files. I recommend moving the creation of your df before the loop, so that it is not overwritten on every iteration.
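df = pd.DataFrame(one).astype(int)  # cast the key to int so it matches the values read from the CSV files
for i, f in enumerate(file, start=1):
    # assumes each file has the header row A,B,C shown in the question
    df_dummy = pd.read_csv(f, delimiter=',', usecols=['A', 'B'])
    # rename B per file so the merged columns do not collide, and keep the result
    df = df.merge(df_dummy.rename(columns={'B': 'B_' + str(i)}), on='A')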
Note that you may need to clean up the names of your columns, depending on what you ultimately intend to do.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge
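As a rough, self-contained sketch of the merge-on-'A' idea, the snippet below uses small in-memory dataframes instead of reading from disk. The names frames, merged and b_only, and the B_1, B_2, B_3 column names, are only illustrative assumptions and are not from the question.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for 1.csv, 2.csv and 3.csv (first three rows only)
frames = [
    pd.DataFrame({"A": [5, 10, 20], "B": [23, 31, 33], "C": [56, 77, 98]}),
    pd.DataFrame({"A": [5, 10, 20], "B": [43, 76, 39], "C": [23, 66, 28]}),
    pd.DataFrame({"A": [5, 10, 20], "B": [65, 34, 23], "C": [8, 72, 64]}),
]

# Start from the key column alone
merged = frames[0][["A"]]

for i, frame in enumerate(frames, start=1):
    # Keep only A and B, give B a unique name, then merge on the shared key A
    b_only = frame[["A", "B"]].rename(columns={"B": f"B_{i}"})
    merged = merged.merge(b_only, on="A", how="left")

print(merged)
```

Renaming B before each merge avoids relying on the automatic suffixes, which only cover two colliding names at a time.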
I am leaving out the reading of the CSV files, as it is not related to the question and it is easier to have a complete minimal example:
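import numpy as np
import pandas as pd

csv1 = pd.DataFrame(np.array([[5,23,56],[10,31,77],[20,33,98]]), columns=['a','b','c'])
csv2 = pd.DataFrame(np.array([[5,43,23],[10,76,66],[20,39,28]]), columns=['a','b','c'])
csv3 = pd.DataFrame(np.array([[5,65,8],[10,34,72],[20,23,64]]), columns=['a','b','c'])
df1 = csv1.iloc[:,:2].copy()  # columns a and b of the first frame; copy so new columns can be added safely
df1['b1'] = csv2.iloc[:,1]    # column b of the second frame
df1['b2'] = csv3.iloc[:,1]    # column b of the third frame
df1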
Your second question below is about many files. If the number of files is not massive, I would split the operation into two loops: one where you read the files into a list of dataframes, and another where you aggregate them into one dataframe.
path = "C:/Users/SYIFAAZRA/Documents/belajar_wradlib/learning/"
os.chdir(path)
files = glob.glob("*.csv")
aa = []
for f in files:
    aa.append(pd.read_csv(f))  # read every file into a list of dataframes
df = aa[0].iloc[:, :2].copy()  # first two columns (A and B) of the first file
for i, a in enumerate(aa[1:], start=1):  # go through the remaining dataframes
    df['b' + str(i)] = a.iloc[:, 1]  # name the remaining columns b1, b2, b3...
All of this is not very elegant, but I have trouble remembering the elegant solutions in pandas. I prefer code that is simple to read and understand.
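For comparison, one more compact variant is sketched below using pd.concat. It is only a sketch: it assumes every file has the same A values, and the in-memory frames list and the B_1, B_2, B_3 names are made up for illustration.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for 1.csv, 2.csv and 3.csv (first three rows only)
frames = [
    pd.DataFrame({"A": [5, 10, 20], "B": [23, 31, 33], "C": [56, 77, 98]}),
    pd.DataFrame({"A": [5, 10, 20], "B": [43, 76, 39], "C": [23, 66, 28]}),
    pd.DataFrame({"A": [5, 10, 20], "B": [65, 34, 23], "C": [8, 72, 64]}),
]

# Index each frame by A and keep only B, so that concat aligns the rows on A
b_columns = [frame.set_index("A")["B"] for frame in frames]

# axis=1 puts the Series side by side; keys gives each one a distinct column name
keys = [f"B_{i}" for i in range(1, len(b_columns) + 1)]
result = pd.concat(b_columns, axis=1, keys=keys).reset_index()

print(result)
```

The result could then be saved with result.to_csv("new.csv", index=False). Whether this is easier to read than the two explicit loops is a matter of taste.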