Add column from another csv file

I have some csv files. Say I have 3 files in a single folder, each with three columns.

1.csv                2.csv                3.csv

A    B    C        A    B    C        A    B    C

5   23    56       5    43   23       5    65   08
10  31    77       10   76   66       10   34   72
20  33    98       20   39   28       20   23   64
30  18    26       30   27   39       30   73   92

I want to make a new csv file that keeps the A column and adds only the B column from each of the other csv files by looping, like below:

Desired result:

new.csv

A    B     B    B
5    23    43   65
10   31    76   34
20   33    39   23
30   18    27   73

But I have failed.

This is my current code:

import pandas as pd
import numpy as np
import csv
import glob
import os 

path = "C:/Users/SYIFAAZRA/Documents/belajar_wradlib/learning/" 
os.chdir(path) 
file = glob.glob("*.csv") 
one = { 'A' : ['5','10','20','30'] } 
i = 1 
for f in file: 
  i = i+1 
  col_names = ['B', 'C'] 
  df = pd.read_csv(f, delimiter=',',usecols=[1, 2], names=col_names) 
  df = pd.DataFrame(one) 
  df['B'] = pd.Series(df) 
  print(df)

You're going to want to merge your dataframes on the key 'A', since it exists in all of your files. I recommend moving the creation of your df before the loop.

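df = pd.DataFrame({'A': [5, 10, 20, 30]})  # integer keys, so they match the values read from the files
for i, f in enumerate(file, start=1):
  # read the key column A together with B (assumes each file has the header row A,B,C)
  df_dummy = pd.read_csv(f, delimiter=',', usecols=['A', 'B'])
  # give each B a unique name so the merged columns do not collide
  df_dummy = df_dummy.rename(columns={'B': 'B' + str(i)})
  # merge returns a new dataframe, so assign the result back to df
  df = df.merge(df_dummy, on='A')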

Note that you may need to clean up the names of your columns, depending on what you ultimately intend to do.
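
The merge approach can be tried end to end with in-memory data instead of files. This is only a minimal sketch, assuming each file has the header row A,B,C shown in the question; the `sources` dict and the column names `B_1`, `B_2`, `B_3` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-ins for 1.csv, 2.csv and 3.csv from the question.
sources = {
    "1.csv": "A,B,C\n5,23,56\n10,31,77\n20,33,98\n30,18,26\n",
    "2.csv": "A,B,C\n5,43,23\n10,76,66\n20,39,28\n30,27,39\n",
    "3.csv": "A,B,C\n5,65,08\n10,34,72\n20,23,64\n30,73,92\n",
}

merged = None
for i, (name, text) in enumerate(sources.items(), start=1):
    # Read only the key column A and the value column B.
    part = pd.read_csv(io.StringIO(text), usecols=["A", "B"])
    # Rename B so that each file contributes a distinct column.
    part = part.rename(columns={"B": f"B_{i}"})
    # The first frame starts the result; later frames are merged on A.
    merged = part if merged is None else merged.merge(part, on="A")

print(merged)
```

With real files, `io.StringIO(text)` would be replaced by the file name, and `merged.to_csv("new.csv", index=False)` would write the result.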

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge

Leaving out the reading of the csv files, as it is not related to the question and it's easier to have a complete minimal example:

csv1=pd.DataFrame(np.array([[5,23,56],[10,31,77],[20,33,98]]), columns=['a','b','c'])
csv2=pd.DataFrame(np.array([[5,43,23],[10,76,66],[20,39,28]]), columns=['a','b','c'])
csv3=pd.DataFrame(np.array([[5,65,8],[10,34,72],[20,23,64]]), columns=['a','b','c'])
 
df1 = csv1.iloc[:, :2].copy()  # copy, so adding columns does not write into a slice of csv1
df1['b1']=csv2.iloc[:,1]
df1['b2']=csv3.iloc[:,1]
df1


Your second question below is about many files. If the number of files is not massive, I would split the operation into two loops: one where you read the files into a list of dataframes, and another where you aggregate them into one dataframe.

path = "C:/Users/SYIFAAZRA/Documents/belajar_wradlib/learning/"
os.chdir(path)
files = glob.glob("*.csv")
aa = []
for f in files:
    aa.append(pd.read_csv(f))  # read every file into a list of dataframes

df = aa[0].iloc[:, :2].copy()  # first two columns (A and B) of the first file

for i, a in enumerate(aa[1:], start=1):  # go through the remaining dataframes
    df['b' + str(i)] = a.iloc[:, 1]      # name the remaining columns b1, b2, b3...

All of this is not very elegant, but I have trouble remembering the elegant solutions in pandas. I prefer code that is simple to read and understand.
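
For comparison, a more compact variant might look like the sketch below. It is not the method used above: it swaps the second loop for `pd.concat` along the columns, and it assumes every frame has the columns A and B with the same A values. The `frames` list is a hypothetical stand-in for the list `aa`, and the column names `b0`, `b1`, `b2` are chosen only for illustration.

```python
import pandas as pd

# Small frames standing in for the files read into the list `aa`.
frames = [
    pd.DataFrame({"A": [5, 10, 20], "B": [23, 31, 33], "C": [56, 77, 98]}),
    pd.DataFrame({"A": [5, 10, 20], "B": [43, 76, 39], "C": [23, 66, 28]}),
    pd.DataFrame({"A": [5, 10, 20], "B": [65, 34, 23], "C": [8, 72, 64]}),
]

# Index each B column by A and give it a distinct name.
columns = [f.set_index("A")["B"].rename(f"b{i}") for i, f in enumerate(frames)]

# Place the columns side by side, aligned on A, then turn A back into a column.
result = pd.concat(columns, axis=1).rename_axis("A").reset_index()

print(result)
```

Because the columns are aligned on A rather than on row position, this version also tolerates files whose rows are in a different order.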

