How to merge columns from multiple CSVs in pandas without picking _x or _y, keeping whichever column has information

Question

I am trying to merge two CSVs without having to pick the value from _x or _y.

MetaData1
Sample_name   TITLE
Cody        Chicken Pox
Claudia     Chicken Pox
Alex        Chicken Pox
Steven      Chicken Pox
Mom         Chicken Pox
Dad     

MetaData2
Sample_name    TITLE       Geo_Loc    DESCRIPTION
Dad         Chicken Pox     Earth       people
Me          Chicken Pox     Earth       people
Roger       Chicken Pox     Earth       people
Ben         Chicken Pox     Earth       people

Merged together, they should look like this:

Merged Metadata 
Sample_name    TITLE             Geo_Loc                 DESCRIPTION
Cody        Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Claudia     Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Alex        Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Steven      Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Mom         Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Dad         Chicken Pox     Earth                   people
Me          Chicken Pox     Earth                   people
Roger       Chicken Pox     Earth                   people
Ben         Chicken Pox     Earth                   people

The code I have so far is below:

# Merge two or more CSV files using pandas
# Duplicate the read_csv line for each additional CSV file
import pandas as pd

File_one = pd.read_csv('/Users/c1carpenter/Desktop/Test.txt', sep='\t', header=0, dtype=str)
File_two = pd.read_csv('/Users/c1carpenter/Desktop/Test2.txt', sep='\t', header=0, dtype=str)
# The outer join keeps every Sample_name that appears in either file
Merge_File = pd.merge(File_one, File_two, how='outer', on='Sample_name')

However, if I have a hundred columns, of which 50 end up being duplicates, how do I merge them without losing data and without having to type out each title individually, as below?

# Cleanup to merge duplicate non-index column
mm['TITLE'] = mm[['TITLE_x', 'TITLE_y']].fillna('').sum(axis=1)
mm.drop(['TITLE_x','TITLE_y'], axis=1, inplace=True)

Answer 1

Before merging, you can adjust the second dataframe so that it shares no columns with the first one other than the merge key.

df2_to_merge = df2[[col for col in df2.columns if col == 'Sample_name' or col not in df1.columns]]

Then merge df1 with df2_to_merge as you specified.
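
Note that dropping the shared columns from the second dataframe keeps only the first dataframe's values for them, so a value present only in the second file (such as Dad's TITLE in the question) would be lost. As an alternative sketch, not part of the original answer, `DataFrame.combine_first` can fill gaps in one frame from the other across all shared columns at once. The small dataframes below are hypothetical stand-ins for the two files, and the placeholder string is taken from the desired output in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the two metadata files in the question.
df1 = pd.DataFrame({
    "Sample_name": ["Cody", "Claudia", "Alex", "Steven", "Mom", "Dad"],
    "TITLE": ["Chicken Pox"] * 5 + [None],  # Dad's TITLE is missing here
})
df2 = pd.DataFrame({
    "Sample_name": ["Dad", "Me", "Roger", "Ben"],
    "TITLE": ["Chicken Pox"] * 4,
    "Geo_Loc": ["Earth"] * 4,
    "DESCRIPTION": ["people"] * 4,
})

merged = (
    df1.set_index("Sample_name")                  # align rows on the key
    .combine_first(df2.set_index("Sample_name"))  # df1 wins; gaps filled from df2
    .fillna("Missing:Not Applicable")             # placeholder for what is still empty
    .reset_index()
)
print(merged)
```

Because the alignment happens on the index, no _x or _y columns are created, and no column names have to be typed out. The row and column order of the result may differ from the input files, so reorder afterwards if the order matters.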
