How do I merge two CSVs in pandas when several columns overlap, without picking _x or _y, keeping whichever one has the information?
I am trying to merge two CSVs without having to pick the value from _x or _y.
MetaData1
Sample_name    TITLE
Cody           Chicken Pox
Claudia        Chicken Pox
Alex           Chicken Pox
Steven         Chicken Pox
Mom            Chicken Pox
Dad
MetaData2
Sample_name    TITLE          Geo_Loc    DESCRIPTION
Dad            Chicken Pox    Earth      people
Me             Chicken Pox    Earth      people
Roger          Chicken Pox    Earth      people
Ben            Chicken Pox    Earth      people
I want them merged together to look like this:
Merged Metadata
Sample_name    TITLE          Geo_Loc                   DESCRIPTION
Cody           Chicken Pox    Missing:Not Applicable    Missing:Not Applicable
Claudia        Chicken Pox    Missing:Not Applicable    Missing:Not Applicable
Alex           Chicken Pox    Missing:Not Applicable    Missing:Not Applicable
Steven         Chicken Pox    Missing:Not Applicable    Missing:Not Applicable
Mom            Chicken Pox    Missing:Not Applicable    Missing:Not Applicable
Dad            Chicken Pox    Earth                     people
Me             Chicken Pox    Earth                     people
Roger          Chicken Pox    Earth                     people
Ben            Chicken Pox    Earth                     people
The code I have so far is below:
# Merging two or more CSV files using pandas
# Duplicate the read_csv line for each additional file
import pandas as pd
File_one = pd.read_csv('/Users/c1carpenter/Desktop/Test.txt', sep='\t', header=0, dtype=str)
File_two = pd.read_csv('/Users/c1carpenter/Desktop/Test2.txt', sep='\t', header=0, dtype=str)
Merge_File = pd.merge(File_one, File_two, how='outer', on='Sample_name')
However, if I have a hundred columns, of which 50 end up being duplicates, how do I merge them without losing data and without having to type out each column title individually, like below?
# Cleanup to merge duplicate non-index column
mm['TITLE'] = mm[['TITLE_x', 'TITLE_y']].fillna('').sum(axis=1)
mm.drop(['TITLE_x','TITLE_y'], axis=1, inplace=True)
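One way to avoid typing each title is to loop over every `_x`/`_y` pair that the merge produced and coalesce them. The following is only a sketch: `coalesce_suffixed` is a hypothetical helper name, the sample frames are made up to mirror the tables above, and it assumes the default merge suffixes and that no original column name already ends in `_x` or `_y`.

```python
import pandas as pd

def coalesce_suffixed(merged, suffixes=("_x", "_y")):
    """Collapse every <name>_x / <name>_y pair into a single <name> column.

    Hypothetical helper: prefers the left (_x) value and falls back to
    the right (_y) value where the left one is missing.
    """
    left_sfx, right_sfx = suffixes
    out = merged.copy()
    for col in list(out.columns):
        if not col.endswith(left_sfx):
            continue
        base = col[: -len(left_sfx)]
        other = base + right_sfx
        if other not in out.columns:
            continue
        # Take the left value, filling gaps from the right value.
        out[base] = out[col].combine_first(out[other])
        out = out.drop(columns=[col, other])
    return out

# Made-up sample data shaped like MetaData1 and MetaData2.
file_one = pd.DataFrame(
    {"Sample_name": ["Cody", "Dad"], "TITLE": ["Chicken Pox", None]}
)
file_two = pd.DataFrame(
    {
        "Sample_name": ["Dad", "Me"],
        "TITLE": ["Chicken Pox", "Chicken Pox"],
        "Geo_Loc": ["Earth", "Earth"],
    }
)

merged = pd.merge(file_one, file_two, how="outer", on="Sample_name")
clean = coalesce_suffixed(merged)
```

Unlike summing the two columns as strings, `combine_first` does not concatenate the values when both sides are filled in. The coalesced columns are appended at the end, so reorder them afterwards if column order matters.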
Before merging, you can trim the second dataframe so that it shares no columns with the first one other than the merge key.
df2_to_merge = df2[[col for col in df2.columns if col == 'Sample_name' or col not in df1.columns]]
Then merge df1 with df2_to_merge as you specified. Note that this keeps only df1's values for the shared columns, so a value that exists only in df2 (such as Dad's TITLE) is not carried over.
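When a shared column is blank in one file and filled in the other, `DataFrame.combine_first` can take whichever side has a value, with no `_x`/`_y` columns created at all. This is a minimal sketch rather than a definitive solution: the in-memory data stands in for the two files, and it assumes `Sample_name` is unique within each file and that empty cells are read as missing values.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for the two tab-separated files.
meta1 = io.StringIO(
    "Sample_name\tTITLE\n"
    "Cody\tChicken Pox\n"
    "Dad\t\n"
)
meta2 = io.StringIO(
    "Sample_name\tTITLE\tGeo_Loc\tDESCRIPTION\n"
    "Dad\tChicken Pox\tEarth\tpeople\n"
    "Me\tChicken Pox\tEarth\tpeople\n"
)

df1 = pd.read_csv(meta1, sep="\t", header=0, dtype=str)
df2 = pd.read_csv(meta2, sep="\t", header=0, dtype=str)

key = "Sample_name"
left = df1.set_index(key)
right = df2.set_index(key)

# Keep df1's values and fill every gap (cells, rows, columns) from df2.
combined = left.combine_first(right)

# combine_first may sort the labels, so restore the original order.
row_order = pd.concat([df1[key], df2[key]]).drop_duplicates().tolist()
col_order = list(left.columns) + [c for c in right.columns if c not in left.columns]

merged = (
    combined.reindex(index=row_order, columns=col_order)
    .fillna("Missing:Not Applicable")
    .rename_axis(key)
    .reset_index()
)
```

To combine more than two files, the same `combine_first` call can be chained once per additional dataframe.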