Python: combine two data frames with all combinations of index
I have a data frame with x variables and an id_number running from 1 to n (n is large). I want to create a new data frame that horizontally joins every pair of rows, paired by id_number. The original data looks like this:
id_number var_x1 var_x2
1 sth stuff
2 other things
3 more info
I want to get this for every possible pair:
id_numberA var_x1A var_x2A id_numberB var_x1B var_x2B
1 sth stuff 1 sth stuff
1 sth stuff 2 other things
1 sth stuff 3 more info
2 other things 3 more info
What is the most efficient way to do this for a large dataset?
You can create a temporary constant merge key with:
df['temp'] = 1
Then merge the data frame with itself on that key, which pairs every row with every row:
merged_df = df.merge(df, on='temp', suffixes=('A', 'B')).drop('temp', axis=1)
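A runnable sketch of the self-merge above, using the three sample rows from the question. It uses `df.assign(temp=1)` so the original frame is not modified, and also shows `how="cross"`, which pandas 1.2 and later provide for exactly this Cartesian product without a helper column. Note that the result has n² rows, so memory grows quadratically with n.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id_number": [1, 2, 3],
    "var_x1": ["sth", "other", "more"],
    "var_x2": ["stuff", "things", "info"],
})

# Constant key: every left row matches every right row (Cartesian product).
# assign() returns a copy, so df itself gets no 'temp' column.
merged_df = (
    df.assign(temp=1)
      .merge(df.assign(temp=1), on="temp", suffixes=("A", "B"))
      .drop("temp", axis=1)
)

# Same result in pandas >= 1.2 without the helper column
cross_df = df.merge(df, how="cross", suffixes=("A", "B"))

print(merged_df.shape)  # (9, 6)
print(list(merged_df.columns))
```

The overlapping column names get the `A`/`B` suffixes, giving `id_numberA, var_x1A, var_x2A, id_numberB, var_x1B, var_x2B`.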
If you don't want pairs of a row with itself (the same id_number on both sides), finally filter with:
merged_df = merged_df[merged_df['id_numberA'] != merged_df['id_numberB']]
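If you also don't want both orderings of the same pair of id_numberA and id_numberB (for example, both (1, 2) and (2, 1)), instead finally filter with the following, which keeps only id_numberA < id_numberB and therefore also drops self-pairs: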
merged_df = merged_df[merged_df['id_numberA'] < merged_df['id_numberB']]
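For a large n, building all n² rows and then filtering discards about half of them. A possible alternative, sketched here with NumPy's `np.triu_indices` (not part of the original answer), generates only the row positions with i <= j and builds the pairs directly. This assumes the frame is sorted by id_number with unique ids, so that row position order matches id order. With `k=0` it keeps self-pairs, like the first row of the desired output in the question; use `k=1` to get the same result as the `<` filter above. The output still has n(n+1)/2 rows, so it remains quadratic in size.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed sorted by unique id_number)
df = pd.DataFrame({
    "id_number": [1, 2, 3],
    "var_x1": ["sth", "other", "more"],
    "var_x2": ["stuff", "things", "info"],
})

# Row positions (i, j) in the upper triangle including the diagonal (i <= j).
# k=1 would exclude the diagonal, i.e. drop self-pairs.
i, j = np.triu_indices(len(df), k=0)

# Select the left and right rows, suffix their columns, and align them side by side
left = df.iloc[i].add_suffix("A").reset_index(drop=True)
right = df.iloc[j].add_suffix("B").reset_index(drop=True)
pairs = pd.concat([left, right], axis=1)

print(pairs)
```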