cross join/merge to create dataframe of combinations (order doesn't matter)

I have a dataframe that has 6 categorical/string values. I want to create a dataframe of all possible combinations of these string values where order DOES NOT matter (i.e. a, b = b, a).

I tried the code below, but the result is a set of permutations rather than combinations, i.e. it distinguishes (IL, IL-1) from (IL-1, IL).

I have read through:

http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra

In MySQL I can do this via:

select r1.id, r2.id
from rows r1 
cross join rows r2 
where r1.id < r2.id

I appreciate your help.

import pandas as pd

data = ['IL', 'IL-1', 'IL-2', 'IL-3', 'IL-4', 'IL-5']
df = pd.DataFrame(data)
# constant key so that every row of df matches every row of df2
df['key1'] = pd.Series([1] * len(df))
df2 = df.copy()
cart = pd.merge(df, df2, on='key1')

Resulting dataframe:

     0_x  key1   0_y
0     IL     1    IL
1     IL     1  IL-1
2     IL     1  IL-2
3     IL     1  IL-3
4     IL     1  IL-4
5     IL     1  IL-5
6   IL-1     1    IL
7   IL-1     1  IL-1
8   IL-1     1  IL-2
9   IL-1     1  IL-3
10  IL-1     1  IL-4
11  IL-1     1  IL-5
12  IL-2     1    IL
13  IL-2     1  IL-1
14  IL-2     1  IL-2
15  IL-2     1  IL-3
16  IL-2     1  IL-4
17  IL-2     1  IL-5
18  IL-3     1    IL
19  IL-3     1  IL-1
20  IL-3     1  IL-2
21  IL-3     1  IL-3
22  IL-3     1  IL-4
23  IL-3     1  IL-5
24  IL-4     1    IL
25  IL-4     1  IL-1
26  IL-4     1  IL-2
27  IL-4     1  IL-3
28  IL-4     1  IL-4
29  IL-4     1  IL-5
30  IL-5     1    IL
31  IL-5     1  IL-1
32  IL-5     1  IL-2
33  IL-5     1  IL-3
34  IL-5     1  IL-4
35  IL-5     1  IL-5

Putting together what's in the comments, here is a 15-row (6C2) DataFrame with the proposed index and some dummy data:

import itertools
import pandas as pd

labels = ['IL', 'IL-1', 'IL-2', 'IL-3', 'IL-4', 'IL-5']
i = pd.MultiIndex.from_tuples(list(itertools.combinations(labels, 2)))
df = pd.DataFrame({'col1':range(len(i))}, index=i)

Output:

           col1
IL   IL-1     0
     IL-2     1
     IL-3     2
     IL-4     3
     IL-5     4
IL-1 IL-2     5
     IL-3     6
     IL-4     7
     IL-5     8
IL-2 IL-3     9
     IL-4    10
     IL-5    11
IL-3 IL-4    12
     IL-5    13
IL-4 IL-5    14
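
If the pairs are wanted as ordinary columns rather than as a MultiIndex, the same `itertools.combinations` call can feed the DataFrame constructor directly. This is only a minimal sketch; the column names `left` and `right` are placeholders chosen for illustration and do not come from the question.

```python
import itertools

import pandas as pd

labels = ['IL', 'IL-1', 'IL-2', 'IL-3', 'IL-4', 'IL-5']

# Each unordered pair appears exactly once, in the order the labels are listed.
# 'left' and 'right' are hypothetical column names.
pairs = pd.DataFrame(
    list(itertools.combinations(labels, 2)),
    columns=['left', 'right'],
)

print(len(pairs))  # number of unordered pairs of 6 labels
print(pairs.head())
```

Calling `df.reset_index()` on the MultiIndex version should give an equivalent layout, with default column names for the two index levels.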

In case you want all 36 pairs of the Cartesian product (which I don't think is the case):

i = pd.MultiIndex.from_product([labels, labels])
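
For a version that stays closer to the SQL in the question, one option is to keep the constant-key merge and then filter it the way `where r1.id < r2.id` does. The sketch below assumes the row position can stand in for `id`; the column names `label`, `id` and `key` are made up for this example.

```python
import pandas as pd

labels = ['IL', 'IL-1', 'IL-2', 'IL-3', 'IL-4', 'IL-5']

df = pd.DataFrame({'label': labels})
df['id'] = range(len(df))  # assumption: row position plays the role of r1.id / r2.id
df['key'] = 1              # constant key, so the merge pairs every row with every row

# Full Cartesian product: 6 x 6 = 36 rows
cart = df.merge(df, on='key', suffixes=('_x', '_y'))

# Keep one ordering of each pair and drop self-pairs, like `where r1.id < r2.id`
combos = cart.loc[cart['id_x'] < cart['id_y'], ['label_x', 'label_y']]
combos = combos.reset_index(drop=True)

print(len(cart), len(combos))
```

This builds all 36 rows before discarding 21 of them, so for longer label lists the `itertools.combinations` approach is likely the cheaper choice.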

