Efficiently split a pandas dataframe based on combinations of values in one column

Question

Let's say I have a dataframe with one column that has 3 unique values:

import pandas as pd
df = pd.DataFrame(['a', 'b', 'c'], columns = ['string'])
df

I want to split this dataframe into smaller dataframes such that each one contains 2 unique values. In the above case I need 3 dataframes, since 3C2 (nCr) = 3: df1 - [a, b], df2 - [a, c], df3 - [b, c].

My current implementation:

import itertools
for i in itertools.combinations(df.string.values, 2):
    print(df[df.string.isin(i)], '\n')

I am looking for something like groupby in pandas, because subsetting the data inside a loop is time-consuming. In one sample case I have 609 unique values, and the loop took around 3 minutes to complete. So I am looking for an optimized way to perform the same operation, since the number of unique values may run into the thousands in real scenarios.

Answer 1

It will be slow because you're creating roughly 185k dataframes, one per pair of the 609 unique values. If each of them is only supposed to hold two values, why does it need to be a dataframe?
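For scale, here is a quick check of how many pairs the loop has to produce. It only assumes the figures from the question (609 unique values now, possibly around 1000 later); the variable names are illustrative.

```python
from math import comb

# Number of unordered pairs of unique values, i.e. n choose 2.
n_pairs_609 = comb(609, 2)
n_pairs_1000 = comb(1000, 2)

print(n_pairs_609)   # 185136
print(n_pairs_1000)  # 499500
```

The count grows quadratically with the number of unique values, so any per-pair cost is multiplied by hundreds of thousands.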

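import pandas as pd

df = pd.DataFrame({'x': range(100)})
df['key'] = 1  # constant key so the self-merge pairs every row with every row
# orient is spelled out as 'records'; the 'r' shorthand is rejected by recent pandas
records = df.merge(df, on='key').drop('key', axis=1).to_dict('records')
[pd.Series(x) for x in records]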

You will see that records is computed quite fast, but then it takes a few minutes to generate all of these Series objects.
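If sub-frames really are needed, one possible way to cut the per-pair cost is to split the frame once with groupby and then stitch two pre-split groups together per pair, instead of filtering the whole frame with isin on every iteration. This is only a sketch, not something from the question or the answer above: the sample data and the helper name iter_pair_frames are made up for illustration, and it still creates one dataframe per pair, so the object-creation cost described above remains.

```python
import itertools
import pandas as pd

# Small illustrative frame; the real data would have many more rows.
df = pd.DataFrame({'string': ['a', 'b', 'c', 'a', 'b']})

# Split once: one sub-frame per unique value.
groups = {key: sub for key, sub in df.groupby('string')}

def iter_pair_frames(groups):
    """Lazily yield ((v1, v2), sub_frame) for every pair of unique values."""
    for v1, v2 in itertools.combinations(sorted(groups), 2):
        # Rows of v1 come first, then rows of v2 (not the original row order).
        yield (v1, v2), pd.concat([groups[v1], groups[v2]])

pair_frames = dict(iter_pair_frames(groups))
print(list(pair_frames))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Because iter_pair_frames is a generator, each pair can also be processed and discarded one at a time rather than holding every sub-frame in memory.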

Answer 1
0 2021-05-25 11:02:30
