
Shuffle rows in pandas dataframe, keeping duplicates together

I have data like this:

A  B  C  D  E  F
35 1  2  35 25 65
40 5  7  47 57 67
20 1  8  74 58 63
35 1  2  37 28 69
40 5  7  49 58 69
20 1  8  74 58 63
35 1  2  47 29 79
40 5  7  55 77 87
20 1  8  74 58 63

Here we can see that columns A, B and C contain replicas: the same combination of values is repeated in various rows. I want to shuffle the rows so that the replicas end up in consecutive rows, without deleting any of them. The output should look like this:

A  B  C  D  E  F
35 1  2  35 25 65
35 1  2  37 28 69
35 1  2  47 29 79
40 5  7  47 57 67
40 5  7  49 58 69
40 5  7  55 77 87
20 1  8  74 58 63
20 1  8  74 58 63
20 1  8  74 58 63

When I use pandas.DataFrame.duplicated, it can flag the duplicated rows for me. How can I keep all the identical rows together using groupby?

Here is code that achieves the result you asked for. It doesn't require explicit shuffling or sorting, only grouping your existing df by columns A, B and C:

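import pandas as pd

# Concatenate the groups back together; each group holds every row that shares the same (A, B, C)
df_shuf = pd.concat(group for _, group in df.groupby(['A', 'B', 'C'], sort=False))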

print(df_shuf.to_string(index=False))

 A  B  C   D   E   F
35  1  2  35  25  65
35  1  2  37  28  69
35  1  2  47  29  79
40  5  7  47  57  67
40  5  7  49  58  69
40  5  7  55  77  87
20  1  8  74  58  63
20  1  8  74  58  63
20  1  8  74  58  63
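
If you want to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is my own reconstruction of the sample data in the question, so treat it as an assumption rather than the asker's actual setup.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about how df was built).
df = pd.DataFrame(
    [
        [35, 1, 2, 35, 25, 65],
        [40, 5, 7, 47, 57, 67],
        [20, 1, 8, 74, 58, 63],
        [35, 1, 2, 37, 28, 69],
        [40, 5, 7, 49, 58, 69],
        [20, 1, 8, 74, 58, 63],
        [35, 1, 2, 47, 29, 79],
        [40, 5, 7, 55, 77, 87],
        [20, 1, 8, 74, 58, 63],
    ],
    columns=list("ABCDEF"),
)

# Each group holds all rows sharing the same (A, B, C).
# sort=False keeps the groups in order of first appearance, and groupby
# preserves the original row order inside each group.
df_shuf = pd.concat(group for _, group in df.groupby(["A", "B", "C"], sort=False))

print(df_shuf.to_string(index=False))
```

Note that df_shuf keeps the original index labels (0, 3, 6, 1, ...). If you would rather have a fresh 0..n-1 index, passing ignore_index=True to pd.concat should do it.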

Notes:

  • I couldn't figure out how to do df.reindex in place on the grouped object, but we can get by without it.
  • You don't need pandas.DataFrame.duplicated, since df.groupby(['A','B','C']) already puts all duplicates in the same group.
  • df.groupby(..., sort=False) is faster; use it whenever you don't need the groups sorted by key, which is the default. Here it also keeps the groups in order of first appearance (35, 40, 20), which matches the output you asked for.
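
To make the point about group order concrete, the sketch below compares the default sorting with sort=False on the sample data. Since the title mentions shuffling, it also shows one possible way to randomize the order of the groups while keeping duplicates together. That last part is my own variant, not something the question strictly requires; the names key_cols, groups and df_random are made up for illustration, and the DataFrame is again reconstructed from the question.

```python
import random

import pandas as pd

# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame(
    [
        [35, 1, 2, 35, 25, 65],
        [40, 5, 7, 47, 57, 67],
        [20, 1, 8, 74, 58, 63],
        [35, 1, 2, 37, 28, 69],
        [40, 5, 7, 49, 58, 69],
        [20, 1, 8, 74, 58, 63],
        [35, 1, 2, 47, 29, 79],
        [40, 5, 7, 55, 77, 87],
        [20, 1, 8, 74, 58, 63],
    ],
    columns=list("ABCDEF"),
)

key_cols = ["A", "B", "C"]

# Default (sort=True): groups come back ordered by key.
sorted_order = [int(key[0]) for key, _ in df.groupby(key_cols)]

# sort=False: groups come back in order of first appearance.
first_seen_order = [int(key[0]) for key, _ in df.groupby(key_cols, sort=False)]

print(sorted_order)      # [20, 35, 40]
print(first_seen_order)  # [35, 40, 20]

# Illustrative variant: shuffle the order of the groups themselves.
# Rows with the same (A, B, C) still stay adjacent because each group
# is moved as a whole.
groups = [group for _, group in df.groupby(key_cols, sort=False)]
random.Random(0).shuffle(groups)  # fixed seed only to make the run repeatable
df_random = pd.concat(groups)

print(df_random.to_string(index=False))
```

Drop the fixed seed (for example, call random.shuffle(groups) directly) if you want a different group order on every run.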
