Shuffling rows in pandas but orderly
Let's say that I have a data frame with three columns: age, gender, and country.
I want to shuffle the rows randomly, but keep them in an ordered pattern according to gender. There are n males and m females, where n could be less than, greater than, or equal to m. The shuffling should alternate genders until one of them runs out, so that for a group of 8 people we get results like the following:
male, female, male, female, male, female, female, female, ... (if there are more females: m > n)
male, female, male, female, male, male, male, male (if there are more males: n > m)
male, female, male, female, male, female, male, female, male, female (if males and females are equal: n = m)
import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})
First, number the rows within each gender group:
df['Order'] = df.groupby('Gender').cumcount()
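Then sort on that number. A stable sort keeps rows that share the same Order value in their original relative order:
df.sort_values('Order', kind='stable')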
It gives you:
   Age  Gender Country  Order
0   10    Male      US      0
3   40  Female  Canada      0
1   20    Male      UK      1
4   50  Female      US      1
2   30    Male   China      2
6   70  Female   China      2
5   60    Male      UK      3
7   80  Female  Brazil      3
If you want to shuffle, do that at the very beginning, e.g. df = df.sample(frac=1); see: Shuffle DataFrame rows.
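Putting the shuffle and the per-group numbering together, a minimal sketch might look like the following. The function name interleave_shuffled and its seed parameter are hypothetical, introduced only for illustration. Sorting on Gender in descending order as a tie-breaker is an assumption that relies on "Male" coming after "Female" alphabetically, so that each pair starts with a male as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [10, 20, 30, 40, 50, 60, 70, 80],
    'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
    'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"],
})


def interleave_shuffled(frame, seed=None):
    """Shuffle rows, then alternate Male/Female until one group runs out.

    Hypothetical helper: the name and the seed parameter are not from the
    original answer.
    """
    # Shuffle first, so the order inside each gender group is random.
    shuffled = frame.sample(frac=1, random_state=seed)
    # Number the rows within each gender in their shuffled order: 0, 1, 2, ...
    order = shuffled.groupby('Gender').cumcount()
    return (
        shuffled.assign(Order=order)
        # Ties on Order are broken by Gender descending ("Male" before "Female").
        .sort_values(['Order', 'Gender'], ascending=[True, False])
        .drop(columns='Order')
        .reset_index(drop=True)
    )


result = interleave_shuffled(df, seed=0)
print(result)
```

Passing a seed only makes the shuffle reproducible; leaving it as None gives a different order on each call. When one gender has more rows, its surplus rows end up at the tail because their Order values have no counterpart in the other group.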
Create two new dataframes with a 'Sort_Column', giving the df_male dataframe even values and the df_female dataframe odd values. Then use pd.concat to bring them back together and call .sort_values() on the 'Sort_Column'.
import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})
df['Sort_Column'] = 0
# Males get the even sort keys: 0, 2, 4, ...
df_male = df.loc[df['Gender'] == 'Male'].reset_index(drop=True)
df_male['Sort_Column'] = df_male['Sort_Column'] + df_male.index*2
# Females get the odd sort keys: 1, 3, 5, ...
df_female = df.loc[df['Gender'] == 'Female'].reset_index(drop=True)
df_female['Sort_Column'] = df_female['Sort_Column'] + df_female.index*2 + 1
# Recombine, sort on the key, then drop the helper column
df_sorted = pd.concat([df_male, df_female]).sort_values('Sort_Column').drop('Sort_Column', axis=1).reset_index(drop=True)
df_sorted
Output:
   Age  Gender Country
0   10    Male      US
1   40  Female  Canada
2   20    Male      UK
3   50  Female      US
4   30    Male   China
5   70  Female   China
6   60    Male      UK
7   80  Female  Brazil
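The same even/odd key can probably be computed without splitting the frame in two. The sketch below is a variation on this approach, not the answer's own code: it assumes groupby('Gender').cumcount() for the position within each gender and adds 1 for females, so the keys come out as 0, 2, 4, ... for males and 1, 3, 5, ... for females.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [10, 20, 30, 40, 50, 60, 70, 80],
    'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
    'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"],
})

# Position of each row within its own gender group: 0, 1, 2, ...
position = df.groupby('Gender').cumcount()
# Even keys for males, odd keys for females.
sort_key = 2 * position + (df['Gender'] == 'Female').astype(int)

df_sorted = (
    df.assign(Sort_Column=sort_key)
    .sort_values('Sort_Column')
    .drop(columns='Sort_Column')
    .reset_index(drop=True)
)
print(df_sorted)
```

Because every key is unique, the result does not depend on which sorting algorithm sort_values uses.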