
Shuffle rows of a DataFrame until all consecutive values in a column are different?

I have a dataframe with rows that I'd like to shuffle continuously until the value in column B is not identical across any two consecutive rows:

initial dataframe:

A  |  B
_______
a     1
b     1
c     2
d     3
e     3

Possible outcome:

A  |  B
_______
b     1
c     2
e     3
a     1
d     3

I made a function scramble meant to do this but I am having trouble passing the newly scrambled dataframe back into the function to test for matching B values:

def scramble(x):
    curr_B='nothing'
    for index, row in x.iterrows():
        next_B=row['B']
        if str(next_B) == str(curr_B):
            x=x.sample(frac=1)
            curr_B=next_B
        curr_B=next_B
    return x
df=scramble(df)

I suspect the function is finding the matching values in the next row, but I can't shuffle it continuously until there are no two sequential rows with the same B value.

Printing the output shows a dataframe that still has consecutive rows with the same value in B.

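If your goal is to eliminate consecutive duplicates, you can just use groupby and cumcount, then reindex your DataFrame. cumcount numbers each row within its B group (0 for the first occurrence, 1 for the second, and so on), so sorting by that number interleaves the groups: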

df.loc[df.groupby('B').cumcount().sort_values().index]

   A  B
0  a  1
2  c  2
3  d  3
1  b  1
4  e  3
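
To see why this works, it can help to look at the intermediate values. The following is a minimal sketch on the question's sample data; the names rank, order and repeats are only illustrative, and kind="stable" is an addition not present in the one-liner above, used so that tied rows keep their original order.

```python
import pandas as pd

df = pd.DataFrame({"A": list("abcde"), "B": [1, 1, 2, 3, 3]})

# Number each row within its B group: first 1 -> 0, second 1 -> 1, and so on.
rank = df.groupby("B").cumcount()

# kind="stable" (an addition here) keeps tied rows in their original order.
order = rank.sort_values(kind="stable").index
result = df.loc[order]

# True where a row's B equals the B of the row directly above it.
repeats = result["B"].eq(result["B"].shift())

print(rank.tolist())   # [0, 1, 0, 0, 1]
print(order.tolist())  # [0, 2, 3, 1, 4]
print(repeats.any())   # False
```

The same check is worth running on your own data, because the interleaving can still put equal values next to each other where one round of groups ends and the next begins (for example with B = [1, 2, 2, 1]).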

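If you actually want randomness, you can group on the cumcount and shuffle each group with sample(frac=1). This eliminates consecutive dupes only to some degree (NOT GUARANTEED): values within one cumcount group are all distinct, but the last row of one group can still match the first row of the next. It does preserve randomness and avoids slow row-by-row iteration. Here's an example: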

import numpy as np

np.random.seed(0)  # seed NumPy's global RNG so sample() is reproducible
(df.groupby(df.groupby('B').cumcount(), group_keys=False)
   .apply(lambda x: x.sample(frac=1))
   .reset_index(drop=True))

   A  B
0  d  3
1  a  1
2  c  2
3  b  1
4  e  3
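
If a guarantee is needed, one option is to do what the question literally asks: reshuffle the whole frame and retry until the check passes. This is only a sketch, not part of the answer above; shuffle_until_distinct, max_tries and seed are hypothetical names, and the retry loop can be slow or fail outright when valid orderings are rare or do not exist (for example when one value makes up most of the column).

```python
import pandas as pd

def shuffle_until_distinct(df, col="B", max_tries=1000, seed=None):
    """Reshuffle rows until no two consecutive rows share a value in `col`.

    Hypothetical helper: raises ValueError if no valid order is found.
    """
    for attempt in range(max_tries):
        # Use a different but reproducible seed for each attempt.
        state = None if seed is None else seed + attempt
        shuffled = df.sample(frac=1, random_state=state).reset_index(drop=True)
        # shift() lines each value up with the value from the previous row.
        if not shuffled[col].eq(shuffled[col].shift()).any():
            return shuffled
    raise ValueError(f"no valid ordering found in {max_tries} tries")

df = pd.DataFrame({"A": list("abcde"), "B": [1, 1, 2, 3, 3]})
result = shuffle_until_distinct(df, seed=0)
print(result)
```

Unlike the loop in the question, the check here runs on the complete shuffled frame each time, so a reshuffle can never invalidate rows that were already inspected.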
