
How to shuffle blocks of data frame (different sizes) in a list in Python?

Below is some dummy code showing what I would like to achieve; my question is at the end. I would like to shuffle blocks of a data frame (of different sizes) that are held in a list in Python. Thanks.

Set up a dummy dictionary:

dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
         "Alphabet":["A","B","C","D","E","F","G","H","I","J"],
         "Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}

Turn dictionary into data frame:

import pandas as pd

dummy_df = pd.DataFrame(dummy)

Create blocks of data frame with required size:

blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a
blocks

Below is the output of "blocks": a list of 4 data frame blocks with 1 to 4 rows each:

[   ID Alphabet  Fruit
 0   1        A  apple,
    ID Alphabet    Fruit
 1   2        B   banana
 2   3        C  coconut,
    ID Alphabet           Fruit
 3   4        D            date
 4   5        E  elephant apple
 5   6        F          feijoa,
    ID Alphabet       Fruit
 6   7        G       guava
 7   8        H    honeydew
 8   9        I    ita palm
 9  10        J  jack fruit]

I am stuck after the above.

I have tried many different things but kept getting errors. I would like to shuffle those data frame blocks in the list, then combine them back into a single dataframe. Below is an example of the shuffled output. How could I do this, please?

Example ideal output:

   ID Alphabet           Fruit
1   2        B          banana
2   3        C         coconut
0   1        A           apple
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa

After you have the list, you can shuffle the blocks in place using random.shuffle. After that, you can combine the blocks from the (shuffled) list back into a single dataframe.

Try this code:

import pandas as pd
import random

dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
         "Alphabet":["A","B","C","D","E","F","G","H","I","J"],
         "Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}

dummy_df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a

random.shuffle(blocks)  # shuffle the blocks in the list, in place

# DataFrame.append was removed in pandas 2.0;
# pd.concat joins all blocks in one call instead
dfs = pd.concat(blocks)

print(dfs)

Output (one possible result; the block order changes on each run)

   ID Alphabet           Fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa
1   2        B          banana
2   3        C         coconut
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
0   1        A           apple
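
The steps above can also be wrapped in a small helper, which makes the shuffle repeatable when needed. This is only a sketch: the function name shuffle_blocks and its seed parameter are made up for illustration and are not part of pandas, and the sample frame keeps just two of the columns.

```python
import random

import pandas as pd


def shuffle_blocks(df, sizes, seed=None):
    """Split df into consecutive blocks of the given sizes and return them in random order."""
    if sum(sizes) != len(df):
        raise ValueError("block sizes must add up to the number of rows")
    blocks = []
    start = 0
    for size in sizes:
        blocks.append(df.iloc[start:start + size])
        start += size
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    rng.shuffle(blocks)        # reorders the blocks; rows inside each block keep their order
    return pd.concat(blocks)


dummy_df = pd.DataFrame({"ID": range(1, 11), "Alphabet": list("ABCDEFGHIJ")})
shuffled = shuffle_blocks(dummy_df, [1, 2, 3, 4], seed=0)
print(shuffled)
```

Passing the same seed gives the same block order on every run; leaving it as None gives a different order each time.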

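You can use .sample(frac=1) to shuffle rows directly in a dataframe. Applied to each block, it shuffles the rows inside that block while the blocks themselves stay in their original order.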

blocks.append( df[start:end].sample(frac=1) )

And later you can use pd.concat(list_of_df) to join all dataframes at once (DataFrame.append was removed in pandas 2.0).

df = pd.concat(blocks)

import pandas as pd

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}

df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []

start = 0
for size in blocksize:
    end = start + size
    blocks.append(df[start:end].sample(frac=1))
    start = end

#for item in blocks:
#    print(item)

df = pd.concat(blocks)  # add ignore_index=True to renumber the rows
print(df)
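
As a rough sketch of the same within-block shuffle, the variant below makes the result repeatable and renumbers the rows. The random_state value 42 is an arbitrary choice, and the sample frame keeps only two of the columns.

```python
import pandas as pd

df = pd.DataFrame({"ID": range(1, 11), "Alphabet": list("ABCDEFGHIJ")})
blocksize = [1, 2, 3, 4]

blocks = []
start = 0
for size in blocksize:
    end = start + size
    # random_state makes the per-block shuffle repeatable; drop it for a fresh order each run
    blocks.append(df.iloc[start:end].sample(frac=1, random_state=42))
    start = end

# ignore_index=True renumbers the result 0..n-1, like .reset_index(drop=True)
within = pd.concat(blocks, ignore_index=True)
print(within)
```

Rows only move inside their own block here: ID 1 always stays first, IDs 2 and 3 always occupy the next two rows, and so on.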

Other methods to shuffle: Shuffle DataFrame rows

Doc: pandas.DataFrame.sample


Another idea is to collect only the shuffled index labels using .sample(frac=1)

blocks += df[start:end].sample(frac=1).index.tolist()

or random.shuffle()

indexes = df[start:end].index.tolist()
random.shuffle(indexes)
blocks += indexes

and later use these index labels to create the new DataFrame (.loc selects by label, so it also works when the index is not the default 0..n-1)

df = df.loc[blocks]

import pandas as pd
import random

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}

df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []

start = 0
for size in blocksize:
    end = start + size

    #blocks += df[start:end].sample(frac=1).index.tolist()
   
    indexes = df[start:end].index.tolist()
    random.shuffle(indexes)
    blocks += indexes
    
    start = end

#for item in blocks:
#    print(item)

df = df.loc[blocks]  # select rows by the shuffled index labels

print(df)
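
The index-based approach can also be adapted to shuffle the order of whole blocks, which is what the question asks for. The sketch below works with row positions rather than index labels, so .iloc is the matching selector; the names position_blocks and order are made up for illustration.

```python
import random

import pandas as pd

df = pd.DataFrame({"ID": range(1, 11), "Alphabet": list("ABCDEFGHIJ")})
blocksize = [1, 2, 3, 4]

# one list of row positions per block: [[0], [1, 2], [3, 4, 5], [6, 7, 8, 9]]
position_blocks = []
start = 0
for size in blocksize:
    position_blocks.append(list(range(start, start + size)))
    start += size

random.shuffle(position_blocks)  # reorder whole blocks, not the rows inside them

# flatten the shuffled blocks into a single list of positions
order = [pos for block in position_blocks for pos in block]
result = df.iloc[order]
print(result)
```

Because only small lists of integers are shuffled, no intermediate dataframes are created before the final selection.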

