Pandas DataFrame: randomly shuffle blocks of consecutive rows within each groupby group

Question

I want to group by a column and then shuffle blocks of n consecutive rows within each group.

import pandas as pd
df = pd.DataFrame({'grouper_col':[1,1,1,1,1,1, 2,2,2,2,2,2], 'b':[1,2,3,4,5,6,21,22,23,24,25,26]})

    grouper_col   b
0             1   1
1             1   2
2             1   3
3             1   4
4             1   5
5             1   6
6             2  21
7             2  22
8             2  23
9             2  24
10            2  25
11            2  26

Then, within each group, I want to shuffle blocks of, say, two consecutive rows, for example:

    grouper_col   b
0             1   5
1             1   6
2             1   3
3             1   4
4             1   1
5             1   2
6             2  21
7             2  22
8             2  25
9             2  26
10            2  23
11            2  24

Here each pair of consecutive rows in a group is randomly swapped with another pair of consecutive rows from the same group.

Answer 1

Here is one way to approach this:

import numpy as np

# find the size of each group
sizes = df.groupby('grouper_col').b.size()
# iterate over the (group key, size) pairs of the above series
for g, v in sizes.items():
    # valid start positions for a pair of consecutive rows are 0 .. size - 2
    v -= 1
    # only swap when the group has at least 6 rows (v > 4)
    if v > 4:
        random_s = np.array([0, 0])
        # redraw until the start positions are at least 2 apart;
        # positions next to each other would give overlapping pairs
        while abs(random_s[0] - random_s[1]) <= 1:
            random_s = np.random.randint(0, v, 2)
        # expand each start position into a pair (i.e. [0, 2] to [[0, 1], [2, 3]])
        replace_ix = random_s[:, None] + np.array([0, 1])
        # copy the group's values so the array is writable, then swap the two pairs
        mask = df.grouper_col.eq(g)
        to_replace = df.loc[mask, 'b'].to_numpy().copy()
        repl_1 = to_replace[replace_ix[0]]
        repl_2 = to_replace[replace_ix[1]]
        to_replace[replace_ix[0]] = repl_2
        to_replace[replace_ix[1]] = repl_1
        # write the swapped values back into the frame
        df.loc[mask, 'b'] = to_replace

print(df)  # one possible result; the output varies between runs

    grouper_col   b
0             1   5
1             1   6
2             1   3
3             1   4
4             1   1
5             1   2
6             2  21
7             2  25
8             2  26
9             2  24
10            2  22
11            2  23
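
The loop above swaps two randomly chosen, non-overlapping pairs of `b` values per group, and the pairs need not start at even positions. If what is wanted is instead a permutation of all aligned blocks of n rows within each group, a minimal sketch could look like the following. The helper name `shuffle_blocks` and its parameters are hypothetical and not part of pandas. It assumes that whole rows should move together, that a trailing block shorter than n may be moved like any other block, and that it is acceptable for the result to list the groups in order of first appearance with a fresh 0..N-1 index.

```python
import numpy as np
import pandas as pd


def shuffle_blocks(df, group_col, n, seed=None):
    """Permute aligned blocks of n consecutive rows within each group.

    Hypothetical helper for illustration. Row order inside a block is kept.
    """
    rng = np.random.default_rng(seed)
    pieces = []
    for _, group in df.groupby(group_col, sort=False):
        # Block id of every row in the group: 0, 0, 1, 1, 2, 2, ... for n == 2
        block_ids = np.arange(len(group)) // n
        n_blocks = (len(group) + n - 1) // n
        # Assign each block a random rank (a random permutation of 0 .. n_blocks - 1)
        rank = rng.permutation(n_blocks)
        # A stable sort by block rank moves whole blocks and keeps in-block order
        positions = np.argsort(rank[block_ids], kind="stable")
        pieces.append(group.iloc[positions])
    if not pieces:
        # Nothing to shuffle in an empty frame
        return df.copy()
    return pd.concat(pieces).reset_index(drop=True)


df = pd.DataFrame({'grouper_col': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                   'b': [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26]})
print(shuffle_blocks(df, 'grouper_col', n=2, seed=0))
```

Passing a `seed` makes a run reproducible; leaving it as `None` gives a different order each time. Unlike the loop above, this returns a new DataFrame and leaves `df` unchanged, and a permutation may occasionally leave some or all blocks in their original place.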

Answer 1
0 2020-03-20 12:52:38
