Pandas dataframe randomly shuffle consecutive rows of values based on groupby
I would also like to group by one column and then shuffle n consecutive rows within each group.
import pandas as pd
df = pd.DataFrame({'grouper_col': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2], 'b': [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26]})
grouper_col b
0 1 1
1 1 2
2 1 3
3 1 4
4 1 5
5 1 6
6 2 21
7 2 22
8 2 23
9 2 24
10 2 25
11 2 26
Then, within each group, shuffle blocks of, say, two consecutive rows, for example:
grouper_col b
0 1 5
1 1 6
2 1 3
3 1 4
4 1 1
5 1 2
6 2 21
7 2 22
8 2 25
9 2 26
10 2 23
11 2 24
Here, two consecutive rows of each group are randomly swapped with another two consecutive rows of the same group.
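In the example output above, the rows move as aligned blocks (positions 0-1, 2-3 and 4-5 of each group) whose order is permuted. A minimal sketch of that reading, assuming each block is n consecutive rows in the group's existing order and that a shorter trailing block, if any, moves as a unit. `permute_blocks` and `shuffle_blocks` are hypothetical helper names, and the sketch uses NumPy's `np.random.default_rng` rather than the global `np.random` state:

```python
import numpy as np
import pandas as pd


# hypothetical helper: reorder aligned blocks of n consecutive values
def permute_blocks(values, n, rng):
    """Permute the blocks 0..n-1, n..2n-1, ... of a 1-D array."""
    n_blocks = -(-len(values) // n)  # ceiling division; a short tail is its own block
    block_ids = np.arange(len(values)) // n  # e.g. 0,0,1,1,2,2 for n=2
    order = rng.permutation(n_blocks)  # random new order of the blocks
    # stitch the blocks back together in the random order
    return np.concatenate([values[block_ids == b] for b in order])


# hypothetical helper: apply permute_blocks to value_col within every group
def shuffle_blocks(df, group_col, value_col, n, seed=None):
    """Return a copy of df with value_col block-shuffled inside each group."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    # transform keeps each result aligned with the group's original row labels
    out[value_col] = out.groupby(group_col)[value_col].transform(
        lambda s: permute_blocks(s.to_numpy(), n, rng)
    )
    return out


df = pd.DataFrame({'grouper_col': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                   'b': [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26]})
print(shuffle_blocks(df, 'grouper_col', 'b', n=2, seed=42))
```

Sharing one generator across groups gives each group independent draws, while a fixed `seed` makes the whole shuffle reproducible.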
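Here is one way to do it, by swapping two randomly chosen, non-overlapping pairs of consecutive rows in each group: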
import numpy as np

# find the size of each group
sizes = df.groupby('grouper_col').b.size()
# iterate over (group label, group size) pairs
for g, v in sizes.items():
    # v is now the last valid position inside the group
    v -= 1
    # only shuffle groups with at least 6 rows (v > 4); this also keeps the
    # while loop below from running forever on very small groups
    if v > 4:
        random_s = np.array([0, 0])
        # redraw while the two start positions would give overlapping pairs
        while abs(random_s[0] - random_s[1]) <= 1:
            random_s = np.random.randint(0, v, 2)
        # expand each start into two consecutive positions, e.g. [0, 2] -> [[0, 1], [2, 3]]
        replace_ix = random_s[:, None] + np.array([0, 1])
        # take a writable copy of the group's values (safe under copy-on-write)
        to_replace = df.loc[df.grouper_col.eq(g), 'b'].to_numpy(copy=True)
        # swap the two pairs
        repl_1 = to_replace[replace_ix[0]]
        repl_2 = to_replace[replace_ix[1]]
        to_replace[replace_ix[0]] = repl_2
        to_replace[replace_ix[1]] = repl_1
        # write the swapped values back to the group's rows
        df.loc[df.grouper_col.eq(g), 'b'] = to_replace
print(df)
grouper_col b
0 1 5
1 1 6
2 1 3
3 1 4
4 1 1
5 1 2
6 2 21
7 2 25
8 2 26
9 2 24
10 2 22
11 2 23
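The same pair-swap idea can be packaged as reusable functions. A minimal sketch, assuming NumPy's `Generator` API (`np.random.default_rng`, `rng.integers`) in place of the legacy `np.random.randint`; `swap_two_pairs` and `shuffle_groups` are hypothetical names. Unlike the loop above, it shuffles any group with at least 4 rows (the actual minimum for two non-overlapping pairs) and returns a new DataFrame instead of modifying `df` in place:

```python
import numpy as np
import pandas as pd


# hypothetical helper: swap two random, non-overlapping pairs of consecutive items
def swap_two_pairs(values, rng):
    """Return a copy of values with two non-overlapping pairs swapped.

    Inputs with fewer than 4 elements are returned unchanged, because two
    non-overlapping pairs cannot be drawn and the retry loop would never end.
    """
    out = np.array(values, copy=True)
    last_start = len(out) - 1  # exclusive upper bound for a pair's start
    if last_start < 3:
        return out
    starts = np.array([0, 0])
    # redraw until the two pairs do not overlap
    while abs(starts[0] - starts[1]) <= 1:
        starts = rng.integers(0, last_start, size=2)
    # broadcast each start into (start, start + 1): [0, 2] -> [[0, 1], [2, 3]]
    pairs = starts[:, None] + np.array([0, 1])
    first = out[pairs[0]]  # fancy indexing returns a copy
    out[pairs[0]] = out[pairs[1]]
    out[pairs[1]] = first
    return out


# hypothetical helper: apply swap_two_pairs to value_col within every group
def shuffle_groups(df, group_col, value_col, seed=None):
    rng = np.random.default_rng(seed)
    out = df.copy()
    out[value_col] = out.groupby(group_col)[value_col].transform(
        lambda s: swap_two_pairs(s.to_numpy(), rng)
    )
    return out


df = pd.DataFrame({'grouper_col': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                   'b': [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26]})
print(shuffle_groups(df, 'grouper_col', 'b', seed=0))
```

Using `groupby(...).transform` avoids the repeated boolean masks of the loop version, since each group's result is aligned back to its original row labels automatically.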
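Disclaimer: technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.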