How do I select a random sequence of n rows for each group in a pandas DataFrame?

Question

Suppose I have the following DataFrame:

import pandas as pd

raw_data = {
    'subject_id': ['1', '1', '1', '1', '2', '2', '2', '2', '2'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian', 'Bob', 'Bill', 'Brenda', 'Brett']}
df = pd.DataFrame(raw_data, columns=['subject_id', 'first_name'])

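How do I select, for each subject_id, a random sequence of n rows from df (that is, n consecutive rows starting at a random position)? For example, if I want a random sequence of 2 rows for each subject_id, a possible output would be: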

subject_id   first_name
1            Amy
1            Allen
2            Brenda
2            Brett

The post that seems most similar to this question is:

Select random sequence of rows from pandas dataframe

However, it does not seem to account for the grouping I need.

Answer 1

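This takes a little extra work after sample:

# Draw 2 random rows per group; the index keeps the original row labels
s = df.groupby('subject_id')['subject_id'].sample(n=2)
# Keep the smaller sampled label in each group as the start of the sequence
idx = s.sort_index().drop_duplicates().index
# Take each start row plus the row right after it
s = df.loc[idx.union(idx+1)]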
Out[53]: 
  subject_id first_name
2          1      Allen
3          1      Alice
4          2      Brian
5          2        Bob
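
The `idx+1` step above does arithmetic on row labels, so it appears to rely on a default integer index with each group's rows stored next to each other, and it is written for n = 2. Below is a minimal sketch of a variant for arbitrary n that works on positions inside each group instead. The names `sample_consecutive`, `by`, `n`, and `seed` are illustrative and not part of the answer, and returning a group whole when it has fewer than n rows is an assumption.

```python
import numpy as np
import pandas as pd


def sample_consecutive(df, by, n, seed=None):
    """Pick n consecutive rows per group, start position drawn uniformly."""
    rng = np.random.default_rng(seed)
    sizes = df.groupby(by, sort=False).size()
    # Valid start positions are 0 .. size - n; groups shorter than n start at 0
    # (assumption: such groups are returned whole).
    high = (sizes - n + 1).clip(lower=1).to_numpy()
    starts = pd.Series(rng.integers(0, high), index=sizes.index)
    pos = df.groupby(by).cumcount()   # row position inside its own group
    start = df[by].map(starts)        # chosen start, broadcast to every row
    return df[(pos >= start) & (pos < start + n)]


df = pd.DataFrame({
    'subject_id': ['1', '1', '1', '1', '2', '2', '2', '2', '2'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian',
                   'Bob', 'Bill', 'Brenda', 'Brett']})

# The rows returned differ from run to run unless a seed is passed.
print(sample_consecutive(df, 'subject_id', n=2))
```

Because the start is drawn once per group with equal probability for every valid position, each possible run of n rows should be equally likely, whereas taking the smaller of two sampled labels favours earlier rows.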

Answer 2

You can try the following:

import random
import pandas as pd

raw_data = {
    'subject_id': ['1', '1', '1', '1', '2','2','2','2','2'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian','Bob','Bill','Brenda','Brett']}
df = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name'])


def f(g):
    # Random start position chosen so that two rows still fit inside the group
    k = random.randrange(len(g)-1)
    return g.iloc[k:k+2]
    
sample = df.groupby('subject_id').apply(f).reset_index(level=0, drop=True)
print(sample)

It gives, for example:

  subject_id first_name
0          1       Alex
1          1        Amy
5          2        Bob
6          2       Bill
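
The function `f` above hard-codes a run of 2 rows, and `random.randrange(len(g)-1)` raises `ValueError` for a group with a single row. A minimal sketch of the same idea with the run length as a parameter follows. The names `random_run`, `sample_runs`, `n`, and `seed` are hypothetical, and returning a too-short group unchanged is an assumption rather than something the answer specifies. It loops over the groups instead of calling `apply`, which keeps the original row index without a `reset_index` step.

```python
import numpy as np
import pandas as pd


def random_run(g, n, rng):
    """Return n consecutive rows of g starting at a random position."""
    if len(g) <= n:
        # Group too small to choose from: return it whole (assumption).
        return g
    k = rng.integers(0, len(g) - n + 1)  # upper bound is exclusive
    return g.iloc[k:k + n]


def sample_runs(df, by, n, seed=None):
    """Apply random_run to every group and stitch the pieces together."""
    rng = np.random.default_rng(seed)
    parts = [random_run(g, n, rng) for _, g in df.groupby(by, sort=False)]
    return pd.concat(parts)


df = pd.DataFrame({
    'subject_id': ['1', '1', '1', '1', '2', '2', '2', '2', '2'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian',
                   'Bob', 'Bill', 'Brenda', 'Brett']})

# Passing a seed makes the selection repeatable between runs.
sample = sample_runs(df, 'subject_id', n=3, seed=42)
print(sample)
```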

Answer 1
2 accepted 2022-09-26 01:04:38

Answer 2
0 2022-09-26 16:24:45
