Randomly select rows from a DataFrame in Pandas

Question

Okay, this is somewhat tricky. I have a DataFrame of people and I want to randomly select 27% of them. I want to create a new Boolean column in that DataFrame that shows whether each person was randomly selected.

Does anyone have any idea how to do this?

Answer 1

The built-in sample method provides a frac argument that sets the fraction of rows to include in the sample.

If your DataFrame of people is people_df:

percent_sampled = 27
# Draw 27% of the rows without replacement
sample_df = people_df.sample(frac=percent_sampled / 100)

# True for rows whose index label appears in the sample
people_df['is_selected'] = people_df.index.isin(sample_df.index)
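A self-contained version of the approach above, runnable as-is. The people_df contents are made up for illustration, and random_state=0 is an added assumption used only to make the draw reproducible; drop it to get a fresh draw each run. Because index.isin matches labels, this assumes the index has no duplicate labels (with duplicates, every row sharing a sampled label would be flagged).

```python
import pandas as pd

# Hypothetical data: 100 people with a unique default RangeIndex
people_df = pd.DataFrame({"name": [f"person_{i}" for i in range(100)]})

percent_sampled = 27
# frac=0.27 on 100 rows draws 27 rows without replacement;
# random_state fixes the seed so the result is reproducible
sample_df = people_df.sample(frac=percent_sampled / 100, random_state=0)

# True for every row whose index label appears in the sample
people_df["is_selected"] = people_df.index.isin(sample_df.index)

print(people_df["is_selected"].sum())  # 27
```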

Answer 2

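import numpy as np

n = len(df)
idx = np.arange(n)
np.random.shuffle(idx)  # shuffles in place and returns None, so don't reassign idx
selected_idx = idx[:int(0.27 * n)]  # first 27% of the shuffled positions
# iloc selects by position, so this works with any index; sorting keeps the original row order
selected_df = df.iloc[np.sort(selected_idx)]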
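Answer 2 stops at selected_df, but the question asks for a Boolean column. Below is a minimal sketch that builds that column by position. It swaps in NumPy's Generator.choice with replace=False in place of the shuffle (a substitute technique, not part of the original answer); the df contents, its non-default index, and seed=0 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default index, to show that positions (not labels) are used
df = pd.DataFrame({"name": list("abcdefghij") * 10}, index=range(1000, 1100))

rng = np.random.default_rng(seed=0)
n = len(df)
k = int(0.27 * n)  # number of rows to select (27 for 100 rows)

# Pick k distinct row positions without replacement
selected_pos = rng.choice(n, size=k, replace=False)

# Start with all False, then flag the chosen positions
mask = np.zeros(n, dtype=bool)
mask[selected_pos] = True
df["is_selected"] = mask  # a plain array is assigned by position, not by index label

selected_df = df[df["is_selected"]]
```

Sampling positions rather than labels or values avoids any dependence on the index being unique or on column values being distinct.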

Answer 3

Define a DataFrame whose column 0 holds the numbers 0 to 99 in random order:

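import random
import pandas as pd
import numpy as np

# Column 0 holds 0..99 in random order. Shuffling a[0] in place with
# random.shuffle may not write back to a when pandas Copy-on-Write is enabled,
# so the values are permuted before the frame is built.
a = pd.DataFrame(np.random.permutation(100))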

Use random.sample to choose 27 values from the column WITHOUT repetition (to define this as a percentage instead, replace 27 with int(0.27 * len(a[0])), since random.sample needs an integer count):

choices = random.sample(list(a[0]),27)

Use np.where to assign Boolean values to a new column in the DataFrame:

a['Bool'] = np.where(a[0].isin(choices),True,False)
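Since Series.isin already returns a Boolean Series, the np.where(..., True, False) wrapper is optional. The sketch below is a self-contained version of the same value-based idea. It relies on column 0 holding unique values (as the permuted range does): if a value repeated, every row carrying a chosen value would be flagged, and more than 27 rows could end up True. The fixed seeds are assumptions added for reproducibility.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # illustrative seed for a reproducible draw

# Column 0 holds the unique values 0..99 in random order
a = pd.DataFrame(np.random.default_rng(0).permutation(100))

k = int(0.27 * len(a[0]))  # 27 values; random.sample needs an int count
choices = random.sample(list(a[0]), k)  # k distinct values, no repetition

# isin already yields booleans, so np.where is not needed
a["Bool"] = a[0].isin(choices)
```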

3 answers

Answer 1: score 1, accepted, 2020-07-20 20:48:08
Answer 2: score 0
Answer 3: score 0, 2020-07-20 21:06:54
