
Randomly select rows from a pandas DataFrame

Okay, this is somewhat tricky. I have a DataFrame of people and I want to randomly select 27% of them. I want to create a new Boolean column in that DataFrame that shows whether each person was selected.

Does anyone have an idea how to do this?

The built-in sample method of a DataFrame takes a frac argument that sets the fraction of rows to return in the sample.

If your DataFrame of people is people_df:

percent_sampled = 27
sample_df = people_df.sample(frac = percent_sampled/100)

people_df['is_selected'] = people_df.index.isin(sample_df.index)
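
As a minimal, self-contained sketch of the approach above: the contents of people_df and the seed are made up for illustration, and random_state is passed only so that the draw is reproducible.

```python
import pandas as pd

# Hypothetical data: 100 people with made-up names
people_df = pd.DataFrame({"name": [f"person_{i}" for i in range(100)]})

percent_sampled = 27
# random_state is optional; fixing it returns the same rows on every run
sample_df = people_df.sample(frac=percent_sampled / 100, random_state=0)

# True for rows whose index label appears in the sample
people_df["is_selected"] = people_df.index.isin(sample_df.index)

n_selected = int(people_df["is_selected"].sum())
print(n_selected)
```

Note that matching on the index relies on the index labels being unique; with duplicate labels, more rows than were sampled would be flagged.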
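
Alternatively, shuffle the row positions with NumPy and keep the first 27%:

import numpy as np

n = len(df)
# random.shuffle works in place and returns None, so use a function that returns the shuffled array
idx = np.random.permutation(n)
selected_idx = idx[:int(0.27 * n)]
# matching positions against index labels assumes the default 0..n-1 index
selected_df = df[df.index.isin(selected_idx)]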
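
The shuffle-based idea can also write the Boolean column directly, which avoids relying on the index labels. This is only a sketch: the example df, its string index, the seed, and the reuse of the column name is_selected are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default index, standing in for the real DataFrame
df = pd.DataFrame({"value": range(100)}, index=[f"row_{i}" for i in range(100)])

n = len(df)
k = int(0.27 * n)  # number of rows to select, rounded down

rng = np.random.default_rng(0)      # seed chosen arbitrarily for reproducibility
positions = rng.permutation(n)[:k]  # k distinct row positions

# Build a positional Boolean mask and attach it as a column
mask = np.zeros(n, dtype=bool)
mask[positions] = True
df["is_selected"] = mask

selected_df = df[df["is_selected"]]
```

Because the mask is positional, this works for any index, including one with duplicate labels.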

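Defining a DataFrame whose column 0 holds the numbers 0 to 99 in random order:

import random
import pandas as pd
import numpy as np

# build the shuffled column directly; random.shuffle(a[0]) may act on a copy and leave a unchanged
a = pd.DataFrame(np.random.permutation(100))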

Using random.sample to choose 27 numbers from the list without repetition (replace 27 with int(0.27*len(a[0])) to define this as a percentage):

choices = random.sample(list(a[0]),27)

Using isin to assign Boolean values to a new column in the DataFrame (isin already returns True/False, so wrapping it in np.where is unnecessary):

a['Bool'] = a[0].isin(choices)
