I have a pandas dataframe that I want to randomly pick samples from. First I want to pick 10, then 20, 30, 40, and 50 random samples (without replacement). I'm trying to do it with a for loop, although I don't know how good this is, because a list can't contain dataframes, right? (My coding is better in R, and there lists can contain dataframes.)
number = [10,20,30,40,50]
sample = []
for i in range(len(number)):
    sample[i].append(data.sample(n = number[i]))
And the error is IndexError: list index out of range
I don't want to copy and paste the code, so what is the right way to do it?
You could do that using the randint method to choose a random element from the list number:
import random
number = [10,20,30,40,50]
sample = []
for i in range(len(number)):
    # pick a random size from number, then draw that many rows
    sample.append(data.sample(n=number[random.randint(0, len(number) - 1)]))
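If the goal is one sample of each size in order (10, then 20, and so on) rather than a randomly chosen size per iteration, a minimal sketch could look like the following. The dataframe `data` here is a hypothetical 100-row stand-in, and the names `sizes` and `samples` are only illustrative. It also shows that a Python list can hold dataframes.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `data`: 100 rows, one column.
data = pd.DataFrame({"value": range(100)})

sizes = [10, 20, 30, 40, 50]

# A Python list can hold DataFrames, so build one sample per size.
# sample() draws without replacement by default (replace=False).
samples = [data.sample(n=n) for n in sizes]

for n, s in zip(sizes, samples):
    print(n, s.shape)
```

Each element of `samples` is an ordinary dataframe, so `samples[0]` is the 10-row sample, `samples[1]` the 20-row sample, and so on.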
Assuming you have this dataframe for a movie ratings dataset:
import pandas as pd
data = [['avengers', 5.4, 'PG-13'],
        ['captain america', 6.7, 'PG-13'],
        ['spiderman', 7, 'R'],
        ['daredevil', 8.2, 'R'],
        ['iron man', 8.6, 'PG-13'],
        ['deadpool', 10, 'R']]
df = pd.DataFrame(data, columns=['title', 'score', 'rating'])
You can take random samples from it using the sample method:
# taking random 3 records from dataframe
samples = df.sample(3)
Output:
             title  score  rating
1  captain america    6.7   PG-13
5         deadpool   10.0       R
3        daredevil    8.2       R
Another execution:
       title  score  rating
4   iron man    8.6   PG-13
0   avengers    5.4   PG-13
2  spiderman    7.0       R
You can also randomize the number of samples, up to the number of rows in your dataframe:
df.sample(random.randint(1, len(df)))
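Both the sample size and the chosen rows change on every run. If repeatable results are needed, one option is to seed both sources of randomness. The sketch below assumes a small stand-in dataframe; the seed values 42 and 0 are arbitrary, and `rng`, `first`, and `second` are illustrative names.

```python
import random
import pandas as pd

# Hypothetical stand-in dataframe with 6 rows.
df = pd.DataFrame({"title": list("abcdef"),
                   "score": [5.4, 6.7, 7.0, 8.2, 8.6, 10.0]})

rng = random.Random(42)        # seeded generator for the sample size (arbitrary seed)
n = rng.randint(1, len(df))    # randint includes both endpoints

# random_state fixes which rows are drawn, so repeated calls agree.
first = df.sample(n=n, random_state=0)
second = df.sample(n=n, random_state=0)

print(n)
print(first.equals(second))
```

Leaving out `random_state` restores the original behaviour, where each call returns a different selection.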
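If you want, you could write your own function for generating random samples from a dataframe. Note that this one returns a random contiguous block of rows rather than rows scattered across the dataframe: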
import random
def generate_rand_sample(df):
    # pick random start/end positions; retry until the slice is non-empty
    start_i = end_i = 0
    while end_i == start_i:
        start_i = random.randint(0, len(df) - 1)
        end_i = random.randint(start_i, len(df))
    return df.iloc[start_i:end_i]
generate_rand_sample(df)
First Run:
             title  score  rating
1  captain america    6.7   PG-13
2        spiderman    7.0       R
Second Run:
       title  score  rating
2  spiderman    7.0       R
3  daredevil    8.2       R
4   iron man    8.6   PG-13
5   deadpool   10.0       R
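The loop bounds are not the problem here: range(len(number)) yields 0, 1, 2, 3, 4, which is five iterations, so range(len(number)-1) would only skip the last size. The IndexError comes from sample[i]: sample starts as an empty list, so sample[0] does not exist yet. Call sample.append(...) directly instead of sample[i].append(...).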
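It can help to check the loop indices and the empty-list indexing directly. This is a minimal sketch that reuses the question's `number` and `sample` names; `indices` and `error_name` are illustrative.

```python
number = [10, 20, 30, 40, 50]

# range(len(number)) stops before len(number): five indices, 0 through 4.
indices = list(range(len(number)))
print(indices)

sample = []
error_name = None
try:
    sample[0]  # an empty list has no element at index 0 yet
except IndexError as exc:
    error_name = type(exc).__name__
    print("IndexError:", exc)
```

Indexing an empty list fails on the very first iteration, which matches the error reported in the question.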