How to do a pandas sample on more than one column?

I have a dataframe with about 8 million observations. I need to pull a sample from it, but would like to sample on more than one column.

I've tried the following, which does not work:

import pandas as pd

state = ['mi', 'mi', 'mi', 'nc', 'pa', 'pa', 'ga']
state = state * 50
age = ['21', '22', '23', '23', '23', '50', '50']
age = age * 50
random = ['.445', '.324', '.234', '.143', '.568', '.777', '.256']
random = random * 50
data = {'state':state, 'age': age, 'random': random}
df = pd.DataFrame.from_dict(data = data)

df_sample = df.sample(n = 25, weights = ['state', 'age'], random_state = 48)

I realize the pandas docs do not say that what I want to do is possible. Is there a way I can do this?

IIUC, I think you are looking to achieve the following:

df_sample = df[['state','age']].sample(n = 25, random_state = 48)
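If the goal is instead to have every combination of state and age represented in the sample, a grouped sample may be closer to what is wanted. The sketch below is one possible reading of the question, not a confirmed answer to it: it rebuilds the example frame, then uses `DataFrameGroupBy.sample` (which, as far as I know, needs pandas 1.1 or later). The group sizes `n=3` and `frac=0.1` are arbitrary values picked for illustration. The last line shows what `weights` does accept: a single column name or one numeric weight per row, not a list of column names.

```python
import pandas as pd

# Rebuild the question's example frame: 7 row patterns repeated 50 times.
state = ['mi', 'mi', 'mi', 'nc', 'pa', 'pa', 'ga'] * 50
age = ['21', '22', '23', '23', '23', '50', '50'] * 50
random = ['.445', '.324', '.234', '.143', '.568', '.777', '.256'] * 50
df = pd.DataFrame({'state': state, 'age': age, 'random': random})

# Option 1: draw the same number of rows from every (state, age) group.
# n=3 is an illustrative choice, not something from the question.
per_group = df.groupby(['state', 'age']).sample(n=3, random_state=48)

# Option 2: draw the same fraction from every group, so the sample keeps
# the group proportions of the full frame. frac=0.1 is also illustrative.
proportional = df.groupby(['state', 'age']).sample(frac=0.1, random_state=48)

# Option 3: weighted row sampling. `weights` needs numeric values, one per
# row, so the string column is converted to floats first.
weighted = df.sample(n=25, weights=df['random'].astype(float), random_state=48)
```

With 8 million rows, `frac` is probably the more convenient knob, since it scales with each group's size. Note that `n` raises an error if any group has fewer rows than requested, unless `replace=True` is passed.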
