
Conditional Sampling of Data Frame in Python

I have a DataFrame of the names, sex and ages of individuals:

I would like to create a new DataFrame by sampling a fixed number of rows such that the average age of the new DataFrame is the same as that of the original DataFrame.

import numpy as np
import pandas as pd

sample_df = pd.DataFrame({'Var': ['A', 'B', 'C', 'D', 'E'], 'Ages': [22, 35, 43, 18, np.nan]})

sample_df
Out[410]: 
  Var  Ages
0   A  22.0
1   B  35.0
2   C  43.0
3   D  18.0
4   E   NaN

I would like to sample only 3 rows such that the age of 'E' is equal to the mean of A, B, C and D.
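For the five-row example, the target is the mean of the known ages, (22 + 35 + 43 + 18) / 4 = 29.5 (pandas' mean skips NaN by default). With only four known ages there are just four possible 3-row subsets, with means 33.33, 25.0, 27.67 and 32.0, so no subset matches 29.5 exactly. A minimal sketch, assuming the goal is the subset whose mean is closest to the target and that the row with an unknown age ('E') is excluded, is to enumerate every subset. The helper name closest_mean_sample is hypothetical:

```python
import itertools

import numpy as np
import pandas as pd


def closest_mean_sample(df, col, k):
    """Return the k-row subset of df whose mean of `col` is closest to the
    mean of `col` over all rows with a known (non-NaN) value."""
    known = df.dropna(subset=[col])  # assumption: rows with NaN are not eligible
    target = known[col].mean()
    best_idx, best_gap = None, float('inf')
    # Exhaustive search over all C(len(known), k) subsets: small data only
    for idx in itertools.combinations(known.index, k):
        gap = abs(known.loc[list(idx), col].mean() - target)
        if gap < best_gap:
            best_idx, best_gap = idx, gap
    return known.loc[list(best_idx)]


sample_df = pd.DataFrame({'Var': ['A', 'B', 'C', 'D', 'E'],
                          'Ages': [22, 35, 43, 18, np.nan]})
picked = closest_mean_sample(sample_df, 'Ages', 3)
print(picked)
print(picked['Ages'].mean())  # 83 / 3, about 27.67, vs. a target of 29.5
```

For this data the sketch picks A, C and D. The number of subsets grows combinatorially with the number of rows, so for larger DataFrames a random search such as the loop in the answer is more practical.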

Consider an indefinite loop using while True, then break once your requirements are met. Depending on the variability of your data, this may take a long time to finish, and an exact equality test between two floating-point means may rarely or never succeed. The code below builds a list of 100-row samples and breaks after ten matching samples have been collected.

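# df is your full DataFrame (at least 100 rows) with an 'Age' column
target_mean = df['Age'].mean()
tol = 0.5  # assumed tolerance: exact float equality almost never matches
samples = []

while True:
    sample_df = df.sample(n=100)

    # keep the sample if its mean age is within tol of the overall mean
    if abs(sample_df['Age'].mean() - target_mean) <= tol:
        samples.append(sample_df)

    if len(samples) == 10:
        break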
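The loop above has no upper bound on the number of draws. A minimal sketch of a bounded variant, assuming that a tolerance is acceptable and that giving up after a fixed number of attempts is preferable to looping forever. The function name sample_matching_mean and its parameters (tol, max_attempts, seed) are hypothetical, and the DataFrame below is random illustrative data, not the asker's:

```python
import numpy as np
import pandas as pd


def sample_matching_mean(df, col, n, n_samples, tol, max_attempts=10_000, seed=None):
    """Rejection sampling: draw n-row samples without replacement and keep
    those whose mean of `col` is within `tol` of the full-data mean.

    Stops after `n_samples` matches or `max_attempts` draws, whichever comes
    first, so it may return fewer than `n_samples` samples.
    """
    rng = np.random.RandomState(seed)  # accepted by DataFrame.sample's random_state
    target = df[col].mean()
    samples = []
    for _ in range(max_attempts):
        candidate = df.sample(n=n, random_state=rng)
        if abs(candidate[col].mean() - target) <= tol:
            samples.append(candidate)
            if len(samples) == n_samples:
                break
    return samples


# Hypothetical data: 1,000 random integer ages between 18 and 79
ages = np.random.default_rng(0).integers(18, 80, size=1_000)
df = pd.DataFrame({'Age': ages})

samples = sample_matching_mean(df, 'Age', n=100, n_samples=10, tol=1.0, seed=42)
print(len(samples))
```

Each draw is independent, so the returned samples can share rows. If the check rarely passes on your data, widen tol or raise max_attempts; passing a seed makes the run reproducible.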

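Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.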

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM