
How to randomly select rows based on multiple conditions

I have two datasets with 20 rows each. I want to randomly select 10 rows from each dataset according to the criteria below.

df1 group:

  • 8 mammals
  • 2 reptiles

df2 group:

  • 4 mammals
  • 2 birds
  • 3 reptiles
  • 1 fish

For both groups, 6 of the selected rows should come from terrestrial ecosystems and 4 from aquatic ecosystems.

df1.query("Class == 'Mammal'").sample(n=8)

df1.query("Class == 'Reptile'").sample(n=2)

I've seen solutions like this that should work, but I can't figure out how to include the ecosystem requirement. In other words, I want 8 mammals and 2 reptiles selected from group 1, ensuring that 6 of them come from terrestrial ecosystems and 4 from aquatic ones. I think there should be a way to do this with a groupby on the two columns, but I haven't figured it out yet.

Sample data:

Common name      Class     Ecosystem
Lion             Mammal    Terrestrial
Humpback whale   Mammal    Aquatic
Crocodile        Reptile   Aquatic
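To try the queries on something concrete, the sample rows can be loaded into a DataFrame. This is only a minimal sketch that uses the three rows shown above (the real df1 has 20 rows), and it assumes the column names are exactly "Common name", "Class" and "Ecosystem".

```python
import pandas as pd

# Only the three rows shown in the sample data; the real df1 has 20 rows.
df1 = pd.DataFrame(
    {
        "Common name": ["Lion", "Humpback whale", "Crocodile"],
        "Class": ["Mammal", "Mammal", "Reptile"],
        "Ecosystem": ["Terrestrial", "Aquatic", "Aquatic"],
    }
)

# query() compares with == and needs string literals to be quoted.
mammals = df1.query("Class == 'Mammal'")

# Conditions on both columns can be combined with "and".
aquatic_mammals = df1.query("Class == 'Mammal' and Ecosystem == 'Aquatic'")
print(aquatic_mammals["Common name"].tolist())  # ['Humpback whale']
```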

I don't know how to do it in a clean way with just the built-in pandas functions like groupby. That said, here's a solution using random and lists.

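import random
import pandas as pd

animal_class = ["Mammal"] * 8 + ["Reptile"] * 2

ecosystem = ["Terrestrial"] * 6 + ["Aquatic"] * 4
random.shuffle(ecosystem)  # randomly pair each class slot with an ecosystem

picked = []
remaining = df1.copy()
for cls, eco in zip(animal_class, ecosystem):
    # sample() raises ValueError if no unpicked row matches this (class, ecosystem) pair
    row = remaining.query("Class == @cls and Ecosystem == @eco").sample(n=1)
    remaining = remaining.drop(row.index)  # never pick the same row twice
    picked.append(row)
df1_selected = pd.concat(picked)  # DataFrame.append was removed in pandas 2.0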

To do the same thing for df2, just change animal_class to match its quotas (4 mammals, 2 birds, 3 reptiles, 1 fish).
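One limitation of pairing classes with shuffled ecosystems is that a shuffle can ask for a combination that has no rows left, for example a terrestrial reptile when the data has none. A different technique that avoids this is rejection sampling: draw the per-class quotas first, then keep the draw only if the ecosystem counts also match. The sketch below is not the method above, just one possible alternative; the helper name sample_with_quotas and its parameters are hypothetical, and it assumes the columns are named "Class" and "Ecosystem".

```python
import numpy as np
import pandas as pd


def sample_with_quotas(df, class_quota, eco_quota, max_tries=10_000, seed=None):
    """Rejection sampling: redraw the per-class samples until the ecosystem quotas hold.

    Hypothetical helper for illustration; the name and parameters are assumptions.
    """
    rng = np.random.RandomState(seed)
    for _ in range(max_tries):
        # Draw exactly n rows per class (raises ValueError if a class has too few rows).
        picked = pd.concat(
            [
                df[df["Class"] == cls].sample(n=n, random_state=rng)
                for cls, n in class_quota.items()
            ]
        )
        # Accept the draw only if every ecosystem count matches its quota.
        counts = picked["Ecosystem"].value_counts()
        if all(counts.get(eco, 0) == n for eco, n in eco_quota.items()):
            return picked
    raise ValueError("no draw satisfied both sets of quotas within max_tries")
```

For df2 this could be called as sample_with_quotas(df2, {"Mammal": 4, "Bird": 2, "Reptile": 3, "Fish": 1}, {"Terrestrial": 6, "Aquatic": 4}). If the quotas cannot be met with the available rows, the function gives up after max_tries attempts instead of looping forever.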
