
Randomly select rows from Pandas DataFrame based on multiple criteria

I am trying to use Python to sample data for QA. My criteria are to audit 2 individuals and then a random sample of their respective vendors based on risk level. So I need a script that basically says:

If or while the PM Owner is Alex, then randomly select 1 (as long as 1 exists) each of Critical Risk, High Risk, Medium Risk and Low Risk.

WHILE df['PM Owner'] == 'Alex':
    IF df['Risk Tier'] == 'Critical':
        df['Risk Tier'].sample()

I get this error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Then I need to repeat the loop for the other individual.

I have tried if and while loops, without success.

My columns for this are 'PM Owner' and 'Risk Tier'.

I am not sure whether I understood the question correctly, but at least this answer may help others give you a better one. If this is not what you are looking for, please let me know.

import pandas as pd

# Your DataFrame
main_data = {'PM Owner': ['A', 'B', 'C', 'A', 'E', 'F'],
             'Risk Tier': [1, 3, 1, 1, 1, 2],
             'sam': ['A0', 'B0', 'C0', 'D0', 'E0', 'F0']}
main_df = pd.DataFrame(data=main_data)

# The owner/tier combinations you are looking for
filter_data = {'PM Owner': ['A'], 'Risk Tier': [1]}
filter_df = pd.DataFrame(data=filter_data)

# Filtering: build one string key per row from both columns, keep matching rows
key_cols = ['PM Owner', 'Risk Tier']
filtered_df = main_df[main_df[key_cols].astype(str).sum(axis=1).isin(
    filter_df[key_cols].astype(str).sum(axis=1))]

# Just one random sample
print(filtered_df.sample())
# Whole dataset after filtering
print(filtered_df)

Result:

  PM Owner  Risk Tier sam
3        A          1  D0
  PM Owner  Risk Tier sam
0        A          1  A0
3        A          1  D0
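
A possible alternative to the concatenated string key used above is an inner merge on the two columns, which avoids accidental key collisions (for example 'A1' + '1' and 'A' + '11' both concatenate to 'A11'). This is only a sketch: the names main_df, wanted, matched and one_row are made up for illustration, and it assumes wanted contains no duplicate owner/tier pairs.

```python
import pandas as pd

# Same toy data as in the answer above
main_df = pd.DataFrame({
    'PM Owner': ['A', 'B', 'C', 'A', 'E', 'F'],
    'Risk Tier': [1, 3, 1, 1, 1, 2],
    'sam': ['A0', 'B0', 'C0', 'D0', 'E0', 'F0'],
})
# Hypothetical frame listing the owner/tier pairs to keep
wanted = pd.DataFrame({'PM Owner': ['A'], 'Risk Tier': [1]})

# An inner merge keeps only rows whose (owner, tier) pair appears in `wanted`.
# reset_index/set_index carries the original row labels through the merge.
matched = (
    main_df.reset_index()
    .merge(wanted, on=['PM Owner', 'Risk Tier'], how='inner')
    .set_index('index')
)

# One random row from the matches
one_row = matched.sample(n=1)
```

Here matched holds the same two rows (labels 0 and 3) as the filtered frame in the answer.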

Each condition you wrote compares an entire column, so it produces a boolean Series with one True/False value per row rather than a single value. if and while need a single truth value, which is why you got the error suggesting one of the functions that reduce the result to a single value. However, the conditions in their present form can be used as masks, so you can draw samples matching the criteria simply by narrowing down the selection, i.e.:

df.loc[(df['PM Owner'] == 'Alex') & (df['Risk Tier'] == 'Critical'), 'Risk Tier'].sample()
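
Note that sample() raises a ValueError when the mask matches no rows, while the question asks for a sample only "as long as 1 exists". A minimal sketch of one way to guard against that follows. The helper sample_one, the 'Vendor' column and the data are hypothetical; only the 'PM Owner' and 'Risk Tier' column names come from the question. It also returns the whole row rather than just the 'Risk Tier' value.

```python
import pandas as pd

# Hypothetical data; only the two column names come from the question
df = pd.DataFrame({
    'PM Owner': ['Alex', 'Alex', 'Alex', 'Sam'],
    'Risk Tier': ['Critical', 'Critical', 'High', 'Low'],
    'Vendor': ['V1', 'V2', 'V3', 'V4'],
})

def sample_one(frame, owner, tier, seed=None):
    """Return one random row for owner/tier, or an empty frame if none match."""
    mask = (frame['PM Owner'] == owner) & (frame['Risk Tier'] == tier)
    matches = frame.loc[mask]
    if matches.empty:
        # Nothing to sample from; return the empty selection instead of raising
        return matches
    return matches.sample(n=1, random_state=seed)

critical = sample_one(df, 'Alex', 'Critical', seed=0)
medium = sample_one(df, 'Alex', 'Medium', seed=0)  # no Medium rows in this data
```

Passing a seed makes the draw repeatable, which may be useful if the audit selection has to be reproduced later.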

If you need to loop through every PM Owner you can do so:如果您需要遍历每个 PM Owner,您可以这样做:

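for pm_owner in df['PM Owner'].unique():  # visit each owner once, not once per row
    matches = df.loc[(df['PM Owner'] == pm_owner) & (df['Risk Tier'] == 'Critical'), 'Risk Tier']
    if not matches.empty:  # sample() raises ValueError on an empty selection
        sample = matches.sample()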
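
As an alternative to looping, the whole request (two owners, one random vendor per risk tier where one exists) might be handled in a single step with groupby followed by sample, which is available in pandas 1.1 and later. This is a sketch under assumptions: the owner names other than Alex, the 'Vendor' column and the variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical data; owner names other than Alex and the 'Vendor' column are assumptions
df = pd.DataFrame({
    'PM Owner': ['Alex', 'Alex', 'Alex', 'Alex', 'Blake', 'Blake', 'Casey'],
    'Risk Tier': ['Critical', 'Critical', 'High', 'Low', 'Medium', 'Medium', 'High'],
    'Vendor': ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7'],
})

owners_to_audit = ['Alex', 'Blake']

# Keep only the audited owners, then draw one row from each (owner, tier) group.
# Groups exist only for combinations that have at least one row, so tiers with
# no vendors are skipped rather than raising an error.
audit_sample = (
    df[df['PM Owner'].isin(owners_to_audit)]
    .groupby(['PM Owner', 'Risk Tier'])
    .sample(n=1, random_state=0)
)
```

With this data the result has four rows: one each for Alex's Critical, High and Low tiers and one for Blake's Medium tier.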
