Randomly select rows from Pandas DataFrame based on multiple criteria
I am trying to use Python to sample data for QA. My criterion is to audit 2 individuals, then take a random sample of each individual's vendors based on risk level. So I need a script that basically says:
If (or while) the PM Owner is Alex, randomly select 1 row (as long as one exists) for each of Critical Risk, High Risk, Medium Risk and Low Risk.
WHILE df['PM Owner'] == 'Alex':
    IF df['Risk Tier'] == 'Critical':
        df['Risk Tier'].sample()
I get this error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Then I need to repeat the loop for the other individual. I have tried if and while loops without the success I need.
My columns for this are 'PM Owner' and 'Risk Tier'.
I am not sure whether I understood the question correctly, but this answer may at least help others give you a better one. If this is not what you are looking for, please let me know.
import pandas as pd

# Your DataFrame
maindf = {'PM Owner': ['A', 'B', 'C', 'A', 'E', 'F'], 'Risk Tier': [1, 3, 1, 1, 1, 2], 'sam': ['A0', 'B0', 'C0', 'D0', 'E0', 'F0']}
Maindf = pd.DataFrame(data=maindf)
# The (PM Owner, Risk Tier) combinations you are looking for
filterdf = {'PM Owner': ['A'], 'Risk Tier': [1]}
Filterdf = pd.DataFrame(data=filterdf)
# Filtering: join both columns into one string per row (e.g. 'A1')
# and keep the rows whose string also appears in Filterdf
keys = ['PM Owner', 'Risk Tier']
NewMaindf = Maindf[Maindf[keys].astype(str).sum(axis=1).isin(Filterdf[keys].astype(str).sum(axis=1))]
# Just one random sample
print(NewMaindf.sample())
# Whole dataset after filtering
print(NewMaindf)
Result (the sampled row in the first table can differ between runs):
  PM Owner  Risk Tier sam
3        A          1  D0
  PM Owner  Risk Tier sam
0        A          1  A0
3        A          1  D0
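Joining the columns into one string can produce false matches when different pairs concatenate to the same text: for example, ('A1', 1) and ('A', 11) both become 'A11'. As a possible alternative, here is a minimal sketch that matches on the column pairs themselves with an inner merge. The names `main_df`, `wanted` and `matched` are chosen for illustration only and are not part of the answer above.

```python
import pandas as pd

# Same example data as above; variable names are illustrative.
main_df = pd.DataFrame({
    'PM Owner': ['A', 'B', 'C', 'A', 'E', 'F'],
    'Risk Tier': [1, 3, 1, 1, 1, 2],
    'sam': ['A0', 'B0', 'C0', 'D0', 'E0', 'F0'],
})
# The (PM Owner, Risk Tier) pairs to keep.
wanted = pd.DataFrame({'PM Owner': ['A'], 'Risk Tier': [1]})

keys = ['PM Owner', 'Risk Tier']
# reset_index() carries the original row labels through the merge as a
# column named 'index'; drop_duplicates() stops repeated pairs in `wanted`
# from duplicating rows in the result.
matched = (
    main_df.reset_index()
    .merge(wanted.drop_duplicates(), on=keys, how='inner')
    .set_index('index')
)

print(matched)                  # every matching row
print(matched.sample(n=1))      # one random matching row
```

Because the comparison is done per column, the values keep their original types and no pair can be confused with another.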
A comparison such as df['PM Owner'] == 'Alex' is evaluated for every row, so it returns a boolean Series rather than a single True or False. That is why if and while raise the error, which also suggests functions that reduce the Series to a single value. However, the conditions in their present form can be used as masks, so you can draw samples matching the criteria simply by narrowing down the selection, i.e.:
df.loc[(df['PM Owner'] == 'Alex') & (df['Risk Tier'] == 'Critical')].sample()
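To make the mask idea concrete, here is a small sketch on made-up data. The 'Vendor' column, the owner name 'Jordan' and all values are assumptions for illustration. It shows the boolean Series that the combined condition produces, reproduces the error from the question, and then uses the mask to pick one row.

```python
import pandas as pd

# Hypothetical data: 'Vendor', 'Jordan' and all values are made up.
df = pd.DataFrame({
    "PM Owner": ["Alex", "Alex", "Jordan", "Alex"],
    "Risk Tier": ["Critical", "Low", "Critical", "Critical"],
    "Vendor": ["V1", "V2", "V3", "V4"],
})

# Each comparison yields one boolean per row; & combines them element-wise.
mask = (df["PM Owner"] == "Alex") & (df["Risk Tier"] == "Critical")
print(mask.tolist())

# Using a Series directly in `if` raises the error from the question.
error_message = None
try:
    if df["PM Owner"] == "Alex":
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Indexing with the mask keeps only the matching rows; sample() picks one.
picked = df[mask].sample(n=1)
print(picked)
```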
If you need to loop through every PM Owner, you can do so:
for pm_owner in df['PM Owner'].unique():  # visit each owner once
    matches = df.loc[(df['PM Owner'] == pm_owner) & (df['Risk Tier'] == 'Critical')]
    if not matches.empty:  # sample() raises ValueError on an empty selection
        print(matches.sample())
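The loop above handles one risk tier at a time. The question asks for one row per risk tier for each audited individual, which could also be sketched without an explicit loop by grouping on both columns. This assumes pandas 1.1 or later, where `DataFrameGroupBy.sample` is available. The helper name `sample_one_per_tier`, the 'Vendor' column, and the owners 'Jordan' and 'Pat' are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'Vendor' and every value below are made up.
df = pd.DataFrame({
    "PM Owner": ["Alex", "Alex", "Alex", "Alex",
                 "Jordan", "Jordan", "Jordan", "Pat"],
    "Risk Tier": ["Critical", "Critical", "High", "Low",
                  "Medium", "Medium", "Low", "High"],
    "Vendor": ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"],
})

def sample_one_per_tier(frame, owners, seed=None):
    """Return one random row per (PM Owner, Risk Tier) pair for `owners`.

    Only pairs that actually occur form a group, so a tier an owner
    does not have is skipped rather than raising an error.
    """
    audited = frame[frame["PM Owner"].isin(owners)]
    return audited.groupby(["PM Owner", "Risk Tier"]).sample(
        n=1, random_state=seed
    )

audit_sample = sample_one_per_tier(df, ["Alex", "Jordan"], seed=42)
print(audit_sample)
```

Passing `seed` makes the draw repeatable, which may be useful if the audit selection has to be reproduced later; leave it as `None` for a fresh random pick each run.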