
Need to do eval on pandas dataframe at row level

I have a scenario where my pandas DataFrame has a condition stored as a string, which I need to evaluate and store the result in a separate column. The example below should make this clearer:

Existing DataFrame:

ID   Val    Cond
1     5      >10
1     15     >10

Expected DataFrame:

ID   Val    Cond    Result
1     5      >10     False
1     15     >10     True

As you can see, I need to concatenate Val and Cond and do eval at row level.

If your conditions are formed from the basic comparison operations (<, <=, ==, !=, >, >=), then we can do this more efficiently using getattr. We use .str.extract to parse the condition and separate the comparison operator from the value. Using a dictionary, we map each comparison operator to the name of the corresponding Series method, which we can then call once per unique operator in a simple groupby.

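import pandas as pd

# Sample data, matching the outputs shown further down
df = pd.DataFrame({'ID': [1] * 6,
                   'Val': [5, 15, 20, 25, 26, 10],
                   'Cond': ['>10', '>10', '==20', '<=25', '<=25', '!=10']})
print(df)
#   ID  Val  Cond
#0   1    5   >10
#1   1   15   >10
#2   1   20  ==20
#3   1   25  <=25
#4   1   26  <=25
#5   1   10  !=10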

# All operations we might have. 
d = {'>': 'gt', '<': 'lt', '>=': 'ge', '<=': 'le', '==': 'eq', '!=': 'ne'}

# Create a DataFrame with the LHS value, comparator, RHS value
tmp = pd.concat([df['Val'], 
                 df['Cond'].str.extract(r'(.*?)(\d+)').rename(columns={0: 'cond', 1: 'comp'})], 
                axis=1)
tmp[['Val', 'comp']] = tmp[['Val', 'comp']].apply(pd.to_numeric)
#   Val cond  comp
#0    5    >    10
#1   15    >    10
#2   20   ==    20
#3   25   <=    25
#4   26   <=    25
#5   10   !=    10

# Aligns on row Index
df['Result'] = pd.concat([getattr(gp['Val'], d[idx])(gp['comp']) 
                          for idx, gp in tmp.groupby('cond')])
#   ID  Val  Cond  Result
#0   1    5   >10   False
#1   1   15   >10    True
#2   1   20  ==20    True
#3   1   25  <=25    True
#4   1   26  <=25   False
#5   1   10  !=10   False
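
For reuse, the steps above can be wrapped in a function. The following is a minimal sketch, not part of the original answer: the names eval_conditions, OPS, val_col and cond_col are hypothetical, and it assumes every condition is one of the six comparators followed by a number (optionally negative or decimal). Rows whose condition does not match that pattern come out as missing values instead of raising.

```python
import pandas as pd

# Comparator string -> name of the matching Series method.
OPS = {'>': 'gt', '<': 'lt', '>=': 'ge', '<=': 'le', '==': 'eq', '!=': 'ne'}


def eval_conditions(df, val_col='Val', cond_col='Cond'):
    """Hypothetical helper: compare val_col against the condition in cond_col."""
    # Split e.g. '<=25' into the comparator '<=' and the number '25'.
    # Two-character comparators are listed first so '>=' is not read as '>'.
    parts = df[cond_col].str.extract(
        r'^\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$')
    parts.columns = ['op', 'rhs']
    parts['rhs'] = pd.to_numeric(parts['rhs'])
    parts['lhs'] = pd.to_numeric(df[val_col])

    # One vectorized comparison per distinct comparator.
    # groupby skips rows where 'op' is missing (unparseable conditions).
    pieces = [getattr(gp['lhs'], OPS[op])(gp['rhs'])
              for op, gp in parts.groupby('op')]
    result = pd.concat(pieces) if pieces else pd.Series(dtype=object)
    # Restore the original row order; unparseable rows become missing.
    return result.reindex(df.index)


df = pd.DataFrame({'ID': [1] * 6,
                   'Val': [5, 15, 20, 25, 26, 10],
                   'Cond': ['>10', '>10', '==20', '<=25', '<=25', '!=10']})
df['Result'] = eval_conditions(df)
```

Anchoring the pattern with ^ and $ is a design choice in this sketch: a condition such as '>10abc' is rejected rather than silently read as '>10'.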

A simple, but inefficient and dangerous, approach is to call eval on each row, building a string from the value and your condition. eval is dangerous because it can execute arbitrary code, so only use it if you truly trust and know the data.

df['Result'] = df.apply(lambda x: eval(str(x.Val) + x.Cond), axis=1)
#    ID  Val  Cond  Result
#0   1    5   >10   False
#1   1   15   >10    True
#2   1   20  ==20    True
#3   1   25  <=25    True
#4   1   26  <=25   False
#5   1   10  !=10   False
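
If the conditions might come from an untrusted source, one way to keep the row-by-row style without eval is to look the comparator up in a fixed table. This is a minimal sketch under the same assumption that each condition is a comparator followed by a number; check, OPS and PATTERN are hypothetical names, not from the original answers.

```python
import operator
import re

import pandas as pd

# Only these comparators are accepted.
OPS = {'>': operator.gt, '<': operator.lt, '>=': operator.ge,
       '<=': operator.le, '==': operator.eq, '!=': operator.ne}
# Comparator followed by an integer or decimal number, nothing else.
PATTERN = re.compile(r'\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*')


def check(val, cond):
    """Hypothetical helper: evaluate `val <cond>` without calling eval."""
    m = PATTERN.fullmatch(cond)
    if m is None:
        # Anything that is not "comparator + number" is rejected outright.
        raise ValueError(f'unsupported condition: {cond!r}')
    op, rhs = m.groups()
    return bool(OPS[op](val, float(rhs)))


df = pd.DataFrame({'ID': [1] * 6,
                   'Val': [5, 15, 20, 25, 26, 10],
                   'Cond': ['>10', '>10', '==20', '<=25', '<=25', '!=10']})
df['Result'] = [check(v, c) for v, c in zip(df['Val'], df['Cond'])]
```

Because the condition string is only ever matched against a pattern and never executed, a malicious value in Cond raises ValueError instead of running code. This is still a Python-level loop, so it is no faster than the eval version.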

You can also do something like this:

df["Result"] = [eval(x + y) for x, y in zip(df["Val"].astype(str), df["Cond"])]

This builds the "Result" column by concatenating the string forms of df["Val"] and df["Cond"] element by element, then applying eval to each concatenated string.
