
Create a new column in a DataFrame from values in other columns by using apply on multiple columns

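I am using the apply function to create a new column, ERROR_TV_TIC, in a dataframe based on the values of the existing columns [TV_TIC and ERRORS]. I'm not sure what I'm doing wrong: in some cases it works, while in another it doesn't and throws an error.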

DataFrame:

ERRORS|TV_TIC
|2.02101E+41
['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']|nan
['Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan

Code that works:

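import numpy as np
import pandas as pd

def validate_tv_tic(row):
    # apply(..., axis=1) passes one row at a time, so each cell is a scalar
    tv_tiv_errors = list()
    if pd.isnull(row['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    # 'and' short-circuits, so len() only runs when TV_TIC is present
    if pd.notnull(row['TV_TIC']) and len(row['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    # Return NaN instead of an empty list when the row has no errors
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)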
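For a quick end-to-end check of the working version, here is a minimal sketch on a hypothetical three-row frame. It assumes TV_TIC is stored as text (len() raises TypeError on a float such as 2.02101E+41), and the 42-character value "1" * 42 is a made-up placeholder, not a real TV_TIC.

```python
import numpy as np
import pandas as pd


def validate_tv_tic(row):
    # Each call receives one row; its cells are scalars
    errors = []
    if pd.isnull(row['TV_TIC']):
        errors.append("Initial validations passed still TV_TIC missing")
    # 'and' short-circuits, so len() is skipped for missing values
    if pd.notnull(row['TV_TIC']) and len(row['TV_TIC']) != 42:
        errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    # NaN instead of an empty list when nothing was flagged
    return errors if errors else np.nan


# Hypothetical sample with TV_TIC stored as text
trades = pd.DataFrame({'TV_TIC': ["1" * 42, "12345", np.nan]})
trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
print(trades['ERROR_TV_TIC'].tolist())
```

Rows without problems get NaN and the others get a list of messages, so the new column holds Python lists in its cells.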

Code that does not work (the conditions now combine two columns, and I made sure to use "&" rather than "and"):

def validate_tv_tic(row):
    tv_tiv_errors = list()
    # Both conditions now check ERRORS as well as TV_TIC, joined with '&'
    if pd.isnull(row['ERRORS']) & pd.isnull(row['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    if pd.isnull(row['ERRORS']) & pd.notnull(row['TV_TIC']) & len(row['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
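Switching from "and" to "&" does not help in a row-wise function and changes its meaning. With apply(..., axis=1) each value is a scalar, so Python's "and" is the appropriate operator; "&" is meant for element-wise masks over whole Series or arrays. The sketch below shows two plain-Python side effects, using hypothetical values: "&" binds more tightly than "!=", so `a & b & len(x) != 42` is evaluated as `(a & b & len(x)) != 42`, and "&" does not short-circuit, so len() runs even when TV_TIC is NaN.

```python
import numpy as np
import pandas as pd

valid_tv_tic = "1" * 42  # hypothetical, correctly sized 42-char value

# Intended logic: flag only when the length differs from 42
intended = True and True and len(valid_tv_tic) != 42
# With '&', Python reads this as (True & True & 42) != 42, i.e. 0 != 42
with_ampersand = True & True & len(valid_tv_tic) != 42
print(intended, with_ampersand)

# 'and' short-circuits: len() is never called on a missing value
missing = np.nan
safe = pd.notnull(missing) and len(missing) != 42
print(safe)

# '&' evaluates both operands first, so len(NaN) raises TypeError
raised_type_error = False
try:
    pd.notnull(missing) & (len(missing) != 42)
except TypeError:
    raised_type_error = True
print(raised_type_error)
```

So with "&", a valid 42-character TV_TIC would be flagged as the wrong length, and a missing one would crash on len().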

The error I get: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index 3')
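The message comes from NumPy, not from apply() itself. A minimal reproduction, assuming an ERRORS cell holds a two-item list as in the sample rows: combining an array with "&" keeps it an array, and `if` can only turn a one-element array into a single True/False.

```python
import numpy as np
import pandas as pd

one_item = ['Future Option Indicator is missing']
two_items = ['Trade Id is missing', 'Future Option Indicator is missing']

# pd.isnull() on a list is element-wise: it returns a boolean ndarray
print(pd.isnull(one_item))   # one element
print(pd.isnull(two_items))  # two elements

# A one-element array can still be converted to a single bool
single = bool(pd.isnull(one_item) & pd.isnull(np.nan))
print(single)

# A two-element array cannot, and that conversion is what `if` needs
message = None
try:
    if pd.isnull(two_items) & pd.isnull(np.nan):
        pass
except ValueError as exc:
    message = str(exc)
print(message)
```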

Error when using "and": [screenshot]

Error when using "&": [screenshot]

My gut feeling is that pd.isnull is causing the problem somewhere, but I'm not sure.

Answer:

The problem was not in the code logic but in the data in the dataframe.

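The ERRORS column holds lists of strings, and the error is raised when a cell contains more than one item. pd.isnull() on a list returns a boolean array with one entry per item, and `if` cannot reduce a multi-element array to a single True/False (a one-element array still can, which is why the rows with a single message passed). That is why I got the error on rows 3 and 4: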

ERRORS

['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']
['Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']
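Another way to avoid the error, sketched here as an alternative rather than what the author did, is to keep ERRORS as lists and make the null check list-aware, so a list is never passed to pd.isnull(). The helper is_missing, the rule that an empty list counts as "no errors", and the sample frame are all hypothetical; TV_TIC is assumed to hold strings.

```python
import numpy as np
import pandas as pd


def is_missing(value):
    # Hypothetical helper: a list is "missing" only when it is empty;
    # scalars (NaN, None, strings) fall back to pd.isnull
    if isinstance(value, list):
        return len(value) == 0
    return bool(pd.isnull(value))


def validate_tv_tic(row):
    errors = []
    no_errors = is_missing(row['ERRORS'])
    tv_tic = row['TV_TIC']
    # Plain 'and' on scalars short-circuits, so len() never sees NaN
    if no_errors and is_missing(tv_tic):
        errors.append("Initial validations passed still TV_TIC missing")
    if no_errors and not is_missing(tv_tic) and len(tv_tic) != 42:
        errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return errors if errors else np.nan


# Hypothetical sample mixing NaN and list cells in ERRORS
trades = pd.DataFrame({
    'ERRORS': [
        np.nan,
        ['Future Option Indicator is missing'],
        ['Trade Id is missing', 'Future Option Indicator is missing'],
        np.nan,
        np.nan,
    ],
    'TV_TIC': ["1" * 42, np.nan, np.nan, np.nan, "12345"],
})
trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
print(trades['ERROR_TV_TIC'].tolist())
```

The two-item list on the third row no longer raises, because is_missing() returns a single bool for every cell type.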

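Once I found the root cause, I changed the list into a string whose items are separated by something other than a comma, and that worked for me.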

Instead of returning the list:

return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

the items are joined into one string:

return ' & '.join(errors) if len(errors) > 0 else np.nan

This produced the following ERRORS column in my dataframe:

ERRORS

Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)
Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
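If a column already holds lists, as ERRORS did originally, a sketch of converting it after the fact uses pandas' Series.str.join, which joins the items of each list cell with the given separator and leaves NaN cells as NaN. The sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the original list-valued ERRORS column
trades = pd.DataFrame({
    'ERRORS': [
        np.nan,
        ['Future Option Indicator is missing'],
        ['Trade Id is missing', 'Future Option Indicator is missing'],
    ]
})

# Join each list with ' & '; NaN cells stay NaN
trades['ERRORS'] = trades['ERRORS'].str.join(' & ')
print(trades['ERRORS'].tolist())
```

Once the cells are plain strings or NaN, pd.isnull() returns a single bool for every row, so the two-column checks no longer hit the ambiguous truth-value error.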


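Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.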

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM