Create new column based on values from other columns / apply a function of multiple columns, row-wise in Pandas
Create new column into dataframe based on values from other columns using apply function onto multiple columns
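I am using the apply function to create a new column, ERROR_TV_TIC, in a dataframe based on the values of the existing columns [TV_TIC and ERRORS]. I am not sure what I am doing wrong. In one case it works, and in another it does not and throws an error.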
DataFrame:
ERRORS|TV_TIC
|2.02101E+41
['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']|nan
['Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan
Code that works:
import numpy as np
import pandas as pd

def validate_tv_tic(trades):
    # trades is a single row here, because apply is called with axis=1
    tv_tiv_errors = list()
    if pd.isnull(trades['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    if pd.notnull(trades['TV_TIC']) and len(trades['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
Code that does not work: the conditions now involve two columns, and I made sure to use "&" rather than "and".
def validate_tv_tic(trades):
    tv_tiv_errors = list()
    if pd.isnull(trades['ERRORS']) & pd.isnull(trades['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    if pd.isnull(trades['ERRORS']) & pd.notnull(trades['TV_TIC']) & len(trades['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
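The error I get: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index 3')
My gut feeling is that pd.isnull is causing the problem somewhere, but I am not sure.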
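A side note on the second condition in the failing version: in Python, "&" binds more tightly than "!=", so chaining "&" with an unparenthesised comparison may not test what was intended. The following is a minimal sketch with plain booleans; the variable names are hypothetical stand-ins for the results of pd.isnull, pd.notnull and len on one row.

```python
# Hypothetical stand-ins for the per-row values in the second condition
errors_missing = True    # stands in for pd.isnull(trades['ERRORS'])
tv_tic_present = True    # stands in for pd.notnull(trades['TV_TIC'])
tv_tic_length = 42       # stands in for len(trades['TV_TIC'])

# As written: parsed as (errors_missing & tv_tic_present & tv_tic_length) != 42
as_written = errors_missing & tv_tic_present & tv_tic_length != 42

# With explicit parentheses around the comparison
parenthesised = errors_missing & tv_tic_present & (tv_tic_length != 42)

# With "and", which is suitable here because each operand is a single value per row
with_and = errors_missing and tv_tic_present and tv_tic_length != 42

print(as_written, parenthesised, with_and)
```

Because the function receives one row at a time, each operand is a single value, so "and" (or explicit parentheses) expresses the intended check.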
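There is nothing wrong with the code itself. The problem is in the data in the dataframe.
The ERRORS column holds lists of strings, and the error is raised when a cell contains more than one item: pd.isnull applied to a list returns one boolean per element, and an array with more than one element has no single truth value for "if". So I got the error on rows 3 and 4: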
ERRORS
['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']
['Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']
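The behaviour described above can be reproduced outside the dataframe. This is a small sketch that uses two of the ERRORS values shown above; the variable names are only for illustration.

```python
import pandas as pd

one_item = ['Future Option Indicator is missing']
two_items = ['Trade Id is missing', 'Future Option Indicator is missing']

# On a list, pd.isnull checks every element and returns a boolean array
mask_one = pd.isnull(one_item)
mask_two = pd.isnull(two_items)

# A one-element array can still be reduced to a single truth value
single_ok = bool(mask_one)

# A two-element array cannot, which is what "if" runs into
try:
    bool(mask_two)
    ambiguous = False
except ValueError as exc:
    ambiguous = True
    print(exc)
```

This is consistent with the traceback pointing at index 3, the first row whose list has two items.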
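After finding the root cause, I changed the list into a single string whose elements are joined by a separator other than a comma, and that worked for me.
From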
return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan
To
return ' & '.join(errors) if len(errors) > 0 else np.nan
This produced the ERRORS column of my dataframe as follows:
ERRORS
Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)
Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
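With ERRORS stored as a plain string or NaN, pd.isnull returns a single boolean per row, so the two-column check can run row-wise. The following is a sketch under assumptions, not the original poster's final code: the sample rows are made up, TV_TIC is assumed to be stored as a string, and the result is joined with ' & ' to match the ERRORS column.

```python
import numpy as np
import pandas as pd

def validate_tv_tic(row):
    errors = []
    # Only check TV_TIC when the earlier validations reported nothing
    if pd.isnull(row['ERRORS']):
        if pd.isnull(row['TV_TIC']):
            errors.append("Initial validations passed still TV_TIC missing")
        elif len(str(row['TV_TIC'])) != 42:
            errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return ' & '.join(errors) if errors else np.nan

# Made-up sample data for illustration only
trades = pd.DataFrame({
    'ERRORS': [np.nan, np.nan, np.nan,
               'Trade Id is missing & Future Option Indicator is missing'],
    'TV_TIC': ['X' * 42, np.nan, 'ABC', np.nan],
})

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
print(trades['ERROR_TV_TIC'].tolist())
```

Nesting the TV_TIC checks under the ERRORS check avoids chaining "&" with a comparison, and str() guards the length check in case TV_TIC was read in as a number.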