Pandas: create a new column based on the values of other columns (row-wise)

Question

I want to create a new column based on several other columns:

`'TOTAL_HH_INCOME', 'HH_SIZE', 'Eligible Household Size', 'income_min1', 'income_max1', 'hh_size2', 'income_min2', 'income_max2', 'hh_size3', 'income_min3', 'income_max3', 'hh_size4', 'income_min4', 'income_max4', 'hh_size5', 'income_min5', 'income_max5', 'hh_size6', 'income_min6', 'income_max6'`

For each row of my dataframe, I want to compare HH_SIZE with each of the household-size variables, and TOTAL_HH_INCOME with each of the income_min and income_max variables.

I wrote this function as an attempt:

def eligibility (row):
    
    if df['HH_SIZE']== df['Eligible Household Size'] & df['TOTAL_HH_INCOME'] >= df['income_min1'] & df['TOTAL_HH_INCOME'] <=row['income_max1'] :
        return 'Eligible'
    
    if df['HH_SIZE']== df['hh_size2'] & df['TOTAL_HH_INCOME'] >= df['income_min2'] & df['TOTAL_HH_INCOME'] <=row['income_max2'] :
        return 'Eligible'
    
    if df['HH_SIZE']== df['hh_size3'] & df['TOTAL_HH_INCOME'] >= df['income_min3'] & df['TOTAL_HH_INCOME'] <=row['income_max3'] :
        return 'Eligible'

    if df['HH_SIZE']== df['hh_size4'] & df['TOTAL_HH_INCOME'] >= df['income_min4'] & df['TOTAL_HH_INCOME'] <=row['income_max4'] :
        return 'Eligible'

    if df['HH_SIZE']== df['hh_size5'] & df['TOTAL_HH_INCOME'] >= df['income_min5'] & df['TOTAL_HH_INCOME'] <=row['income_max5'] :
        return 'Eligible'

    if df['HH_SIZE']== df['hh_size6'] & df['TOTAL_HH_INCOME'] >= df['income_min6'] & df['TOTAL_HH_INCOME'] <=row['income_max6'] :
        return 'Eligible'
    
    return 'Ineligible'

As you can see, if a row meets the criteria I want it labeled 'Eligible'; otherwise it should be labeled 'Ineligible'.

I applied this function to my df:

df['Eligibility']= df.apply(eligibility, axis=1)

However, I get an error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

Why? Is there something wrong with my function?

Edit:

====================== DATAFRAME ===========================

Answer 1

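The problem seems to be the comparison operators in the if statements: because you are comparing whole dataframe columns, the result is not a single True/False value but a Series with one boolean per row, and an if statement cannot interpret that.

If you want to check that all elements are equal, try a.all(). See the following example: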

import pandas as pd
dict1 = {'name1': ['tom', 'pedro'], 'name2': ['tom', 'pedro'],
         'name3': ['tome', 'maria'], 'name4': ['maria', 'marta']}
df1 = pd.DataFrame(dict1)

# This produces a ValueError like the one you got:
# if df1['name1'] == df1['name2']:
#     pass
# To see why this produces an error, try printing the following:
print('This is a Series of bool values and cannot be handled by an if statement: \n',
      df1['name1'] == df1['name2'])

# This checks whether all the elements in 'name1' are the same as in 'name2'
if (df1['name1'] == df1['name2']).all():
    print('\nEligible')

Output:

This is a Series of bool values and cannot be handled by an if statement: 
 0    True
1    True
dtype: bool

Eligible
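Note that `.all()` answers "do all rows match at once?", which is not the same as deciding eligibility per row. Since `df.apply(..., axis=1)` passes one row at a time, the function in the question can instead read scalars from `row` rather than whole columns from `df`. The following is a minimal sketch under that assumption; the sample values and the reduction to two tiers are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data with only two tiers, for illustration.
df = pd.DataFrame({
    'HH_SIZE': [1, 2, 2],
    'TOTAL_HH_INCOME': [15000, 50000, 22000],
    'Eligible Household Size': [1, 1, 1],
    'income_min1': [10000, 10000, 10000],
    'income_max1': [20000, 20000, 20000],
    'hh_size2': [2, 2, 2],
    'income_min2': [20000, 20000, 20000],
    'income_max2': [30000, 30000, 30000],
})


def eligibility(row):
    # row[...] is a single value, so `and` and chained comparisons work here.
    if (row['HH_SIZE'] == row['Eligible Household Size']
            and row['income_min1'] <= row['TOTAL_HH_INCOME'] <= row['income_max1']):
        return 'Eligible'
    if (row['HH_SIZE'] == row['hh_size2']
            and row['income_min2'] <= row['TOTAL_HH_INCOME'] <= row['income_max2']):
        return 'Eligible'
    return 'Ineligible'


df['Eligibility'] = df.apply(eligibility, axis=1)
print(df['Eligibility'].tolist())
```

The remaining tiers (hh_size3 through hh_size6) would follow the same pattern.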

Answer 2

You can try this, using df.to_records():

# Column names for each eligibility tier: (household size, income min, income max).
# The first tier uses 'Eligible Household Size' as its size column.
tiers = [('Eligible Household Size', 'income_min1', 'income_max1')]
tiers += [(f'hh_size{i}', f'income_min{i}', f'income_max{i}') for i in range(2, 7)]


def func(row):
    # Access record fields by name rather than by position.
    total_income = row['TOTAL_HH_INCOME']
    hh_size = row['HH_SIZE']
    # Eligible if, for at least one tier, the household size matches
    # and the income lies within that tier's range.
    if any(hh_size == row[size] and row[lo] <= total_income <= row[hi]
           for size, lo, hi in tiers):
        return 'Eligible'
    return 'Ineligible'


df['Eligibility'] = [func(row) for row in df.to_records()]
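As an alternative to looping over rows, the same check can probably be expressed with whole-column boolean masks. The sketch below is not from either answer; the sample frame and the two-tier `tiers` list are hypothetical stand-ins for the asker's data. Each comparison is wrapped in parentheses because `&` binds more tightly than `==`, `>=` and `<=` in Python.

```python
import pandas as pd

# Hypothetical sample data with only two tiers, for illustration.
df = pd.DataFrame({
    'HH_SIZE': [1, 2, 2],
    'TOTAL_HH_INCOME': [15000, 50000, 22000],
    'Eligible Household Size': [1, 1, 1],
    'income_min1': [10000, 10000, 10000],
    'income_max1': [20000, 20000, 20000],
    'hh_size2': [2, 2, 2],
    'income_min2': [20000, 20000, 20000],
    'income_max2': [30000, 30000, 30000],
})

# (size column, income min column, income max column) per tier.
tiers = [
    ('Eligible Household Size', 'income_min1', 'income_max1'),
    ('hh_size2', 'income_min2', 'income_max2'),
]

# Start with "nobody eligible" and OR in each tier's mask.
eligible = pd.Series(False, index=df.index)
for size_col, min_col, max_col in tiers:
    eligible |= (
        (df['HH_SIZE'] == df[size_col])
        & (df['TOTAL_HH_INCOME'] >= df[min_col])
        & (df['TOTAL_HH_INCOME'] <= df[max_col])
    )

df['Eligibility'] = eligible.map({True: 'Eligible', False: 'Ineligible'})
print(df['Eligibility'].tolist())
```

On large frames this usually avoids the per-row Python call overhead of `apply`, at the cost of building one mask per tier.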

Answer 1
1 2020-06-30 15:19:28

Answer 2
0 2020-06-30 15:52:11