Conditionally set multiple column values with dynamic values using loc or apply

Question

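I'm still fairly new to pandas and can't seem to combine these few basic steps.

Goal:

I want to do an efficient find-and-replace across multiple columns, based on a condition.

I have a DataFrame df. Where the columns lower_limit and upper_limit are both NaN, I need to look up replacement values by index in another DataFrame, lookup.

I couldn't get merge/join to work because the index names differ (think C_something in df versus F_something in the lookup DataFrame); I have left that detail out for simplicity.

Input:

DataFrames: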

import numpy as np
import pandas as pd
df = pd.DataFrame([['A', 3, 5], ['B', 2, np.nan], ['C', np.nan, np.nan], ['D', np.nan, np.nan]])
df.columns = ['Name','lower_limit','upper_limit']
df = df.set_index('Name')

lookup = pd.DataFrame([['C_Male', 4, 6],['C_Female', 5, 7],['E_Male', 2, 3],['E_Female', 3, 4]])
lookup.columns = ['Name', 'lower', 'upper']
lookup = lookup.set_index('Name')

# index: Name + index_modifier is the lookup index of interest for example
index_modifier = '_Male'

The DataFrames, visualized:

# df                                  # lookup
      lower_limit  upper_limit                  lower  upper
Name                                  Name              
A             3.0          5.0        C_Male        4      6
B             2.0          NaN        C_Female      5      7
C             NaN          NaN        E_Male        2      3
D             NaN          NaN        E_Female      3      4

Expected output:

# df
      lower_limit  upper_limit
Name                                     
A             3.0          5.0
B             2.0          NaN  #<-- Does not meet conditional
C             4.0          6.0  #<-- Looked-up with index_modifier and changed
D             NaN          NaN  #<-- Looked-up with index_modifier and left unchanged

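Broken code:

I tried masking and setting values based on the df.loc() documentation and this answer, but I can't seem to get the unique value that corresponds to each row's index.

Masking and setting with df.loc in one step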

# error: need get index of each row only
df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull()), ['lower_limit','upper_limit'] ] = lookup.loc[df.index + index_modifier]

Masking with df.loc first, then setting

ix_of_interest = df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull())].index

# only keep index values that are in DataFrame 'lookup'
ix_of_interest = [ix for ix in ix_of_interest if ((ix + index_modifier) in lookup.index)]
lookup_ix = [ix + index_modifier for ix in ix_of_interest]

# error: Not changing values. I think there is a mismatch of bracket depths for one
df.loc[ix_of_interest, ['lower_limit','upper_limit'] ] = lookup.loc[lookup_ix]

I also tried using df.apply() to set the values. See this question.

def do_lookup(row):
    # error:'numpy.float64' object has no attribute 'is_null'
    if row.lower_limit.isnull() and row.upper_limit.isnull():
        if (row.name + index_modifier) in lookup.index:
            return lookup.loc[row.name + index_modifier]

df['lower_limit', 'upper_limit'] = df.apply(do_lookup, axis=1)

Or with a lambda:

df['lower_limit', 'upper_limit'] = df.apply(lambda x: lookup.loc[x.name + index_modifier].to_list()
        # isnull() or isnan() would be better
        if ((x.lower_limit == np.NaN) and (x.upper_limit == np.NaN)) 
        # else may not be needed here
        else [np.NaN, np.NaN], 
    axis=1)

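This seems like it should be a simple series of steps, but I can't get them to work together. Any insight would be appreciated - my rubber duck is tired and confused.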
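For reference, the two null checks attempted in the question fail for separate reasons: NaN never compares equal to anything, itself included, so an `==` test against it is always False, and the values that `apply(..., axis=1)` passes for each cell are plain NumPy floats, which have no `.isnull()` method. `pd.isna` does accept scalars. The following is only a minimal sketch of the apply route under those observations; the function name `do_lookup_fixed` and the variable `result` are hypothetical, not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['A', 3, 5], ['B', 2, np.nan], ['C', np.nan, np.nan], ['D', np.nan, np.nan]],
    columns=['Name', 'lower_limit', 'upper_limit'],
).set_index('Name')

lookup = pd.DataFrame(
    [['C_Male', 4, 6], ['C_Female', 5, 7], ['E_Male', 2, 3], ['E_Female', 3, 4]],
    columns=['Name', 'lower', 'upper'],
).set_index('Name')

index_modifier = '_Male'

value = np.float64('nan')
print(value == np.nan)  # False: NaN is not equal to NaN
print(pd.isna(value))   # True: pd.isna works on scalars


def do_lookup_fixed(row):
    """Return looked-up limits when both are missing and the key exists."""
    key = row.name + index_modifier
    both_missing = pd.isna(row['lower_limit']) and pd.isna(row['upper_limit'])
    if both_missing and key in lookup.index:
        found = lookup.loc[key]
        return pd.Series({
            'lower_limit': float(found['lower']),
            'upper_limit': float(found['upper']),
        })
    # Otherwise hand the row back unchanged
    return row


result = df.apply(do_lookup_fixed, axis=1)
print(result)
```

Returning a Series with the same labels from every branch keeps the shape of the result predictable, at the cost of a Python-level call per row.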

Answer 1

You can use Series.fillna together with DataFrame.add_suffix:

index_modifier = '_Male'

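init_index = df.index
# add_suffix renames column labels, so transpose to apply it to the row index
df = df.T.add_suffix(index_modifier).T
# fillna aligns on the index; assign the result back rather than
# calling inplace=True on a column selection
df['lower_limit'] = df['lower_limit'].fillna(lookup['lower'])
df['upper_limit'] = df['upper_limit'].fillna(lookup['upper'])
df.index = init_index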
print(df)


   lower_limit  upper_limit
A          3.0          5.0
B          2.0          NaN
C          4.0          6.0
D          NaN          NaN
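Note that fillna works on each column independently, so a row with only one missing limit would also be filled if its suffixed name existed in lookup. If the replacement should happen only when both limits are NaN, as the question states, a boolean mask with df.loc is one alternative. This is a sketch, not part of the original answer; the names `limit_cols`, `both_missing`, `keys` and `replacement` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['A', 3, 5], ['B', 2, np.nan], ['C', np.nan, np.nan], ['D', np.nan, np.nan]],
    columns=['Name', 'lower_limit', 'upper_limit'],
).set_index('Name')

lookup = pd.DataFrame(
    [['C_Male', 4, 6], ['C_Female', 5, 7], ['E_Male', 2, 3], ['E_Female', 3, 4]],
    columns=['Name', 'lower', 'upper'],
).set_index('Name')

index_modifier = '_Male'
limit_cols = ['lower_limit', 'upper_limit']

# True only for rows where both limits are missing
both_missing = df[limit_cols].isna().all(axis=1)

# Build the lookup keys for those rows; reindex returns an all-NaN row
# for any key that is absent from lookup (here: 'D_Male')
keys = df.loc[both_missing].index + index_modifier
replacement = lookup[['lower', 'upper']].reindex(keys)

# Assign the raw values so the differing row and column labels
# are not used for alignment
df.loc[both_missing, limit_cols] = replacement.to_numpy()
print(df)
```

Passing a NumPy array on the right-hand side makes the assignment positional, which is why the rows of `replacement` have to be in the same order as the masked rows of df; building `keys` from the masked index takes care of that.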
