Conditionally replace values in list of arrays in a pandas dataframe

I would like to conditionally replace values in a column where each cell holds an array (a list of strings).

Example dataset below (my real dataset contains many more columns and rows):

index   lists                                  condition
0       ['5 apples', '2 pears']                B
1       ['3 apples', '3 pears', '1 pumpkin']   A
2       ['4 blueberries']                      A
3       ['5 kiwis']                            C
4       ['1 pumpkin']                          B
...     ...                                    ...

For example, if the condition is A and the row contains '1 pumpkin', I would like to replace that value with 'XXX'. But if the condition is B and the row contains '1 pumpkin', I would like to replace it with 'YYY'.

Desired output

index   lists                                  condition
0       ['5 apples', '2 pears']                B
1       ['3 apples', '3 pears', 'XXX']         A
2       ['4 blueberries']                      A
3       ['5 kiwis']                            C
4       ['YYY']                                B
...     ...                                    ...

In fact, the goal is to replace many such values; '1 pumpkin' is just one example. Importantly, I would like to keep each row as a list (preserve the array structure). Thanks!
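Since '1 pumpkin' is only one of many values to replace, a lookup table may scale better than writing one branch per case. The sketch below is not taken from either answer; it assumes every rule can be expressed as a (condition, original value) pair mapped to a replacement string, and the name REPLACEMENTS is hypothetical. It rebuilds each row's list with a list comprehension, so rows stay lists and every matching element is replaced.

```python
import pandas as pd

# The five sample rows from the example table.
df = pd.DataFrame({
    'lists': [['5 apples', '2 pears'],
              ['3 apples', '3 pears', '1 pumpkin'],
              ['4 blueberries'],
              ['5 kiwis'],
              ['1 pumpkin']],
    'condition': ['B', 'A', 'A', 'C', 'B'],
})

# Hypothetical lookup table: (condition, original value) -> replacement.
# Add one entry per value/condition pair that should change.
REPLACEMENTS = {
    ('A', '1 pumpkin'): 'XXX',
    ('B', '1 pumpkin'): 'YYY',
}

# Build a new list per row; items with no matching key are kept unchanged,
# so each row remains a list with its original order.
df['lists'] = [
    [REPLACEMENTS.get((cond, item), item) for item in items]
    for items, cond in zip(df['lists'], df['condition'])
]
```

Adding another rule is then a single extra dictionary entry rather than another if/elif branch.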

Let us use explode, then np.select:

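import numpy as np
import pandas as pd
# one row per list element; explode repeats the original index label
s = df.explode('lists')
cond = s['lists'] == '1 pumpkin'
c1 = cond & s['condition'].eq('A')
c2 = cond & s['condition'].eq('B')
# 'XXX' where c1, 'YYY' where c2, otherwise keep the original element
s['lists'] = np.select([c1, c2], ['XXX', 'YYY'], default=s['lists'].values)
# collapse back to one list per original row (relies on a unique index)
df['lists'] = s.groupby(level=0)['lists'].agg(list)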
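To check the explode/np.select answer end to end, here is a self-contained run on the five sample rows from the question. It assumes the frame has a unique index (the default RangeIndex here), because the final groupby(level=0) uses the repeated index labels from explode to reassemble each row.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question, with the default RangeIndex 0..4.
df = pd.DataFrame({
    'lists': [['5 apples', '2 pears'],
              ['3 apples', '3 pears', '1 pumpkin'],
              ['4 blueberries'],
              ['5 kiwis'],
              ['1 pumpkin']],
    'condition': ['B', 'A', 'A', 'C', 'B'],
})

# One row per list element; each element keeps its original row label.
s = df.explode('lists')
is_pumpkin = s['lists'] == '1 pumpkin'
c1 = is_pumpkin & s['condition'].eq('A')
c2 = is_pumpkin & s['condition'].eq('B')

# The first true condition picks the choice; otherwise the original element is kept.
s['lists'] = np.select([c1, c2], ['XXX', 'YYY'], default=s['lists'].values)

# Regroup by original label; the assignment aligns on df's index.
df['lists'] = s.groupby(level=0)['lists'].agg(list)
```

One thing worth checking on real data: DataFrame.explode turns an empty list into a single NaN, so after this round trip such a row comes back as [nan] rather than [].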

Alternatively, you can define a function with the logic you want to apply to each row, then call df.apply(function, axis=1) to run that logic over the whole DataFrame:

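def pumpkin(row):
    # only rows whose list contains '1 pumpkin' need to change
    if '1 pumpkin' in row['lists']:
        data = row['lists'][:]  # copy so the original list is not mutated
        if row['condition'] == 'A':
            data[data.index('1 pumpkin')] = 'XXX'
        elif row['condition'] == 'B':
            data[data.index('1 pumpkin')] = 'YYY'
        return data
    return row['lists']

# axis=1 passes each row (as a Series) to pumpkin
df['lists'] = df.apply(pumpkin, axis=1)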
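Note that data.index('1 pumpkin') finds only the first match, so a row holding '1 pumpkin' twice would keep the second copy. A possible variant, sketched under the assumption that each condition maps to a single replacement string (the names PUMPKIN_REPLACEMENT and pumpkin_all are hypothetical), rewrites every matching element:

```python
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame({
    'lists': [['5 apples', '2 pears'],
              ['3 apples', '3 pears', '1 pumpkin'],
              ['4 blueberries'],
              ['5 kiwis'],
              ['1 pumpkin']],
    'condition': ['B', 'A', 'A', 'C', 'B'],
})

# Hypothetical mapping from condition to the replacement for '1 pumpkin'.
PUMPKIN_REPLACEMENT = {'A': 'XXX', 'B': 'YYY'}

def pumpkin_all(row):
    replacement = PUMPKIN_REPLACEMENT.get(row['condition'])
    if replacement is None:
        # No rule for this condition (e.g. 'C'): return the list untouched.
        return row['lists']
    # A new list: every '1 pumpkin' is replaced and the original list is not mutated.
    return [replacement if item == '1 pumpkin' else item for item in row['lists']]

df['lists'] = df.apply(pumpkin_all, axis=1)
```

Because the function only reads row['lists'] and row['condition'], it can also be called directly on a plain dict, which makes it easy to test on edge cases such as a list containing '1 pumpkin' twice.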

Output

                      lists condition
0       [5 apples, 2 pears]         B
1  [3 apples, 3 pears, XXX]         A
2           [4 blueberries]         A
3                 [5 kiwis]         C
4                     [YYY]         B
