How to conditionally replace the values in one column of a dataframe with the values of another column without iterating over each column?

Question

Given columns street, city, state, zip and columns new_street, new_city, new_state, new_zip, where the 'new' columns sometimes have updated data but are sometimes null.

I basically want to say:

if df['new_street'] != null:
    df['street'] = df['new_street']
    df['city'] = df['new_city']
#etc

But I want it to check the condition for each row, not for the Series as a whole. The way I am currently solving this problem is using apply:

#this code block is super inefficient, there is probably
#a vectorized way to achieve this but i dont know it


def updated_street(row):
    if row['_merge'] == 'both':
        return row['Current Delivery Address']
    return row['street']
        
def updated_city(row):
    if row['_merge'] == 'both':
        return row['Current City']
    return row['city']
    
def updated_state(row):
    if row['_merge'] == 'both':
        return row['Current State']
    return row['state']
    
def updated_zip(row):
    if row['_merge'] == 'both':
        return row['Current ZIP+4']
    return row['zip']

#if we merged in new address info, replace the old with the new
df['street'] = df.apply(updated_street, axis=1)
df['city'] = df.apply(updated_city, axis=1)
df['state'] = df.apply(updated_state, axis=1)
df['zip'] = df.apply(updated_zip, axis=1)

But of course that rewrites every cell in each column.

Answer 1

I would loop through the columns and use np.where, with an f-string beginning with new_ to pick up the matching new_ column for each one. Each replacement is vectorized by np.where(); the loop itself only runs twice, once for each of the two columns.

import numpy as np

# Rows where new_street is present; != np.nan is always True, so use notna()
has_update = df['new_street'].notna()
for col in ['street', 'city']:
    df[col] = np.where(has_update, df[f'new_{col}'], df[col])
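
One detail worth checking on made-up data: comparing against np.nan with != is true for every element, because NaN never compares equal to anything, so a null test needs notna() (or isna()). The sketch below runs the same np.where loop on a small hypothetical frame. Only the column names come from the question; the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "street": ["1 Old Rd", "2 Elm St", "3 Oak Ave"],
    "city": ["Springfield", "Shelbyville", "Ogdenville"],
    "new_street": ["10 New Rd", np.nan, None],
    "new_city": ["Capital City", np.nan, None],
})

# NaN != NaN, so this comparison cannot detect nulls: it is True everywhere.
always_true = (df["new_street"] != np.nan).all()

# notna() is the element-wise null check.
has_update = df["new_street"].notna()

for col in ["street", "city"]:
    # Take the new value where one exists, otherwise keep the old one.
    df[col] = np.where(has_update, df[f"new_{col}"], df[col])
```

Only the first row has a non-null new_street here, so only that row's street and city are replaced.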

Output: none, as I couldn't test this. Please provide input data :)
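
For the _merge-based apply code in the question, the same idea can be sketched with a boolean mask and .loc, which writes only to the matched rows instead of rebuilding every cell. This is a minimal sketch under assumptions: the two input frames, the id join key, and all values are hypothetical, and only two of the four address columns are shown.

```python
import pandas as pd

# Hypothetical frames; column names follow the question, values are invented.
old = pd.DataFrame({
    "id": [1, 2, 3],  # assumed join key, not given in the question
    "street": ["1 Old Rd", "2 Elm St", "3 Oak Ave"],
    "city": ["Springfield", "Shelbyville", "Ogdenville"],
})
new = pd.DataFrame({
    "id": [2],
    "Current Delivery Address": ["20 New Elm St"],
    "Current City": ["Capital City"],
})

# indicator=True adds the _merge column that the question's code checks.
df = old.merge(new, on="id", how="left", indicator=True)

# Map each target column to the column holding its replacement.
replacements = {"street": "Current Delivery Address", "city": "Current City"}

matched = df["_merge"] == "both"
for target, source in replacements.items():
    # Only rows where matched is True are written; other cells are left alone.
    df.loc[matched, target] = df.loc[matched, source]
```

Adding "state" and "zip" would just mean two more entries in the replacements mapping.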

Solution 1
2, accepted, 2020-07-23 22:26:58
