
pandas: replace DataFrame values with another column's value from the same row

I have this pandas DataFrame:

BU       |   DATA1      DATA2
01-TT        zone 01   noData
02-FF        noData    zone 02
....

and I need to replace each "noData" string with the word "zone" followed by the first two characters of the BU value in the same row:

BU       |   DATA1      DATA2
01-TT        zone 01    zone 01
02-FF        zone 02    zone 02
....

Thanks a lot.
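To make the question reproducible, here is a minimal sketch that rebuilds the sample input and the desired output as DataFrames. It assumes the two rows shown are the exact string values; the "...." rows are unknown and left out, and the name expected is only illustrative.

```python
import pandas as pd

# Reconstruction of the two sample rows from the question (assumed exact strings).
df = pd.DataFrame({
    "BU": ["01-TT", "02-FF"],
    "DATA1": ["zone 01", "noData"],
    "DATA2": ["noData", "zone 02"],
})

# Desired result: every "noData" becomes "zone " + first two characters of BU.
expected = pd.DataFrame({
    "BU": ["01-TT", "02-FF"],
    "DATA1": ["zone 01", "zone 02"],
    "DATA2": ["zone 01", "zone 02"],
})

print(df)
print(expected)
```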

General solution:

In [135]:
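# columns whose names contain 'DATA'
cols = df.columns[df.columns.str.contains('DATA')]
# cells containing 'noData' get 'zone ' + first two chars of BU, aligned by row (axis=0)
df[cols] = df[cols].mask(df[cols].apply(lambda x: x.str.contains('noData')), 'zone ' + df['BU'].str[:2], axis=0)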
df

Out[135]:
      BU    DATA1    DATA2
0  01-TT  zone 01  zone 01
1  02-FF  zone 02  zone 02

Here we first select the columns whose names contain DATA, then call mask on just those columns with a boolean mask, so only the cells that contain "noData" are overwritten with 'zone ' + df['BU'].str[:2]. Passing axis=0 aligns that Series with the rows, so each row receives its own replacement value.
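The same general solution can be split into its intermediate steps to show what mask actually receives. This is a sketch on the two sample rows only; the names cond and repl are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "BU": ["01-TT", "02-FF"],
    "DATA1": ["zone 01", "noData"],
    "DATA2": ["noData", "zone 02"],
})

# Step 1: columns whose names contain "DATA" (here DATA1 and DATA2).
cols = df.columns[df.columns.str.contains("DATA")]

# Step 2: boolean DataFrame, True where a cell contains "noData".
cond = df[cols].apply(lambda s: s.str.contains("noData"))

# Step 3: one replacement value per row, e.g. "zone 01" for BU "01-TT".
repl = "zone " + df["BU"].str[:2]

# Step 4: mask replaces the True cells; axis=0 aligns repl on the row index,
# so each row is filled with its own "zone XX" value.
df[cols] = df[cols].mask(cond, repl, axis=0)

print(cond)
print(df)
```

Note that str.contains matches substrings, so a cell such as "noData2" would also be replaced; an exact comparison (df[cols] == "noData") avoids that if it matters.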

You can also use mask to replace the True positions with values from a NumPy array built by numpy.repeat:

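import numpy as np

# move BU into the index so that only the DATA columns remain
df = df.set_index('BU')

# repeat each 'zone XX' once per column, then reshape to the frame's shape
arr = np.repeat('zone ' + df.index.str[:2], len(df.columns)).values.reshape(df.shape)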
print (arr)
[['zone 01' 'zone 01']
 ['zone 02' 'zone 02']]

df = df.mask(df == 'noData', arr)
print (df.reset_index())
      BU    DATA1    DATA2
0  01-TT  zone 01  zone 01
1  02-FF  zone 02  zone 02
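The reshape works because numpy.repeat repeats each element consecutively, so after reshaping to (rows, columns) every row holds a single BU prefix. The sketch below shows the flat array and, for contrast, what numpy.tile (a different function, used here only as a counterexample) would produce. Names such as flat and wrong are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "BU": ["01-TT", "02-FF"],
    "DATA1": ["zone 01", "noData"],
    "DATA2": ["noData", "zone 02"],
}).set_index("BU")

prefixes = ("zone " + df.index.str[:2]).to_numpy()  # one value per row
n_rows, n_cols = df.shape

# np.repeat: each element n_cols times in a row -> one prefix per output row.
flat = np.repeat(prefixes, n_cols)
arr = flat.reshape(n_rows, n_cols)

# np.tile repeats the whole sequence instead, which mixes prefixes within a row
# after the reshape (wrong for this task).
wrong = np.tile(prefixes, n_cols).reshape(n_rows, n_cols)

result = df.mask(df == "noData", arr)
print(flat)
print(arr)
print(result.reset_index())
```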

Timings:

#[20000 rows x 3 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)

df1 = df.copy()

def jez(df):
    # NumPy approach: BU moved to the index, replacements built with np.repeat
    df = df.set_index('BU')
    df = df.mask(df == 'noData', np.repeat('zone ' + df.index.str[:2], len(df.columns)).values.reshape(df.shape))
    return (df.reset_index())

def ed(df):
    # general approach; note that it assigns into the passed DataFrame in place
    cols = df.columns[df.columns.str.contains('DATA')]
    df[cols] = df[cols].mask(df[cols].apply(lambda x: x.str.contains('noData')), 'zone ' + df['BU'].str[:2], axis=0)
    return df


print (jez(df))
print (ed(df1))

In [219]: %timeit (jez(df))
100 loops, best of 3: 14.2 ms per loop

In [220]: %timeit (ed(df1))
10 loops, best of 3: 46.3 ms per loop
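In the run above the NumPy version took 14.2 ms per loop versus 46.3 ms for the general one. To repeat the comparison outside IPython, a sketch with the standard timeit module could look like this. It assumes the same 20000-row frame; because ed() writes into its argument, each call gets a fresh copy (in both functions, to keep the comparison fair). The np.repeat call here converts the Index to an array first, which should be equivalent to the original .values form. No particular timings are claimed; they depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "BU": ["01-TT", "02-FF"],
    "DATA1": ["zone 01", "noData"],
    "DATA2": ["noData", "zone 02"],
})
# 20000 rows x 3 columns, as in the timings above
big = pd.concat([base] * 10000).reset_index(drop=True)

def jez(df):
    df = df.set_index("BU")
    prefixes = ("zone " + df.index.str[:2]).to_numpy()
    arr = np.repeat(prefixes, len(df.columns)).reshape(df.shape)
    return df.mask(df == "noData", arr).reset_index()

def ed(df):
    cols = df.columns[df.columns.str.contains("DATA")]
    df[cols] = df[cols].mask(
        df[cols].apply(lambda x: x.str.contains("noData")),
        "zone " + df["BU"].str[:2],
        axis=0,
    )
    return df

for name, fn in [("jez", jez), ("ed", ed)]:
    # best of 3 repeats, 5 calls each, on a fresh copy every call
    best = min(timeit.repeat(lambda: fn(big.copy()), number=5, repeat=3)) / 5
    print(f"{name}: {best * 1000:.1f} ms per call")
```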

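Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original URL. For any questions, contact: yoyou2525@163.com.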

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM