Pandas: replace DataFrame values with values from another column in the same row

Question

I have this pandas DataFrame:

BU       |   DATA1      DATA2
01-TT        zone 01   noData
02-FF        noData    zone 02
....

and I need to replace the "noData" string with the value from the BU column in the same row, keeping only its first two characters and prefixing the word "zone":

BU       |   DATA1      DATA2
01-TT        zone 01    zone 01
02-FF        zone 02    zone 02
....

Thanks a lot.

Answer 1

General solution:

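In [135]:
# select the columns whose names contain 'DATA'
cols = df.columns[df.columns.str.contains('DATA')]
# where a cell contains 'noData', use 'zone ' + the first two characters of BU from the same row
df[cols] = df[cols].mask(df[cols].apply(lambda x: x.str.contains('noData')), 'zone ' + df['BU'].str[:2], axis=0)
df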

Out[135]:
      BU    DATA1    DATA2
0  01-TT  zone 01  zone 01
1  02-FF  zone 02  zone 02

Here we first determine the columns whose names contain DATA, then call mask on just those columns. The boolean mask marks the cells that contain 'noData', and mask overwrites only those cells with 'zone ' plus the first two characters of BU. Passing axis=0 aligns the replacement Series with the rows.
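The same idea as a self-contained script, for readers who want to run it outside IPython. This is a minimal sketch: the sample frame is rebuilt from the question, the helper name fill_from_bu is made up for illustration, and eq('noData') is used as an exact-match variant of the str.contains test above.

```python
import pandas as pd


def fill_from_bu(df):
    """Return a copy where 'noData' cells become 'zone ' + first two chars of BU."""
    out = df.copy()
    # columns whose names contain 'DATA'
    cols = out.columns[out.columns.str.contains("DATA")]
    # boolean frame: True where the cell is exactly 'noData'
    is_missing = out[cols].eq("noData")
    # one replacement value per row, indexed like the frame
    replacement = "zone " + out["BU"].str[:2]
    # axis=0 aligns the replacement Series with the row index
    out[cols] = out[cols].mask(is_missing, replacement, axis=0)
    return out


df = pd.DataFrame({
    "BU": ["01-TT", "02-FF"],
    "DATA1": ["zone 01", "noData"],
    "DATA2": ["noData", "zone 02"],
})
result = fill_from_bu(df)
print(result)
```

Working on a copy keeps the input frame unchanged, which makes the helper safe to call repeatedly.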

Answer 2

You can use mask to replace the True values with a NumPy array created by numpy.repeat:

import numpy as np

# move BU into the index so only the DATA columns remain as values
df = df.set_index('BU')

arr = np.repeat('zone ' + df.index.str[:2], len(df.columns)).values.reshape(df.shape)
print (arr)
[['zone 01' 'zone 01']
 ['zone 02' 'zone 02']]

df = df.mask(df == 'noData', arr)
print (df.reset_index())
      BU    DATA1    DATA2
0  01-TT  zone 01  zone 01
1  02-FF  zone 02  zone 02
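To see why repeat plus reshape lines up with the rows, here is a small runnable sketch of the same steps. The sample frame is rebuilt from the question and the variable names are illustrative; np.broadcast_to is included only as a cross-check and is not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "BU": ["01-TT", "02-FF"],
    "DATA1": ["zone 01", "noData"],
    "DATA2": ["noData", "zone 02"],
}).set_index("BU")

# one label per row, built from the first two characters of the index
labels = ("zone " + df.index.str[:2]).to_numpy()

# repeat each label once per column, then fold back into the frame's shape,
# so every cell in row i holds the label for row i
arr = np.repeat(labels, len(df.columns)).reshape(df.shape)

# cross-check: broadcasting a column vector gives the same array
same_as_broadcast = (arr == np.broadcast_to(labels[:, None], df.shape)).all()

# overwrite only the cells equal to 'noData'
filled = df.mask(df.eq("noData"), arr).reset_index()
print(filled)
```

Because the replacement array already has the frame's shape, mask needs no index alignment here, which is a plausible reason this version is faster in the timings.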

Timings:

#[20000 rows x 3 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)

df1 = df.copy()

def jez(df):
    df = df.set_index('BU')
    df = df.mask(df == 'noData', np.repeat('zone ' + df.index.str[:2], len(df.columns)).values.reshape(df.shape))
    return (df.reset_index())

def ed(df):
    cols = df.columns[df.columns.str.contains('DATA')]
    df[cols] = df[cols].mask(df[cols].apply(lambda x: x.str.contains('noData')), 'zone ' + df['BU'].str[:2], axis=0)
    return df


print (jez(df))
print (ed(df1))

In [219]: %timeit (jez(df))
100 loops, best of 3: 14.2 ms per loop

In [220]: %timeit (ed(df1))
10 loops, best of 3: 46.3 ms per loop
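The comparison can be reproduced outside IPython with the standard timeit module. The sketch below is an assumption-laden rewrite, not the original benchmark: the function names with_numpy and with_series are made up, eq('noData') replaces the str.contains test, and the numbers it prints will depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

small = pd.DataFrame({
    "BU": ["01-TT", "02-FF"],
    "DATA1": ["zone 01", "noData"],
    "DATA2": ["noData", "zone 02"],
})
# 20000 rows x 3 columns, as in the timings above
big = pd.concat([small] * 10000).reset_index(drop=True)


def with_numpy(df):
    # Answer 2: pre-built replacement array with the frame's shape
    df = df.set_index("BU")
    labels = ("zone " + df.index.str[:2]).to_numpy()
    arr = np.repeat(labels, len(df.columns)).reshape(df.shape)
    return df.mask(df.eq("noData"), arr).reset_index()


def with_series(df):
    # Answer 1: replacement Series aligned along the rows
    df = df.copy()
    cols = df.columns[df.columns.str.contains("DATA")]
    df[cols] = df[cols].mask(df[cols].eq("noData"), "zone " + df["BU"].str[:2], axis=0)
    return df


a, b = with_numpy(big), with_series(big)
# both approaches should produce the same values before timing them
assert (a.to_numpy() == b.to_numpy()).all()

for fn in (with_numpy, with_series):
    best = min(timeit.repeat(lambda: fn(big), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1000:.1f} ms per call")
```

Copying the frame inside with_series keeps the input unchanged between runs, so every timed call does the same amount of work.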

Solution 1: score 2, 2017-02-07 15:12:42

Solution 2: score 1, accepted, 2017-02-07 14:59:53
