
Efficient way to update column value for subset of rows on Pandas DataFrame?

When using Pandas to update the value of a column for a specific subset of rows, what is the best way to do it?

Easy example:

import pandas as pd

df = pd.DataFrame({'name' : pd.Series(['Alex', 'John', 'Christopher', 'Dwayne']),
                   'value' : pd.Series([1., 2., 3., 4.])})

Objective: update the value column based on the length of the name and the initial value of the value column itself.

The following line achieves the objective:

df.value[df.name.str.len() == 4 ] = df.value[df.name.str.len() == 4] * 1000

However, this line filters the whole data frame twice, once on the LHS and once on the RHS. I assume this is not the most efficient way. And it does not do it 'in place'.

Basically I'm looking for the pandas equivalent of the R data.table ':=' operator:

df[nchar(name) == 4, value := value*1000]

And for other kinds of operations, such as:

df[nchar(name) == 4, value := paste0("short_", as.character(value))]

Environment: Python 3.6, Pandas 0.22

Thanks in advance.

You need loc with *= :

df.loc[df.name.str.len() == 4, 'value'] *= 1000
print (df)
          name   value
0         Alex  1000.0
1         John  2000.0
2  Christopher     3.0
3       Dwayne     4.0
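
To address the concern about filtering twice, here is a minimal self-contained sketch that builds the boolean mask once and reuses it. The variable names mask and result are only illustrative, and the frame is built from plain lists rather than pd.Series for brevity.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Evaluate the row filter a single time and keep it in a variable
mask = df['name'].str.len() == 4

# .loc with an augmented assignment updates only the selected cells of df
df.loc[mask, 'value'] *= 1000

result = df['value'].tolist()
print(result)
```

Because the assignment goes through df.loc on the original frame, there is no intermediate copy to worry about, unlike the chained df.value[...] = ... form in the question.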

EDIT:

More general solutions:

mask = df.name.str.len() == 4
df.loc[mask, 'value'] = df.loc[mask, 'value'] * 1000

Or:

df.update(df.loc[mask, 'value'] * 1000)
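
As a rough check that the two general forms above agree, the sketch below applies each to a fresh copy of the example frame. The helper make_df and the names df_loc, df_upd and returned are hypothetical and only there for the comparison.

```python
import pandas as pd

def make_df():
    # Rebuild the example frame so each variant starts from the same data
    return pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                         'value': [1., 2., 3., 4.]})

# Variant 1: explicit .loc on both sides, reusing one mask
df_loc = make_df()
mask = df_loc['name'].str.len() == 4
df_loc.loc[mask, 'value'] = df_loc.loc[mask, 'value'] * 1000

# Variant 2: DataFrame.update aligns on the index and on the Series name
df_upd = make_df()
mask = df_upd['name'].str.len() == 4
returned = df_upd.update(df_upd.loc[mask, 'value'] * 1000)

same = df_loc.equals(df_upd)
print(same, returned)
```

Note that update modifies the frame in place and returns None, so its result should not be assigned back to df.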

This may be what you require:

df.loc[df.name.str.len() == 4, 'value'] *= 1000

df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)
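
One caveat for the string case: the question targets Pandas 0.22, and as far as I know newer pandas releases complain when strings are written into a float column. A possible workaround, sketched here under that assumption, is to cast the column to object first. The names mask and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

mask = df['name'].str.len() == 4

# Cast to object so the column can hold floats and strings side by side
df['value'] = df['value'].astype(object)

# Build the new strings only from the selected rows, then write them back
df.loc[mask, 'value'] = 'short_' + df.loc[mask, 'value'].astype(str)

result = df['value'].tolist()
print(result)
```

The trade-off is that the column is no longer numeric afterwards, so arithmetic on it will need a conversion back.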


 