Efficient way to update column value for subset of rows on Pandas DataFrame?
When using Pandas to update the value of a column for a specific subset of rows, what is the best way to do it?
Simple example:
import pandas as pd
df = pd.DataFrame({'name' : pd.Series(['Alex', 'John', 'Christopher', 'Dwayne']),
'value' : pd.Series([1., 2., 3., 4.])})
Objective: update the value column based on the length of each name and the initial value of the value column itself.
The following line achieves the objective:
df.value[df.name.str.len() == 4 ] = df.value[df.name.str.len() == 4] * 1000
However, this line filters the whole data frame twice, once on the LHS and once on the RHS. I assume this is not the most efficient way. It also does not do the update 'in place'.
Basically I'm looking for the pandas equivalent of the R data.table ':=' operator:
df[nchar(name) == 4, value := value*1000]
And for other kinds of operations, such as:
df[nchar(name) == 4, value := paste0("short_", as.character(value))]
Environment:
Python 3.6
Pandas 0.22
Thanks in advance.
You need loc with *=:
df.loc[df.name.str.len() == 4, 'value'] *= 1000
print(df)
          name   value
0         Alex  1000.0
1         John  2000.0
2  Christopher     3.0
3       Dwayne     4.0
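To check that the loc form really updates the original DataFrame rather than a temporary copy, here is a minimal sketch. The `alias` variable is only an illustrative second reference to the same object and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Illustrative second reference to the same DataFrame object.
alias = df

# One .loc call selects the rows and the column together, so the
# augmented assignment writes back into df itself.
df.loc[df['name'].str.len() == 4, 'value'] *= 1000

# The change is visible through the other reference too.
updated = alias['value'].tolist()
print(updated)
```

Because the row filter and the column label go into a single indexer, there is no intermediate object for the assignment to get lost in, which is the risk with the chained `df.value[...] = ...` form.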
EDIT:
More general solutions:
mask = df.name.str.len() == 4
df.loc[mask, 'value'] = df.loc[mask, 'value'] * 1000
Or:
df.update(df.loc[mask, 'value'] * 1000)
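A short sketch of how the update variant behaves, assuming the same sample data as in the question. The intermediate name `new_values` is introduced here only for illustration. `DataFrame.update` modifies the frame in place, aligning on the index and using the Series name as the column label, so rows missing from the Series are left untouched.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Build the mask once and reuse it.
mask = df['name'].str.len() == 4

# A Series named 'value' that only contains the rows to change (index 0 and 1).
new_values = df.loc[mask, 'value'] * 1000

# update() works in place and returns None; it matches on index labels
# and on the Series name, so only rows 0 and 1 of 'value' are overwritten.
ret = df.update(new_values)

print(df)
```

Note that update skips NaN values in the passed object by default, so it may not be suitable when the new values can legitimately be missing.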
This may be what you require:
df.loc[df.name.str.len() == 4, 'value'] *= 1000
df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)
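The string case puts text into a float column, which relies on pandas converting the column to object dtype. Versions newer than the 0.22 from the question may warn about, or refuse, that implicit conversion. A minimal sketch that makes the conversion explicit first, as an assumption about what is wanted rather than part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Build the mask once and reuse it on both sides.
mask = df['name'].str.len() == 4

# Cast explicitly so the column can hold both floats and strings.
df['value'] = df['value'].astype(object)

# Only the selected rows are converted to text and prefixed.
df.loc[mask, 'value'] = 'short_' + df.loc[mask, 'value'].astype(str)

print(df)
```

A mixed object column loses the speed of vectorized numeric operations, so this is best kept for cases where mixed content is really needed.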