Replacing values greater than a number in pandas dataframe
I have a large dataframe that looks like this:
df1['A'].ix[1:3]
2017-01-01 02:00:00 [33, 34, 39]
2017-01-01 03:00:00 [3, 43, 9]
I want to replace each element greater than 9 with 11.
So the desired output for the above example is:
df1['A'].ix[1:3]
2017-01-01 02:00:00 [11, 11, 11]
2017-01-01 03:00:00 [3, 11, 9]
Edit:
My actual dataframe has about 20,000 rows, and each row holds a list of 2,000 elements.
Is there a way to use the numpy.minimum function on each row? I assume it would be faster than the list comprehension method.
Very simple: df[df > 9] = 11
You can use apply with a list comprehension:
df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print (df1)
A
2017-01-01 02:00:00 [11, 11, 11]
2017-01-01 03:00:00 [3, 11, 9]
A faster solution is to first convert the column to a numpy array and then use numpy.where:
a = np.array(df1['A'].values.tolist())
print (a)
[[33 34 39]
 [ 3 43  9]]
df1['A'] = np.where(a > 9, 11, a).tolist()
print (df1)
A
2017-01-01 02:00:00 [11, 11, 11]
2017-01-01 03:00:00 [3, 11, 9]
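For reference, here is a self-contained sketch of the numpy.where approach above. The frame construction is a hypothetical reconstruction of the sample data in the question, and the sketch assumes every list in the column has the same length, so that the lists stack into a regular 2-D array.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample data from the question.
idx = pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9]]}, index=idx)

# Stack the equal-length lists into one 2-D integer array.
a = np.array(df1["A"].tolist())

# Replace values greater than 9 with 11 and keep the rest, in one vectorized pass.
replaced = np.where(a > 9, 11, a)

# Store each row back as a Python list, one list per cell.
df1["A"] = pd.Series(replaced.tolist(), index=df1.index)
print(df1)
```

If the lists had different lengths, the array would not be a regular 2-D integer array and the comparison would not work this way; the apply approach would be the fallback in that case.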
You can use numpy indexing, accessed through the .values attribute.
df['col'].values[df['col'].values > x] = y
where you are replacing any value greater than x with the value of y.
So for the example in the question:
df1['A'].values[df1['A'] > 9] = 11
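Note that this works on a column of scalar values, not on a column whose cells are lists. It also relies on .values handing back a writable view of the column, which may not hold in pandas versions with Copy-on-Write enabled, where the returned array is read-only. A minimal sketch that avoids that dependency by working on an explicit copy and assigning it back (the frame and the names x and y are illustrative):

```python
import pandas as pd

# Hypothetical scalar column; x is the threshold, y the replacement value.
df = pd.DataFrame({"col": [33, 3, 43, 9]})
x, y = 9, 11

# Take an explicit writable copy of the underlying array.
arr = df["col"].to_numpy(copy=True)

# Boolean-mask indexing: overwrite every element greater than x.
arr[arr > x] = y

# Assign the modified array back to the column.
df["col"] = arr
print(df)
```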
I know this is an old post, but pandas now supports DataFrame.where directly. In your example:
df.where(df <= 9, 11, inplace=True)
Please note that pandas' where is different from numpy.where. In pandas, where the condition is True, the current value in the dataframe is kept; where the condition is False, the other value is taken.
EDIT:
You can achieve the same for just one column with Series.where:
df['A'].where(df['A'] <= 9, 11, inplace=True)
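Calling an inplace method on a column selected from the frame may not write back to the frame in every pandas version (this is an assumption about Copy-on-Write behaviour, so check it against the version you run). A sketch that sidesteps the question by assigning the result back explicitly, with made-up data:

```python
import pandas as pd

# Hypothetical frame with scalar columns; only column A is modified.
df = pd.DataFrame({"A": [33, 3, 43, 9], "B": [1, 20, 5, 40]})

# Series.where returns a new Series; assign it back to the column.
df["A"] = df["A"].where(df["A"] <= 9, 11)
print(df)
```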
I came here looking for a way to replace each element larger than h with 1 and every other element with 0, which has a simple solution:
df = (df > h) * 1
(This does not solve the OP's question, as all df <= h are replaced by 0.)
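As a quick illustration with made-up numbers: the comparison yields a boolean frame, and multiplying by 1 turns True/False into 1/0. Casting with astype(int) should give the same values.

```python
import pandas as pd

# Illustrative frame and threshold.
df = pd.DataFrame({"A": [33, 3], "B": [43, 9]})
h = 9

# Boolean frame multiplied by 1 gives integer flags.
flags = (df > h) * 1

# Explicit cast, shown here as an alternative spelling.
same = (df > h).astype(int)
print(flags)
```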