Replace values greater than a number in a pandas DataFrame

Question

I have a large dataframe that looks like this:

df1['A'].iloc[1:3]
2017-01-01 02:00:00    [33, 34, 39]
2017-01-01 03:00:00    [3, 43, 9]

I want to replace each element greater than 9 with 11.

So the desired output for the example above is:

df1['A'].iloc[1:3]
2017-01-01 02:00:00    [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

Edit:

My actual dataframe has about 20,000 rows, and each row holds a list of 2,000 elements.

Is there a way to apply the numpy.minimum function to each row? I assume it would be faster than the list comprehension approach.
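A side note on numpy.minimum: it caps values at a bound rather than substituting a different value, so np.minimum(a, 9) turns 33 into 9, not 11. A minimal sketch contrasting it with numpy.where on the question's two sample rows, assuming all lists have the same length so they stack into a 2-D array:

```python
import numpy as np

# The two sample rows from the question, stacked into a 2-D array
a = np.array([[33, 34, 39],
              [3, 43, 9]])

# np.minimum caps every element at 9 (element-wise min with the scalar)
capped = np.minimum(a, 9)

# np.where substitutes 11 wherever the condition holds, keeping the rest
replaced = np.where(a > 9, 11, a)

print(capped.tolist())    # [[9, 9, 9], [3, 9, 9]]
print(replaced.tolist())  # [[11, 11, 11], [3, 11, 9]]
```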

Answer 1

It's simple:

df[df > 9] = 11
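This boolean-mask assignment works cell by cell on scalar values, so it assumes the DataFrame holds plain numbers rather than the lists shown in the question. A minimal sketch on a hypothetical numeric frame:

```python
import pandas as pd

# Hypothetical numeric frame (scalar cells, not lists)
df = pd.DataFrame({"A": [33, 3, 5], "B": [34, 43, 9]})

# df > 9 is a boolean DataFrame of the same shape; assigning through it
# overwrites only the cells where the mask is True
df[df > 9] = 11

print(df.values.tolist())  # [[11, 11], [3, 11], [5, 9]]
```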

Answer 2

You can use apply with a list comprehension:

df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print(df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
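A self-contained version of the apply approach that rebuilds the question's two sample rows. The column name A and the timestamps come from the question; how df1 is constructed here is an assumption:

```python
import pandas as pd

# Rebuild the two sample rows from the question: each cell holds a list
idx = pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9]]}, index=idx)

# Keep values <= 9, replace everything else with 11, row by row
df1["A"] = df1["A"].apply(lambda x: [y if y <= 9 else 11 for y in x])

print(df1["A"].tolist())  # [[11, 11, 11], [3, 11, 9]]
```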

A faster solution is to first convert to a numpy array and then use numpy.where:

import numpy as np

a = np.array(df1['A'].values.tolist())
print(a)
[[33 34 39]
 [ 3 43  9]]

df1['A'] = np.where(a > 9, 11, a).tolist()
print(df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
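The np.array step only yields a 2-D numeric array when every list has the same length (2,000 in the question); ragged lists would need the apply route instead. A sketch that checks the vectorized version against the list comprehension on randomly generated data of a smaller, hypothetical size:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 rows, each a list of 20 integers in [0, 50)
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"A": rng.integers(0, 50, size=(100, 20)).tolist()})

# Row-by-row list comprehension (the first approach above)
expected = df1["A"].apply(lambda x: [y if y <= 9 else 11 for y in x])

# Vectorized: stack the lists into a 2-D array, replace, convert back
a = np.array(df1["A"].tolist())
df1["A"] = np.where(a > 9, 11, a).tolist()

print(df1["A"].tolist() == expected.tolist())  # True
```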

Answer 3

You can use NumPy indexing, accessed through the .values attribute.

df['col'].values[df['col'].values > x] = y

where any value greater than x is replaced with y.

So for the example in the question:

df1['A'].values[df1['A'] > 9] = 11
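Writing through .values relies on the returned array being a writable view of the column's data. That holds for a plain numeric column in older pandas, but under Copy-on-Write (the default in pandas 3.0) the array returned by .values may be read-only, so the in-place write can fail. A hedged sketch of a variant that copies, edits, and assigns back, shown on a hypothetical numeric column named col:

```python
import pandas as pd

# Hypothetical numeric column (scalar values, not lists)
df = pd.DataFrame({"col": [4, 12, 7, 25]})
x, y = 9, 11

# Take an explicit, writable copy of the column's data
arr = df["col"].to_numpy(copy=True)

# Boolean indexing on the NumPy array: overwrite every element > x with y
arr[arr > x] = y

# Assign the edited array back as the column
df["col"] = arr

print(df["col"].tolist())  # [4, 11, 7, 11]
```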

Answer 4

I know this is an old post, but pandas now supports DataFrame.where directly. For your example:

df.where(df <= 9, 11, inplace=True)

Please note that pandas' where is different from numpy.where. In pandas, where the condition is True, the current value in the dataframe is kept; where it is False, the other value is used.
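To make the difference concrete: DataFrame.where(cond, other) keeps values where cond is True, while numpy.where(cond, x, y) picks x where cond is True. Producing the same result therefore needs opposite conditions. A small sketch on a hypothetical numeric frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [33, 3], "B": [34, 9]})

# pandas: keep the value where the condition is True, else use `other`
kept = df.where(df <= 9, 11)

# NumPy: take the 2nd argument where True, the 3rd where False
picked = np.where(df > 9, 11, df)

print(kept.values.tolist())  # [[11, 11], [3, 9]]
print(picked.tolist())       # [[11, 11], [3, 9]]
```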

EDIT:

You can achieve the same for a single column with Series.where:

df['A'] = df['A'].where(df['A'] <= 9, 11)
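A related option is Series.mask, the inverse of where: it replaces values where the condition is True. A sketch on a hypothetical numeric column A that assigns the result back instead of calling a method with inplace=True on df['A'], since under pandas Copy-on-Write such a chained in-place call may not update df:

```python
import pandas as pd

df = pd.DataFrame({"A": [33, 3, 43, 9]})

# where: keep values satisfying the condition, replace the rest with 11
via_where = df["A"].where(df["A"] <= 9, 11)

# mask: replace values satisfying the condition with 11, keep the rest
via_mask = df["A"].mask(df["A"] > 9, 11)

# Assign back explicitly rather than relying on inplace=True
df["A"] = via_mask
print(df["A"].tolist())  # [11, 3, 11, 9]
```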

Answer 5

I came here looking for a way to replace each element larger than h with 1 and everything else with 0, which has a simple solution:

df = (df > h) * 1

(This does not solve the OP's question, since every value <= h is replaced by 0.)
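The trick relies on booleans behaving as 0 and 1 in arithmetic. A small sketch on a hypothetical numeric frame, with astype(int) shown as a more explicit spelling of the same conversion:

```python
import pandas as pd

df = pd.DataFrame({"A": [33, 3, 43], "B": [9, 12, 1]})
h = 9

# Boolean frame times 1 gives 1 where df > h and 0 elsewhere
ones_zeros = (df > h) * 1

# Equivalent, more explicit conversion of the boolean frame
same = (df > h).astype(int)

print(ones_zeros.values.tolist())  # [[1, 0], [0, 1], [1, 0]]
```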


Question description

5 answers

Answer 1
37 2018-10-02 09:10:24

Answer 2
28 accepted 2017-05-03 10:55:33

Answer 3
18 2019-01-29 17:06:54

Answer 4
11 2021-03-27 23:31:05

Answer 5
4 2019-09-18 08:07:09
