Elegant and efficient way to replace values in multiple columns using pandas

Question

I have a dataframe as shown below:

import numpy as np
import pandas as pd

f = pd.DataFrame({'person_id': [101,101,101,201,201,201,203],
                  'test_id': [123,123,124,321,321,321,456],
                  'los_24': [0.3,0.7,0.6,1.01,2,1,2],
                  'los_48': [1,0.2,0.4,0.7,11,2,3],
                  'in_24': [21,24,0.3,2.3,0.8,23,1.001],
                  'in_48': [11.3,202.0,0.2,0.3,41.0,47,2],
                  'test': ['A','B','C','D','E','F','G']})

I would like to replace all values less than 1 with the value 1 in the columns los_24, los_48, in_24, and in_48.

I tried the following:

f['los_24'] = np.where((f.los_24 < 1.0),1,f.los_24)
f['los_48'] = np.where((f.los_48 < 1.0),1,f.los_48)
f['in_24'] = np.where((f.in_24 < 1.0),1,f.in_24)
f['in_48'] = np.where((f.in_48 < 1.0),1,f.in_48)

But as you can see, I am writing the same line of code multiple times with different column names.

In my real data, I have more than 10 columns in which to replace values. So, is there a more efficient and elegant way to write this?

In my expected output, every value below 1 in those columns is replaced with 1 and everything else is unchanged.

Answer 1

You can use clip:

cols = ["los_24", "los_48", "in_24", "in_48"]

f[cols] = f[cols].clip(lower=1)

to get:

   person_id  test_id  los_24  los_48   in_24  in_48 test
0        101      123    1.00     1.0  21.000   11.3    A
1        101      123    1.00     1.0  24.000  202.0    B
2        101      124    1.00     1.0   1.000    1.0    C
3        201      321    1.01     1.0   2.300    1.0    D
4        201      321    2.00    11.0   1.000   41.0    E
5        201      321    1.00     2.0  23.000   47.0    F
6        203      456    2.00     3.0   1.001    2.0    G
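Since the question mentions more than 10 such columns, here is a minimal, self-contained sketch of the same clip approach that also builds the column list from a name pattern instead of typing it out. The regex is an assumption based on the four column names shown in the question; real data may need a different pattern.

```python
import pandas as pd

f = pd.DataFrame({
    "person_id": [101, 101, 101, 201, 201, 201, 203],
    "test_id": [123, 123, 124, 321, 321, 321, 456],
    "los_24": [0.3, 0.7, 0.6, 1.01, 2, 1, 2],
    "los_48": [1, 0.2, 0.4, 0.7, 11, 2, 3],
    "in_24": [21, 24, 0.3, 2.3, 0.8, 23, 1.001],
    "in_48": [11.3, 202.0, 0.2, 0.3, 41.0, 47, 2],
    "test": ["A", "B", "C", "D", "E", "F", "G"],
})

# Assumption: every target column is named "los_<digits>" or "in_<digits>".
cols = f.filter(regex=r"^(los|in)_\d+$").columns

# Raise every value below 1 up to 1; values >= 1 are left untouched.
f[cols] = f[cols].clip(lower=1)
print(f)
```

Columns that do not match the pattern (person_id, test_id, test) are not modified.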

Answer 2

You can put all the columns to process in a list and call numpy.where only once on the selected columns:

cols = ['los_24','los_48','in_24','in_48']

f[cols] = np.where((f[cols] < 1.0),1,f[cols])

Or with DataFrame.mask:

f[cols] = f[cols].mask((f[cols] < 1.0),1)

   person_id  test_id  los_24  los_48   in_24  in_48 test
0        101      123    1.00     1.0  21.000   11.3    A
1        101      123    1.00     1.0  24.000  202.0    B
2        101      124    1.00     1.0   1.000    1.0    C
3        201      321    1.01     1.0   2.300    1.0    D
4        201      321    2.00    11.0   1.000   41.0    E
5        201      321    1.00     2.0  23.000   47.0    F
6        203      456    2.00     3.0   1.001    2.0    G
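As a rough check that these approaches agree, the sketch below runs clip, mask, and numpy.where on a tiny made-up frame (the frame `df` and its columns `a` and `b` are hypothetical, not from the question). It also shows one place where a seemingly equivalent rewrite, DataFrame.where with the condition flipped, behaves differently: missing values compare as False, so they get replaced.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value to show NaN handling.
df = pd.DataFrame({"a": [0.3, 1.01, np.nan], "b": [2.0, 0.7, 1.0]})

via_clip = df.clip(lower=1)
via_mask = df.mask(df < 1.0, 1)
via_np = pd.DataFrame(np.where(df < 1.0, 1, df),
                      index=df.index, columns=df.columns)

# NaN < 1.0 is False, so all three variants above keep the NaN.
# Flipping the condition is NOT equivalent: NaN >= 1.0 is also False,
# so DataFrame.where replaces the NaN with 1.
via_where = df.where(df >= 1.0, 1)

print(via_clip)
print(via_where)
```

If the real columns contain missing values, it is worth deciding up front whether they should stay missing or become 1, and picking the variant accordingly.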

Answer 3

Wow, there are so many ways to skin a cat. You could also use a lambda function:

cols = ['los_24','los_48','in_24','in_48']
for col in cols:
    f[col] = f[col].apply(lambda x: 1 if x<1 else x)

Same output :-)
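For what it's worth, the loop with apply calls the lambda once per cell in Python, whereas clip works on whole columns at once, so the vectorized form is usually the faster choice on large frames. The sketch below only checks that the two give the same values on randomly generated data; the frame `big` and its size are made up for illustration, and no timing is claimed.

```python
import numpy as np
import pandas as pd

# Hypothetical larger frame with the same column names as the question.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.uniform(0, 3, size=(1000, 4)),
                   columns=["los_24", "los_48", "in_24", "in_48"])

# Per-element approach: one Python call per cell.
looped = big.copy()
for col in looped.columns:
    looped[col] = looped[col].apply(lambda x: 1 if x < 1 else x)

# Vectorized approach: one call for the whole frame.
vectorized = big.clip(lower=1)

print(np.allclose(looped.to_numpy(dtype=float), vectorized.to_numpy()))
```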

3 answers

Answer 1
8 Accepted 2021-05-25 12:12:25

Answer 2
3 2021-05-25 12:10:38

Answer 3
1 2021-05-25 12:21:54
