Replacing values in a pandas DataFrame based on multiple conditions
I have a fairly simple question based on this sample code:
x1 = 10*np.random.randn(10,3)
df1 = pd.DataFrame(x1)
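I am looking for a single DataFrame derived from df1 where positive values are replaced with "up", negative values are replaced with "down", and 0 values (if any) are replaced with "zero". I have tried using the .where() and .mask() methods but could not get the desired result.
I have seen other posts that filter on multiple conditions at once, but they do not show how to replace values according to different conditions.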
df1.apply(np.sign).replace({-1: 'down', 1: 'up', 0: 'zero'})
Output:
      0     1     2
0  down    up    up
1    up  down  down
2    up  down  down
3  down  down    up
4  down  down    up
5  down    up    up
6  down    up  down
7    up  down  down
8    up    up  down
9  down    up    up
PS: Of course, getting exactly zero from randn is unlikely.
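Because randn will almost never produce an exact zero, the one-liner above is easier to check on fixed data. The following is a minimal sketch using made-up values (an assumption for illustration, not data from the question) so that all three labels appear:

```python
import numpy as np
import pandas as pd

# Hand-picked values (hypothetical) so that negative, zero and positive
# entries all occur at least once.
df = pd.DataFrame([[-1.5, 0.0, 2.0],
                   [3.2, -0.7, 0.0]])

# np.sign maps every value to -1.0, 0.0 or 1.0;
# replace then swaps those three numbers for the labels.
labels = df.apply(np.sign).replace({-1: 'down', 1: 'up', 0: 'zero'})
print(labels)
```

The intermediate sign frame is what makes a plain dictionary lookup sufficient here: after np.sign there are only three distinct values left to map.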
In general, you can use np.select on the values and rebuild the DataFrame:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(10*np.random.randn(10, 3))
df1.iloc[0, 0] = 0 # So we can check the == 0 condition
conds = [df1.values < 0, df1.values > 0]
choices = ['down', 'up']
pd.DataFrame(np.select(conds, choices, default='zero'),
             index=df1.index,
             columns=df1.columns)
      0     1     2
0  zero  down    up
1    up  down    up
2    up    up    up
3  down  down  down
4    up    up    up
5    up    up    up
6    up    up  down
7    up    up  down
8  down    up  down
9    up    up  down
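The np.select approach can be wrapped in a small function so that the index and column labels are carried over automatically. This is only a sketch: the name label_signs and the sample frame are hypothetical. Note that anything matching neither condition, including NaN, ends up with the default label.

```python
import numpy as np
import pandas as pd

def label_signs(frame):
    """Hypothetical helper: same-shaped DataFrame of 'down'/'up'/'zero'."""
    values = frame.to_numpy()
    conds = [values < 0, values > 0]
    choices = ['down', 'up']
    # Entries that satisfy neither condition (zeros, and also NaN,
    # since comparisons with NaN are False) receive the default.
    return pd.DataFrame(np.select(conds, choices, default='zero'),
                        index=frame.index,
                        columns=frame.columns)

# Made-up data with a non-default index to show that labels survive.
df = pd.DataFrame({'a': [-2.0, 0.0], 'b': [5.0, -1.0]}, index=['x', 'y'])
labeled = label_signs(df)
print(labeled)
```

If NaN should not be reported as "zero", an explicit `values == 0` condition with a different default would be one way to separate the two cases.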
For multiple conditions, e.g. (df['employrate'] <= 55) & (df['employrate'] > 50), use this:
df['employrate'] = np.where(
(df['employrate'] <=55) & (df['employrate'] > 50) , 11, df['employrate']
)
Or you can do it this way:
df.loc[(df['employrate'] <= 55) & (df['employrate'] > 50), 'employrate'] = 11
The informal syntax here is:
<dataset>.loc[(<filter1>) & (<filter2>), '<variable>'] = <value>
Out[108]:
       country  employrate  alcconsumption
0  Afghanistan   55.700001             .03
1      Albania   11.000000            7.29
2      Algeria   11.000000             .69
3      Andorra         nan           10.17
4       Angola   75.699997            5.57
So the np.where syntax used here is:
df['<column_name>'] = np.where((<filter 1>) & (<filter 2>), <new value>, df['<column_name>'])
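To see that the np.where form and the .loc form give the same result, here is a minimal sketch on a frame rebuilt from the table above. The employrate values are rounded versions of the ones shown, which is an assumption made for readability.

```python
import numpy as np
import pandas as pd

# Rebuilt from the sample table; values rounded for illustration.
df = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'employrate': [55.7, 51.4, 50.5, np.nan, 75.7],
})

# Each comparison needs its own parentheses because & binds tighter
# than the comparison operators.
in_range = (df['employrate'] > 50) & (df['employrate'] <= 55)

# Variant 1: np.where builds a complete new array for the column.
via_where = np.where(in_range, 11, df['employrate'])

# Variant 2: .loc overwrites only the matching rows (done on a copy
# here so both variants can be compared).
via_loc = df.copy()
via_loc.loc[in_range, 'employrate'] = 11

print(via_where)
print(via_loc)
```

The missing value for Andorra is left untouched by both variants, because a comparison against NaN evaluates to False.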
For a single condition, e.g. (df['employrate'] > 70), starting from this data:
       country        employrate  alcconsumption
0  Afghanistan  55.7000007629394             .03
1      Albania  51.4000015258789            7.29
2      Algeria              50.5             .69
3      Andorra                             10.17
4       Angola  75.6999969482422            5.57
use this:
df.loc[df['employrate'] > 70, 'employrate'] = 7
       country  employrate  alcconsumption
0  Afghanistan   55.700001             .03
1      Albania   51.400002            7.29
2      Algeria   50.500000             .69
3      Andorra         nan           10.17
4       Angola    7.000000            5.57
So the syntax here is:
df.loc[<mask>, <optional column(s)>]
where <mask> generates the labels of the rows to index.
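The two parts of the .loc indexer can be seen separately in a short sketch. The three-row frame below is a trimmed, rounded copy of the sample data (an assumption for brevity), and the variable names mask and rows are illustrative.

```python
import numpy as np
import pandas as pd

# Trimmed and rounded copy of the sample data, for illustration only.
df = pd.DataFrame({
    'country': ['Afghanistan', 'Andorra', 'Angola'],
    'employrate': [55.7, np.nan, 75.7],
    'alcconsumption': [0.03, 10.17, 5.57],
})

# The mask is a boolean Series aligned on the index; NaN compares as False.
mask = df['employrate'] > 70

# Row selector first, then the column label to assign into.
df.loc[mask, 'employrate'] = 7

# With the column part omitted, .loc returns every column of the matching rows.
rows = df.loc[mask]
print(rows)
```

Storing the mask in a variable before assigning also keeps the selection stable: it still refers to the rows that matched originally, even though their employrate is now 7.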
IF condition with OR
from pandas import DataFrame
names = {'First_name': ['Jon','Bill','Maria','Emma']}
df = DataFrame(names,columns=['First_name'])
df.loc[(df['First_name'] == 'Bill') | (df['First_name'] == 'Emma'), 'name_match'] = 'Match'
df.loc[(df['First_name'] != 'Bill') & (df['First_name'] != 'Emma'), 'name_match'] = 'Mismatch'
print(df)
Output:
  First_name name_match
0        Jon   Mismatch
1       Bill      Match
2      Maria   Mismatch
3       Emma      Match
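When the OR is between several equality tests on the same column, Series.isin combined with np.where is a compact alternative. This is a different technique from the two .loc assignments above, sketched here on the same names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# isin is True wherever the value equals any entry of the list,
# which is equivalent to chaining == tests with |.
is_match = df['First_name'].isin(['Bill', 'Emma'])

# One pass fills both outcomes, so no second "not matched" assignment is needed.
df['name_match'] = np.where(is_match, 'Match', 'Mismatch')
print(df)
```

This avoids having to write the negated condition by hand, which is easy to get wrong as the list of names grows.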