Pandas: replace values in a column with other values from the same column

Question

I have a DataFrame with 3 columns:

FLAG CLASS   CATEGORY
yes 'Sci'   'Alpha'
yes 'Sci'   'undefined'
yes 'math'  'Beta'
yes 'math'  'undefined'
yes 'eng'   'Gamma'
yes 'math'  'Beta'
yes 'eng'   'Gamma'
yes 'eng'   'Omega'
yes 'eng'   'Omega'
yes 'eng'   'undefined'
yes 'Geog'  'Lambda'
yes 'Art'   'undefined'
yes 'Art'   'undefined'
yes 'Art'   'undefined'
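
For anyone who wants to reproduce the problem, the frame above could be built roughly as follows. This is only a sketch: it assumes the quotes in the listing are display-only and that the values are plain strings, and it uses the name df because the answers do.

```python
import pandas as pd

# Sample data from the question. Assumption: the quotes shown in the
# listing are not part of the values, so plain strings are used.
df = pd.DataFrame({
    "FLAG": ["yes"] * 14,
    "CLASS": ["Sci", "Sci", "math", "math", "eng", "math", "eng",
              "eng", "eng", "eng", "Geog", "Art", "Art", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Beta", "undefined", "Gamma", "Beta",
                 "Gamma", "Omega", "Omega", "undefined", "Lambda",
                 "undefined", "undefined", "undefined"],
})
print(df)
```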

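I want to fill the 'undefined' values in the CATEGORY column with the other category value that the same class has, if there is one. For example, the 'Sci' class would fill its 'undefined' category with 'Alpha', and the 'math' class would fill its 'undefined' category with 'Beta'.

If a class has 2 or more distinct categories, leave it as is. For example, the English class 'eng' has two categories, 'Gamma' and 'Omega', so its 'undefined' category stays 'undefined'.

If every category of a class is 'undefined', leave it as 'undefined'.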

Result

FLAG CLASS   CATEGORY
yes 'Sci'   'Alpha'
yes 'Sci'   'Alpha'
yes 'math'  'Beta'
yes 'math'  'Beta'
yes 'eng'   'Gamma'
yes 'math'  'Beta'
yes 'eng'   'Gamma'
yes 'eng'   'Omega'
yes 'eng'   'Omega'
yes 'eng'   'undefined'
yes 'Geog'  'Lambda'
yes 'Art'   'undefined'
yes 'Art'   'undefined'
yes 'Art'   'undefined'

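This needs to be general. I have many classes in the CLASS column and cannot afford to hard-code 'Sci' or 'eng'.

I have been trying to use multiple np.where calls, but with no luck.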

Answer 1

I would use ffill and bfill inside a groupby:

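# Treat 'undefined' as missing
s = df.CATEGORY.mask(df.CATEGORY.eq('undefined'))
# Number of distinct known categories per class, broadcast to every row
s2 = s.groupby(df['CLASS']).transform('nunique')
# Fill only where the class has exactly one known category; transform keeps the original index
df.loc[s2.eq(1) & s.isnull(), 'CATEGORY'] = s.groupby(df.CLASS).transform(lambda x: x.ffill().bfill())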
df
Out[388]: 
   FLAG CLASS   CATEGORY
0   yes   Sci      Alpha
1   yes   Sci      Alpha
2   yes  math       Beta
3   yes  math       Beta
4   yes   eng      Gamma
5   yes  math       Beta
6   yes   eng      Gamma
7   yes   eng      Omega
8   yes   eng      Omega
9   yes   eng  undefined
10  yes  Geog     Lambda
11  yes   Art  undefined
12  yes   Art  undefined
13  yes   Art  undefined
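
The steps in this answer can be unpacked on a smaller frame. The following is a minimal sketch, not the answerer's exact code: the names n_known, filled and target are made up for illustration, and transform is used for the fill so that the result keeps the original row index.

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Gamma", "Omega", "undefined", "undefined"],
})

# Step 1: hide the placeholder so pandas treats it as missing.
s = df["CATEGORY"].mask(df["CATEGORY"].eq("undefined"))

# Step 2: for each row, count the distinct known categories of its class
# (nunique ignores NaN, so an all-undefined class gets 0).
n_known = s.groupby(df["CLASS"]).transform("nunique")

# Step 3: fill forward and backward within each class.
filled = s.groupby(df["CLASS"]).transform(lambda x: x.ffill().bfill())

# Step 4: write back only where the class has exactly one known category.
target = n_known.eq(1) & s.isna()
df.loc[target, "CATEGORY"] = filled[target]

print(df)
```

Only the 'Sci' row is changed here: 'eng' has two known categories and 'Art' has none, so both keep 'undefined'.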

Answer 2

Try the following:

s = df['CATEGORY'].replace('undefined', np.nan)  # needs: import numpy as np
df['CATEGORY'] = s.groupby(df['CLASS']).transform(lambda x: x.fillna(x.mode()[0]) if x.nunique() == 1 else x).fillna('undefined')
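
Because mode picks the most frequent value, a fill based on it alone would also fill a class that has several known categories. One way to make the "only when the class has a single known category" rule explicit is a small per-group helper. This is a sketch under the assumption that the values are plain strings; fill_if_unique is a hypothetical name, not a pandas function.

```python
import pandas as pd

def fill_if_unique(group: pd.Series) -> pd.Series:
    """Fill NaN with the group's only known value; otherwise return it unchanged."""
    known = group.dropna().unique()
    if len(known) == 1:
        return group.fillna(known[0])
    return group

df = pd.DataFrame({
    "CLASS": ["math", "math", "math", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["Beta", "undefined", "Beta", "Gamma", "Omega",
                 "undefined", "undefined"],
})

# Turn the placeholder into NaN, fill per class, then restore the placeholder
# for anything that could not be filled.
missing = df["CATEGORY"].mask(df["CATEGORY"].eq("undefined"))
df["CATEGORY"] = (
    missing.groupby(df["CLASS"])
    .transform(fill_if_unique)
    .fillna("undefined")
)
print(df)
```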

Answer 3

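Edit:
I added another solution that uses isin to select the classes that qualify for an update, covering both their 'undefined' and non-'undefined' rows, and then updates exactly that slice of df.

Steps:
1. Build m as the Series of CLASS values that have exactly one unique non-'undefined' CATEGORY.
2. Use isin to select the rows of those classes, and where to turn 'undefined' into NaN.
3. Group those rows by CLASS, ffill and bfill to fill the NaN values, and assign the result back to df.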

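m = df.query('CATEGORY != "undefined"').drop_duplicates().CLASS.drop_duplicates(keep=False)
sel = df.CLASS.isin(m)  # rows whose class has exactly one known category
known = df.loc[sel, 'CATEGORY'].where(df.loc[sel, 'CATEGORY'].ne('undefined'))
df.loc[sel, 'CATEGORY'] = known.groupby(df.loc[sel, 'CLASS']).transform(lambda x: x.ffill().bfill())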

This solution looks cleaner, but I don't know whether it is slower than the original, since it uses groupby.
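
To see what m holds, here is a small sketch of the drop_duplicates(keep=False) step. It differs slightly from the code above: duplicates are dropped on the CLASS/CATEGORY pair only (an assumption made here so that differing FLAG values would not matter), and the names known and pairs are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "FLAG": ["yes"] * 6,
    "CLASS": ["Sci", "Sci", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Gamma", "Omega", "undefined", "undefined"],
})

# Keep only rows with a known category, one row per (CLASS, CATEGORY) pair.
known = df[df["CATEGORY"] != "undefined"]
pairs = known.drop_duplicates(subset=["CLASS", "CATEGORY"])

# keep=False drops every class that appears more than once, i.e. every
# class with two or more distinct known categories.
m = pairs["CLASS"].drop_duplicates(keep=False)
print(m.tolist())

# Boolean selector for all rows (known or undefined) of the surviving classes.
print(df["CLASS"].isin(m).tolist())
```

'eng' is dropped because it has two known categories, and 'Art' never enters m because it has none.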

Original:
My solution fills each 'undefined' entry by mapping its CLASS to that class's unique non-'undefined' value:

m = df.query('CATEGORY != "undefined"').drop_duplicates().CLASS.drop_duplicates(keep=False)
t = df.query('CATEGORY == "undefined"').CLASS.map(df.loc[m.index].set_index('CLASS').CATEGORY)
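# Assign the result back instead of calling df['CATEGORY'].update(t), which may not modify df under copy-on-write
df['CATEGORY'] = t.combine_first(df['CATEGORY'])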

Out[553]:
   FLAG CLASS   CATEGORY
0   yes   Sci      Alpha
1   yes   Sci      Alpha
2   yes  math       Beta
3   yes  math       Beta
4   yes   eng      Gamma
5   yes  math       Beta
6   yes   eng      Gamma
7   yes   eng      Omega
8   yes   eng      Omega
9   yes   eng  undefined
10  yes  Geog     Lambda
11  yes   Art  undefined
12  yes   Art  undefined
13  yes   Art  undefined
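
The mapping idea can also be sketched with an explicit per-class lookup table. This is a variant written for illustration, not the answerer's code; per_class, mapping and is_undefined are made-up names.

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "eng", "eng", "eng", "Geog", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Gamma", "Omega", "undefined",
                 "Lambda", "undefined"],
})

# For each class, count the distinct known categories and remember the first.
known = df.loc[df["CATEGORY"] != "undefined"]
per_class = known.groupby("CLASS")["CATEGORY"].agg(["nunique", "first"])

# Lookup table CLASS -> CATEGORY, restricted to classes with one known category.
mapping = per_class.loc[per_class["nunique"] == 1, "first"]

# Keep known values; for 'undefined' rows take the mapped value, and fall
# back to 'undefined' when the class has no entry in the table.
is_undefined = df["CATEGORY"] == "undefined"
df["CATEGORY"] = (
    df["CATEGORY"]
    .where(~is_undefined, df["CLASS"].map(mapping))
    .fillna("undefined")
)
print(df)
```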

Answer 4

You can use boolean indexing:

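# Each comparison needs its own parentheses because & binds tighter than ==
df.loc[(df['CLASS'] == 'Sci') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Alpha'
df.loc[(df['CLASS'] == 'math') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Beta'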
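
Since the question rules out hard-coding class names, the same boolean indexing could be driven by the data. This is a rough sketch with illustrative names (known, counts, rows); a Python loop over classes is likely slower than a grouped approach when there are many classes.

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "math", "math", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Beta", "undefined", "Gamma",
                 "Omega", "undefined", "undefined"],
})

# Count distinct known categories per class.
known = df[df["CATEGORY"] != "undefined"]
counts = known.groupby("CLASS")["CATEGORY"].nunique()

# For every class with exactly one known category, fill its 'undefined' rows.
for cls in counts[counts == 1].index:
    value = known.loc[known["CLASS"] == cls, "CATEGORY"].iloc[0]
    rows = (df["CLASS"] == cls) & (df["CATEGORY"] == "undefined")
    df.loc[rows, "CATEGORY"] = value

print(df)
```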

4 answers

Answer 1  2  2019-05-09 18:55:53
Answer 2  1  2019-05-09 19:02:18
Answer 3  1  2019-05-09 23:39:53
Answer 4  0  2019-05-09 18:50:43
