Pandas - Replace values in column with other values from the same column
Dataframe with 3 columns:
FLAG CLASS CATEGORY
yes 'Sci' 'Alpha'
yes 'Sci' 'undefined'
yes 'math' 'Beta'
yes 'math' 'undefined'
yes 'eng' 'Gamma'
yes 'math' 'Beta'
yes 'eng' 'Gamma'
yes 'eng' 'Omega'
yes 'eng' 'Omega'
yes 'eng' 'undefined'
yes 'Geog' 'Lambda'
yes 'Art' 'undefined'
yes 'Art' 'undefined'
yes 'Art' 'undefined'
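For anyone who wants to run the snippets in this thread, here is a minimal sketch that builds the table above as a DataFrame. It assumes the quotes shown in the table are display only and not part of the string values; the name df matches what the answers use.

```python
import pandas as pd

# Sample data from the question; quotes are assumed NOT to be part of the values.
df = pd.DataFrame({
    "FLAG": ["yes"] * 14,
    "CLASS": ["Sci", "Sci", "math", "math", "eng", "math", "eng",
              "eng", "eng", "eng", "Geog", "Art", "Art", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Beta", "undefined", "Gamma", "Beta", "Gamma",
                 "Omega", "Omega", "undefined", "Lambda",
                 "undefined", "undefined", "undefined"],
})

print(df.shape)  # (14, 3)
```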
I want to fill the 'undefined' values in the column CATEGORY with the other category value (if any) that the class has. E.g. the 'Sci' class will fill its 'undefined' category with 'Alpha', and the 'math' class will fill its 'undefined' category with 'Beta'.
If there are 2 or more categories to consider, leave as is. E.g. the English class 'eng' has two categories, 'Gamma' and 'Omega', so its 'undefined' category will be left as 'undefined'.
If all the categories for a class are 'undefined', leave as 'undefined'.
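Stated as code, the three rules amount to the following plain-Python sketch (no pandas). The function name fill_undefined and its arguments are hypothetical, used only for illustration; the sketch is meant to pin down the expected behaviour, not to be an efficient solution.

```python
from collections import defaultdict

def fill_undefined(rows, missing="undefined"):
    """rows: list of (class, category) pairs; returns a new list with the rules applied."""
    # Collect the distinct defined categories seen for each class.
    seen = defaultdict(set)
    for cls, cat in rows:
        if cat != missing:
            seen[cls].add(cat)
    # Replace a missing category only when the class has exactly one candidate.
    out = []
    for cls, cat in rows:
        if cat == missing and len(seen[cls]) == 1:
            cat = next(iter(seen[cls]))
        out.append((cls, cat))
    return out

rows = [("Sci", "Alpha"), ("Sci", "undefined"),
        ("eng", "Gamma"), ("eng", "Omega"), ("eng", "undefined"),
        ("Art", "undefined")]
print(fill_undefined(rows))
# [('Sci', 'Alpha'), ('Sci', 'Alpha'), ('eng', 'Gamma'), ('eng', 'Omega'), ('eng', 'undefined'), ('Art', 'undefined')]
```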
Result
FLAG CLASS CATEGORY
yes 'Sci' 'Alpha'
yes 'Sci' 'Alpha'
yes 'math' 'Beta'
yes 'math' 'Beta'
yes 'eng' 'Gamma'
yes 'math' 'Beta'
yes 'eng' 'Gamma'
yes 'eng' 'Omega'
yes 'eng' 'Omega'
yes 'eng' 'undefined'
yes 'Geog' 'Lambda'
yes 'Art' 'undefined'
yes 'Art' 'undefined'
yes 'Art' 'undefined'
It needs to generalize: I have many classes in the CLASS column and cannot afford to hard-code 'Sci' or 'eng'.
I have been trying this with multiple np.where calls but had no luck.
I would use ffill and bfill within groupby:
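s = df['CATEGORY'].mask(df['CATEGORY'].eq('undefined'))  # 'undefined' -> NaN
s2 = s.groupby(df['CLASS']).transform('nunique')  # distinct defined categories per class
# transform (not apply) keeps the original index, so the assignment below aligns
filled = s.groupby(df['CLASS']).transform(lambda x: x.ffill().bfill())
# fill only the NaN rows whose class has exactly one defined category
df.loc[s2.eq(1) & s.isnull(), 'CATEGORY'] = filled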
df
Out[388]:
FLAG CLASS CATEGORY
0 yes Sci Alpha
1 yes Sci Alpha
2 yes math Beta
3 yes math Beta
4 yes eng Gamma
5 yes math Beta
6 yes eng Gamma
7 yes eng Omega
8 yes eng Omega
9 yes eng undefined
10 yes Geog Lambda
11 yes Art undefined
12 yes Art undefined
13 yes Art undefined
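To see why the condition picks the right rows, it can help to print the intermediate values. This is a small sketch on a reduced, made-up subset of the data; the names n_defined and to_fill are illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Gamma", "Omega", "undefined", "undefined"],
})

# Step 1: hide the placeholder so pandas treats it as missing.
s = df["CATEGORY"].mask(df["CATEGORY"].eq("undefined"))

# Step 2: count distinct defined categories per class, broadcast to every row.
n_defined = s.groupby(df["CLASS"]).transform("nunique")
print(n_defined.tolist())  # [1, 1, 2, 2, 2, 0]

# Step 3: only rows that are missing AND belong to a single-category class qualify.
to_fill = s.isna() & n_defined.eq(1)
print(to_fill.tolist())  # [False, True, False, False, False, False]
```

The 'eng' rows are excluded because their count is 2, and the 'Art' row is excluded because its count is 0.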
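Try the following (assuming numpy is imported as np):
# fill only classes with exactly one defined category, so ties such as 'eng' stay 'undefined'
df['CATEGORY'] = df['CATEGORY'].replace('undefined', np.nan).groupby(df['CLASS']).transform(lambda x: x.fillna(x.mode()[0]) if x.nunique() == 1 else x).fillna('undefined')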
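A caution about any guard built around x.mode()[0]: for a class such as 'eng', where two categories are equally frequent, mode() still returns a value, so taking its first element silently picks one. The sketch below uses a made-up Series to show that behaviour; checking nunique() is one way to honour the "leave as is" rule.

```python
import pandas as pd

eng = pd.Series(["Gamma", "Gamma", "Omega", "Omega", None])

# mode() returns every value tied for most frequent, sorted, and skips missing values.
print(eng.mode().tolist())  # ['Gamma', 'Omega']

# nunique() counts distinct non-missing values, so it detects the ambiguity.
print(eng.nunique())  # 2
```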
Edit:
I added another solution that uses isin to select the valid classes, covering both their not-undefined and undefined rows, and then updates exactly that slice of df.
Steps:
Create m as the series of CLASS values that have exactly one unique not-undefined CATEGORY value.
Use isin to select the qualifying rows and where to turn undefined into NaN.
Finally, group these rows by CLASS, ffill and bfill per group to fill the NaN values, and assign the result back to df.
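m = df.query('CATEGORY != "undefined"').drop_duplicates().CLASS.drop_duplicates(keep=False)
sub = df[df.CLASS.isin(m)]
# fill per CLASS group via transform; a bfill chained after groupby(...).ffill() would cross group boundaries
df.loc[sub.index, 'CATEGORY'] = sub.CATEGORY.where(sub.CATEGORY != 'undefined').groupby(sub.CLASS).transform(lambda x: x.ffill().bfill())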
This solution looks cleaner, but I don't know whether it is slower than the original solution, since it uses groupby.
Original:
My solution fills the 'undefined' rows by mapping each CLASS to its unique 'not undefined' CATEGORY value:
m = df.query('CATEGORY != "undefined"').drop_duplicates().CLASS.drop_duplicates(keep=False)
t = df.query('CATEGORY == "undefined"').CLASS.map(df.loc[m.index].set_index('CLASS').CATEGORY)
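# assign back explicitly; the chained df['CATEGORY'].update(t) may not modify df under pandas Copy-on-Write
df['CATEGORY'] = t.combine_first(df['CATEGORY'])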
Out[553]:
FLAG CLASS CATEGORY
0 yes Sci Alpha
1 yes Sci Alpha
2 yes math Beta
3 yes math Beta
4 yes eng Gamma
5 yes math Beta
6 yes eng Gamma
7 yes eng Omega
8 yes eng Omega
9 yes eng undefined
10 yes Geog Lambda
11 yes Art undefined
12 yes Art undefined
13 yes Art undefined
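The two intermediate objects in this approach are easier to follow when printed. Below is a small sketch on a reduced, made-up subset of the data; the names defined and lookup are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "FLAG": ["yes"] * 6,
    "CLASS": ["Sci", "Sci", "eng", "eng", "eng", "Geog"],
    "CATEGORY": ["Alpha", "undefined", "Gamma", "Omega", "undefined", "Lambda"],
})

defined = df.query('CATEGORY != "undefined"').drop_duplicates()
# keep=False drops every class that still appears more than once,
# i.e. every class with two or more distinct categories.
m = defined["CLASS"].drop_duplicates(keep=False)
print(m.tolist())  # ['Sci', 'Geog']

# The lookup table used by map(): CLASS -> its single category.
lookup = df.loc[m.index].set_index("CLASS")["CATEGORY"]
print(lookup.to_dict())  # {'Sci': 'Alpha', 'Geog': 'Lambda'}
```

Classes missing from lookup (here 'eng') map to NaN, which is why their 'undefined' rows are left untouched.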
You can do this by using boolean indexing:
df.loc[(df['CLASS'] == 'Sci') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Alpha'
df.loc[(df['CLASS'] == 'math') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Beta'
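Since the question asks for something that does not hard-code class names, one possible way to generalise the boolean-indexing idea is to derive the class-to-category mapping from the data first. This is only a sketch on made-up sample rows; per_class and fill_values are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "math", "math", "eng", "eng", "eng"],
    "CATEGORY": ["Alpha", "undefined", "Beta", "undefined",
                 "Gamma", "Omega", "undefined"],
})

# Derive {class: category} for classes with exactly one defined category.
defined = df[df["CATEGORY"] != "undefined"]
per_class = defined.groupby("CLASS")["CATEGORY"].agg(["nunique", "first"])
fill_values = per_class.loc[per_class["nunique"] == 1, "first"].to_dict()
print(fill_values)  # {'Sci': 'Alpha', 'math': 'Beta'}

# Same boolean-indexing assignment, driven by the derived mapping.
for cls, cat in fill_values.items():
    df.loc[(df["CLASS"] == cls) & (df["CATEGORY"] == "undefined"), "CATEGORY"] = cat

print(df["CATEGORY"].tolist())
# ['Alpha', 'Alpha', 'Beta', 'Beta', 'Gamma', 'Omega', 'undefined']
```

A Python-level loop over classes is likely slower than the vectorised answers when there are many classes, so this is mainly useful for readability.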