Dataframe with 3 columns:
FLAG CLASS CATEGORY
yes 'Sci' 'Alpha'
yes 'Sci' 'undefined'
yes 'math' 'Beta'
yes 'math' 'undefined'
yes 'eng' 'Gamma'
yes 'math' 'Beta'
yes 'eng' 'Gamma'
yes 'eng' 'Omega'
yes 'eng' 'Omega'
yes 'eng' 'undefined'
yes 'Geog' 'Lambda'
yes 'Art' 'undefined'
yes 'Art' 'undefined'
yes 'Art' 'undefined'
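For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is a minimal sketch: the variable name df is an assumption (chosen to match the answers), and it assumes the quotes in the listing are only display quoting, not part of the stored strings.

```python
import pandas as pd

# Sample data copied from the table above; `df` is an assumed variable name.
df = pd.DataFrame({
    "FLAG": ["yes"] * 14,
    "CLASS": ["Sci", "Sci", "math", "math", "eng", "math", "eng",
              "eng", "eng", "eng", "Geog", "Art", "Art", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Beta", "undefined", "Gamma", "Beta",
                 "Gamma", "Omega", "Omega", "undefined", "Lambda",
                 "undefined", "undefined", "undefined"],
})
print(df)
```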
I want to fill the 'undefined' values in the CATEGORY column with the other category value (if any) that the class has. E.g. the 'Sci' class fills its 'undefined' category with 'Alpha', and the 'math' class fills its 'undefined' category with 'Beta'.
If a class has 2 or more categories to consider, leave it as is. E.g. the 'eng' class has two categories, 'Gamma' and 'Omega', so the 'undefined' category for 'eng' stays 'undefined'.
If all the categories for a class are 'undefined', leave them as 'undefined'.
Result
FLAG CLASS CATEGORY
yes 'Sci' 'Alpha'
yes 'Sci' 'Alpha'
yes 'math' 'Beta'
yes 'math' 'Beta'
yes 'eng' 'Gamma'
yes 'math' 'Beta'
yes 'eng' 'Gamma'
yes 'eng' 'Omega'
yes 'eng' 'Omega'
yes 'eng' 'undefined'
yes 'Geog' 'Lambda'
yes 'Art' 'undefined'
yes 'Art' 'undefined'
yes 'Art' 'undefined'
It needs to generalize: I have many classes in the CLASS column and cannot afford to hard-code 'Sci' or 'eng'.
I have been trying this with multiple np.where calls but have had no luck.
I would use ffill and bfill within groupby:
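# Turn the 'undefined' placeholder into NaN, then count the real categories per class
s = df['CATEGORY'].mask(df['CATEGORY'].eq('undefined'))
s2 = s.groupby(df['CLASS']).transform('nunique')
# transform (rather than apply) keeps the original index, so the assignment aligns row by row
filled = s.groupby(df['CLASS']).transform(lambda x: x.ffill().bfill())
df.loc[s2.eq(1) & s.isnull(), 'CATEGORY'] = filled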
df
Out[388]:
FLAG CLASS CATEGORY
0 yes Sci Alpha
1 yes Sci Alpha
2 yes math Beta
3 yes math Beta
4 yes eng Gamma
5 yes math Beta
6 yes eng Gamma
7 yes eng Omega
8 yes eng Omega
9 yes eng undefined
10 yes Geog Lambda
11 yes Art undefined
12 yes Art undefined
13 yes Art undefined
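To make the intermediate steps of this approach easier to follow, here is a small sketch on a reduced frame. The names n_defined, to_fill and filled are illustrative and not part of the answer above, and the data is a shortened version of the question's example.

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Gamma", "Omega", "undefined", "undefined"],
})

# Hide the placeholder so that groupby ignores it when counting.
s = df["CATEGORY"].mask(df["CATEGORY"].eq("undefined"))

# Number of distinct real categories per class, broadcast to every row.
n_defined = s.groupby(df["CLASS"]).transform("nunique")

# Rows to fill: placeholder rows whose class has exactly one real category.
to_fill = s.isna() & n_defined.eq(1)

# Propagate the real value inside each class, then write back only the chosen rows.
filled = s.groupby(df["CLASS"]).transform(lambda x: x.ffill().bfill())
df.loc[to_fill, "CATEGORY"] = filled[to_fill]
print(df)
```

Only the second 'Sci' row is changed here: 'eng' has two real categories and 'Art' has none, so both keep their 'undefined' rows.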
Please try the following approach:
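import numpy as np
s = df['CATEGORY'].replace('undefined', np.nan)
# Fill only when the class has exactly one distinct real category; x.mode()[0] is then that category
df['CATEGORY'] = s.groupby(df['CLASS']).transform(lambda x: x.fillna(x.mode()[0]) if x.nunique() == 1 else x).fillna('undefined')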
Edit:
I added another solution that uses isin to select the rows of the valid classes (both the undefined and the not-undefined rows), and then updates exactly that slice of df.
Steps:
Create m as the Series of CLASS values that have exactly one unique not-undefined CATEGORY value. Use isin to select the qualifying rows and where to turn undefined into NaN. Finally, group these rows by CLASS, apply ffill and bfill per group to fill the NaN, and assign the result back to df.
m = df.query('CATEGORY != "undefined"').drop_duplicates(['CLASS', 'CATEGORY']).CLASS.drop_duplicates(keep=False)
sel = df.CLASS.isin(m)
sub = df[sel].where(df[sel] != 'undefined')
df.loc[sel, 'CATEGORY'] = sub.groupby('CLASS')['CATEGORY'].transform(lambda x: x.ffill().bfill())
This solution looks cleaner, but I don't know whether it is slower than the original solution, since it uses groupby.
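One way to check the speed question is to time both ideas on synthetic data. The sketch below is only an assumption-laden benchmark: fill_with_groupby and fill_with_map are hypothetical helper names that re-implement the two approaches (they are not the exact code from this answer), the data is randomly generated, and timings will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def fill_with_groupby(df):
    """Per-class ffill/bfill, applied only where the class has one real category."""
    out = df.copy()
    s = out["CATEGORY"].mask(out["CATEGORY"].eq("undefined"))
    n = s.groupby(out["CLASS"]).transform("nunique")
    filled = s.groupby(out["CLASS"]).transform(lambda x: x.ffill().bfill())
    rows = s.isna() & n.eq(1)
    out.loc[rows, "CATEGORY"] = filled[rows]
    return out


def fill_with_map(df):
    """Build a CLASS -> CATEGORY lookup for single-category classes and map it."""
    out = df.copy()
    defined = out.loc[out["CATEGORY"].ne("undefined"), ["CLASS", "CATEGORY"]].drop_duplicates()
    unique = defined.drop_duplicates("CLASS", keep=False).set_index("CLASS")["CATEGORY"]
    rows = out["CATEGORY"].eq("undefined") & out["CLASS"].isin(unique.index)
    out.loc[rows, "CATEGORY"] = out.loc[rows, "CLASS"].map(unique)
    return out


# Synthetic data: even-numbered classes have one real category, odd ones have up to two.
rng = np.random.default_rng(0)
n_rows = 20_000
cls = rng.integers(0, 200, size=n_rows)
variant = np.where(cls % 2 == 0, 0, rng.integers(0, 2, size=n_rows))
is_undefined = rng.random(n_rows) < 0.3
big = pd.DataFrame({
    "CLASS": [f"k{c}" for c in cls],
    "CATEGORY": ["undefined" if u else f"c{c}_{v}"
                 for u, c, v in zip(is_undefined, cls, variant)],
})

# Both helpers should agree before their timings are compared.
assert fill_with_groupby(big).equals(fill_with_map(big))

print("groupby:", timeit.timeit(lambda: fill_with_groupby(big), number=5))
print("map:    ", timeit.timeit(lambda: fill_with_map(big), number=5))
```

The equality check matters more than the numbers: it confirms the two variants produce the same frame before any conclusion is drawn from the timings.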
Original:
My solution constructs the 'not undefined' values from the 'undefined' rows by mapping each class to its unique 'not undefined' value:
m = df.query('CATEGORY != "undefined"').drop_duplicates(['CLASS', 'CATEGORY']).CLASS.drop_duplicates(keep=False)
t = df.query('CATEGORY == "undefined"').CLASS.map(df.loc[m.index].set_index('CLASS').CATEGORY)
df['CATEGORY'] = t.combine_first(df['CATEGORY'])
Out[553]:
FLAG CLASS CATEGORY
0 yes Sci Alpha
1 yes Sci Alpha
2 yes math Beta
3 yes math Beta
4 yes eng Gamma
5 yes math Beta
6 yes eng Gamma
7 yes eng Omega
8 yes eng Omega
9 yes eng undefined
10 yes Geog Lambda
11 yes Art undefined
12 yes Art undefined
13 yes Art undefined
You can do this using boolean indexing:
df.loc[(df['CLASS'] == 'Sci') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Alpha'
df.loc[(df['CLASS'] == 'math') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Beta'
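Since the question asks not to hard-code class names, the same boolean-indexing idea can be driven by a lookup built from the data. This is a minimal sketch on a shortened example; fill_values, per_class and defined are illustrative names, not something from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "math", "math", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Beta", "undefined",
                 "Gamma", "Omega", "undefined", "undefined"],
})

# Distinct real categories per class (classes with only 'undefined' drop out here).
defined = df[df["CATEGORY"] != "undefined"]
per_class = defined.groupby("CLASS")["CATEGORY"].unique()

# Keep only the classes that have exactly one real category.
fill_values = {cls: cats[0] for cls, cats in per_class.items() if len(cats) == 1}

# Same boolean indexing as above, but the class/category pairs come from the data.
for cls, cat in fill_values.items():
    df.loc[(df["CLASS"] == cls) & (df["CATEGORY"] == "undefined"), "CATEGORY"] = cat
print(df)
```

A Python loop over classes is easy to read but may be slow with very many classes; in that case a groupby or map based approach is probably the better fit.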