Pandas - Replace values in column with other values from the same column

DataFrame with 3 columns:

FLAG CLASS   CATEGORY
yes 'Sci'   'Alpha'
yes 'Sci'   'undefined'
yes 'math'  'Beta'
yes 'math'  'undefined'
yes 'eng'   'Gamma'
yes 'math'  'Beta'
yes 'eng'   'Gamma'
yes 'eng'   'Omega'
yes 'eng'   'Omega'
yes 'eng'   'undefined'
yes 'Geog'  'Lambda'
yes 'Art'   'undefined'
yes 'Art'   'undefined'
yes 'Art'   'undefined'

I want to fill the 'undefined' values in the CATEGORY column with the other category value (if any) that the same class has. For example, the Science class 'Sci' should fill its 'undefined' category with 'Alpha', and the 'math' class should fill its 'undefined' category with 'Beta'.

If there are 2 or more categories to consider, leave it as is. For example, the English class 'eng' has two categories, 'Gamma' and 'Omega', so its 'undefined' category is left as 'undefined'.

If all the categories for a class are 'undefined', leave them as 'undefined'.
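All three rules hinge on one number per class: how many distinct categories it has once 'undefined' is ignored. A minimal sketch, assuming the quotes in the table are display-only and the values are stored as plain strings (`real` and `n_real` are illustrative names):

```python
import pandas as pd

# Sample data from the question (assumes the quotes are display-only)
df = pd.DataFrame({
    'FLAG': ['yes'] * 14,
    'CLASS': ['Sci', 'Sci', 'math', 'math', 'eng', 'math', 'eng',
              'eng', 'eng', 'eng', 'Geog', 'Art', 'Art', 'Art'],
    'CATEGORY': ['Alpha', 'undefined', 'Beta', 'undefined', 'Gamma', 'Beta',
                 'Gamma', 'Omega', 'Omega', 'undefined', 'Lambda',
                 'undefined', 'undefined', 'undefined'],
})

# Treat 'undefined' as missing so it is not counted as a category
real = df['CATEGORY'].where(df['CATEGORY'] != 'undefined')

# Distinct real categories per class:
# exactly 1 -> fill that class's 'undefined' rows; 0 or >= 2 -> leave them
n_real = real.groupby(df['CLASS']).nunique()
print(n_real)
```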

Expected result:

FLAG CLASS   CATEGORY
yes 'Sci'   'Alpha'
yes 'Sci'   'Alpha'
yes 'math'  'Beta'
yes 'math'  'Beta'
yes 'eng'   'Gamma'
yes 'math'  'Beta'
yes 'eng'   'Gamma'
yes 'eng'   'Omega'
yes 'eng'   'Omega'
yes 'eng'   'undefined'
yes 'Geog'  'Lambda'
yes 'Art'   'undefined'
yes 'Art'   'undefined'
yes 'Art'   'undefined'

It needs to generalize: I have many classes in the CLASS column and cannot afford to hard-code 'Sci' or 'eng'.

I have been trying this with multiple np.where calls, but had no luck.
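For reference, a single np.where can work once it is paired with a per-class lookup that is built from the data rather than hard-coded. This is a sketch of that swapped-in technique, not one of the answers below; the sample data again assumes plain strings, and `single` and `fill` are hypothetical names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'FLAG': ['yes'] * 14,
    'CLASS': ['Sci', 'Sci', 'math', 'math', 'eng', 'math', 'eng',
              'eng', 'eng', 'eng', 'Geog', 'Art', 'Art', 'Art'],
    'CATEGORY': ['Alpha', 'undefined', 'Beta', 'undefined', 'Gamma', 'Beta',
                 'Gamma', 'Omega', 'Omega', 'undefined', 'Lambda',
                 'undefined', 'undefined', 'undefined'],
})

is_undef = df['CATEGORY'].eq('undefined')

# Unique (CLASS, CATEGORY) pairs among the real categories
real = df.loc[~is_undef, ['CLASS', 'CATEGORY']].drop_duplicates()

# Keep only classes with exactly one real category: class -> category
counts = real['CLASS'].value_counts()
single = real[real['CLASS'].map(counts) == 1].set_index('CLASS')['CATEGORY']

# Per-row fill value; NaN for classes with 0 or >= 2 real categories
fill = df['CLASS'].map(single)

# One np.where: replace only 'undefined' rows that have a fill value
df['CATEGORY'] = np.where(is_undef & fill.notna(), fill, df['CATEGORY'])
print(df)
```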

I would use ffill and bfill within groupby:

s = df.CATEGORY.mask(df.CATEGORY.eq('undefined'))       # 'undefined' -> NaN
s2 = s.groupby(df['CLASS']).transform('nunique')        # distinct real categories of each row's class
df.loc[s2.eq(1) & s.isnull(), 'CATEGORY'] = s.groupby(df['CLASS']).transform(lambda x: x.ffill().bfill())
df
Out[388]: 
   FLAG CLASS   CATEGORY
0   yes   Sci      Alpha
1   yes   Sci      Alpha
2   yes  math       Beta
3   yes  math       Beta
4   yes   eng      Gamma
5   yes  math       Beta
6   yes   eng      Gamma
7   yes   eng      Omega
8   yes   eng      Omega
9   yes   eng  undefined
10  yes  Geog     Lambda
11  yes   Art  undefined
12  yes   Art  undefined
13  yes   Art  undefined
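Running both ffill and bfill inside the grouped lambda matters: bfill covers a class whose only real value comes after its 'undefined' rows, and doing it per group keeps values from leaking between classes. A small sketch on a hypothetical toy frame (`toy`, classes 'A' and 'B') where another class sits between the gap and its value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: class A's real value comes after its gap,
# and a row of class B sits in between
toy = pd.DataFrame({'CLASS': ['A', 'B', 'A'],
                    'CATEGORY': [np.nan, 'Beta', 'Alpha']})

# ffill then bfill, both restricted to each CLASS
per_group = toy.groupby('CLASS')['CATEGORY'].transform(lambda x: x.ffill().bfill())

# For comparison: bfill over the whole column ignores the groups
whole_column = toy['CATEGORY'].bfill()

print(per_group.tolist())
print(whole_column.tolist())
```

The ungrouped bfill hands class A the value 'Beta' from class B, which is exactly what the per-group version avoids.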

Please try the following:

import numpy as np
df['CATEGORY'] = df['CATEGORY'].replace('undefined', np.nan).groupby(df['CLASS']).transform(lambda x: x.fillna(x.mode()[0]) if x.nunique() == 1 else x).fillna('undefined')
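A caveat on filling with `mode()`: when a class has two equally frequent categories, as 'eng' does, `Series.mode` returns all tied values, so `mode()[0]` alone would silently pick one of them. The deciding quantity for the question's rules is the number of distinct values. A quick check on the 'eng' rows, with 'undefined' already turned into a missing value:

```python
import pandas as pd

# The 'eng' class after 'undefined' -> missing: two categories tie
eng = pd.Series(['Gamma', 'Gamma', 'Omega', 'Omega', None])

print(eng.mode().tolist())   # both tied values are returned
print(eng.mode()[0])         # taking [0] picks just one of them
print(eng.nunique())         # more than one distinct value -> leave 'undefined'
```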

Edit:
I added another solution that uses isin to select only the qualifying classes, so that both their 'undefined' and non-'undefined' rows are included, and then updates exactly that slice of df.

Steps:
Create m as the Series of CLASS values that have exactly one unique non-'undefined' CATEGORY. Use isin to select the qualifying rows, and where to turn 'undefined' into NaN. Finally, group these rows by CLASS, ffill and bfill within each group to fill the NaNs, and assign the result back to df.

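m = df.query('CATEGORY != "undefined"').drop_duplicates(['CLASS', 'CATEGORY']).CLASS.drop_duplicates(keep=False)
sub = df[df.CLASS.isin(m)]
# ffill and bfill inside each CLASS so values never leak between classes
df.loc[sub.index, 'CATEGORY'] = sub.CATEGORY.where(sub.CATEGORY != 'undefined').groupby(sub.CLASS).transform(lambda x: x.ffill().bfill())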
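The way m is built in the steps above can be traced on the sample data. The key trick is `drop_duplicates(keep=False)`, which removes every class that appears more than once among the unique (CLASS, CATEGORY) pairs, i.e. every class with two or more real categories. A sketch, assuming the values are plain strings (`pairs` and `rows` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'FLAG': ['yes'] * 14,
    'CLASS': ['Sci', 'Sci', 'math', 'math', 'eng', 'math', 'eng',
              'eng', 'eng', 'eng', 'Geog', 'Art', 'Art', 'Art'],
    'CATEGORY': ['Alpha', 'undefined', 'Beta', 'undefined', 'Gamma', 'Beta',
                 'Gamma', 'Omega', 'Omega', 'undefined', 'Lambda',
                 'undefined', 'undefined', 'undefined'],
})

# 1. Unique (CLASS, CATEGORY) pairs, ignoring 'undefined'
pairs = df.query('CATEGORY != "undefined"').drop_duplicates(['CLASS', 'CATEGORY'])

# 2. keep=False drops all copies of a repeated class ('eng' has two pairs);
#    'Art' never appears because it has no real category at all
m = pairs['CLASS'].drop_duplicates(keep=False)

# 3. Boolean mask of the rows belonging to qualifying classes
rows = df['CLASS'].isin(m)
print(m.tolist(), int(rows.sum()))
```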

This solution looks cleaner, but because it uses groupby, I don't know whether it is slower than the original solution.
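One way to settle the speed question is to time both approaches on a larger frame. This is a rough harness, assuming randomly generated data (200 hypothetical classes, 20,000 rows) and adapted versions of the two solutions (`with_groupby` and `with_map` are illustrative names, using `transform` and `combine_first`); it first checks that both give the same result and then prints timings, which will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: ~37% of classes never get 'B', so they qualify
rng = np.random.default_rng(0)
n = 20_000
big = pd.DataFrame({
    'FLAG': 'yes',
    'CLASS': rng.integers(0, 200, n).astype(str),
    'CATEGORY': rng.choice(['A', 'B', 'undefined'], n, p=[0.6, 0.01, 0.39]),
})


def single_category_classes(df):
    # Classes with exactly one distinct non-'undefined' CATEGORY
    pairs = df.query('CATEGORY != "undefined"').drop_duplicates(['CLASS', 'CATEGORY'])
    return pairs.CLASS.drop_duplicates(keep=False)


def with_groupby(df):
    # Adapted "Edit" approach: isin slice + per-class ffill/bfill
    df = df.copy()
    sub = df[df.CLASS.isin(single_category_classes(df))]
    filled = (sub.CATEGORY.where(sub.CATEGORY != 'undefined')
              .groupby(sub.CLASS)
              .transform(lambda x: x.ffill().bfill()))
    df.loc[sub.index, 'CATEGORY'] = filled
    return df


def with_map(df):
    # Adapted "Original" approach: map 'undefined' rows through a lookup
    df = df.copy()
    m = single_category_classes(df)
    lookup = df.loc[m.index].set_index('CLASS').CATEGORY
    t = df.query('CATEGORY == "undefined"').CLASS.map(lookup)
    df['CATEGORY'] = t.combine_first(df['CATEGORY'])
    return df


assert with_groupby(big).equals(with_map(big))
for func in (with_groupby, with_map):
    seconds = timeit.timeit(lambda: func(big), number=3)
    print(f'{func.__name__}: {seconds:.3f} s for 3 runs')
```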


Original:
My original solution maps each 'undefined' row, through its CLASS, to that class's single non-'undefined' CATEGORY, and writes the matches back:

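# classes with exactly one distinct non-'undefined' CATEGORY
m = df.query('CATEGORY != "undefined"').drop_duplicates(['CLASS', 'CATEGORY']).CLASS.drop_duplicates(keep=False)
# for each 'undefined' row, look up its class's single category (NaN if the class doesn't qualify)
t = df.query('CATEGORY == "undefined"').CLASS.map(df.loc[m.index].set_index('CLASS').CATEGORY)
# keep the original value wherever t is NaN (avoids the chained df['CATEGORY'].update(t))
df['CATEGORY'] = t.combine_first(df['CATEGORY'])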

Out[553]:
   FLAG CLASS   CATEGORY
0   yes   Sci      Alpha
1   yes   Sci      Alpha
2   yes  math       Beta
3   yes  math       Beta
4   yes   eng      Gamma
5   yes  math       Beta
6   yes   eng      Gamma
7   yes   eng      Omega
8   yes   eng      Omega
9   yes   eng  undefined
10  yes  Geog     Lambda
11  yes   Art  undefined
12  yes   Art  undefined
13  yes   Art  undefined
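The lookup used by `map` in the original solution is easier to see on its own. Because m keeps the row labels of the first (CLASS, CATEGORY) pair of each qualifying class, `df.loc[m.index]` has exactly one row per class. A sketch on the sample data, assuming plain-string values (`lookup` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'FLAG': ['yes'] * 14,
    'CLASS': ['Sci', 'Sci', 'math', 'math', 'eng', 'math', 'eng',
              'eng', 'eng', 'eng', 'Geog', 'Art', 'Art', 'Art'],
    'CATEGORY': ['Alpha', 'undefined', 'Beta', 'undefined', 'Gamma', 'Beta',
                 'Gamma', 'Omega', 'Omega', 'undefined', 'Lambda',
                 'undefined', 'undefined', 'undefined'],
})

m = (df.query('CATEGORY != "undefined"')
       .drop_duplicates(['CLASS', 'CATEGORY'])
       .CLASS.drop_duplicates(keep=False))

# One row per qualifying class, turned into a CLASS -> CATEGORY Series
lookup = df.loc[m.index].set_index('CLASS')['CATEGORY']

# Map each 'undefined' row's CLASS; 'eng' and 'Art' rows come back as NaN
t = df.query('CATEGORY == "undefined"')['CLASS'].map(lookup)
print(lookup)
print(t)
```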

You can do this with boolean indexing:

df.loc[(df['CLASS'] == 'Sci') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Alpha'
df.loc[(df['CLASS'] == 'math') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Beta'
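The same boolean-indexing statement does not need the class/category pairs typed in by hand: they can be derived from the data first and then applied in a loop. A sketch under the same plain-string assumption (`fills` is a hypothetical name); it runs one statement per qualifying class, so with many classes it is likely slower than the vectorized answers:

```python
import pandas as pd

df = pd.DataFrame({
    'FLAG': ['yes'] * 14,
    'CLASS': ['Sci', 'Sci', 'math', 'math', 'eng', 'math', 'eng',
              'eng', 'eng', 'eng', 'Geog', 'Art', 'Art', 'Art'],
    'CATEGORY': ['Alpha', 'undefined', 'Beta', 'undefined', 'Gamma', 'Beta',
                 'Gamma', 'Omega', 'Omega', 'undefined', 'Lambda',
                 'undefined', 'undefined', 'undefined'],
})

# {class: category} for classes with exactly one distinct real category
real = df.loc[df['CATEGORY'] != 'undefined', ['CLASS', 'CATEGORY']].drop_duplicates()
fills = {cls: cats.iloc[0]
         for cls, cats in real.groupby('CLASS')['CATEGORY']
         if len(cats) == 1}

# The boolean-indexing statement, once per qualifying class
for cls, cat in fills.items():
    df.loc[(df['CLASS'] == cls) & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = cat
print(df)
```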
