Pandas get three most common values for every column in groupby
I have a table like this:
colour number letter
0 red one a
1 red two b
2 red two c
3 blue two a
4 blue two b
5 green one a
6 green two b
7 green three c
Which I made by doing:
import pandas as pd

df = pd.DataFrame([
    ('red', 'one', 'a'),
    ('red', 'two', 'b'),
    ('red', 'two', 'c'),
    ('blue', 'two', 'a'),
    ('blue', 'two', 'b'),
    ('green', 'one', 'a'),
    ('green', 'two', 'b'),
    ('green', 'three', 'c')
], columns=['colour', 'number', 'letter'])
I want to group the table by colour, and then for every remaining column get the three most common values. If there aren't three unique values for a column, then the last could be repeated or it could be NaN; either works.
The output would look like:
colour red blue green
number 1 two two one
2 one two two
3 one two three
letter 1 a a a
2 b b b
3 c b c
Or:
colour red blue green
number 1 two two one
2 one NaN two
3 NaN NaN three
letter 1 a a a
2 b b b
3 c NaN c
I have already done this for a single column:
(df.groupby('colour').number
   .value_counts()
   .groupby(level=0)
   .head(3))
Output:
colour number
blue two 2
green one 1
two 1
three 1
red two 2
one 1
However, I would like to do it for all columns in my dataframe and get an output like the example. I am completely stuck.
Try:
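import numpy as np
import pandas as pd

def fn(x):
    # Take up to three most common values, padding with NaN to length 3.
    return pd.Series(
        (x.value_counts().index[:3].tolist() + [np.nan, np.nan])[:3],
        index=range(1, 4),
    )

out = pd.concat(
    [
        # One block per column; ffill repeats the last value in place of NaN.
        df.groupby("colour")[col].apply(fn).unstack(level=0).ffill()
        for col in df.loc[:, "number":]
    ],
    keys=df.loc[:, "number":],
)
print(out)
Prints: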
colour blue green red
number 1 two three two
2 two two one
3 two one one
letter 1 b b b
2 a a a
3 a c c
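To make the padding step in fn easier to follow, here is a minimal sketch on a single group. The name blue_numbers is a hypothetical stand-in for the "number" values of the blue group; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "number" values of the blue group.
blue_numbers = pd.Series(["two", "two"])

# Most common values first, at most three of them.
top = blue_numbers.value_counts().index[:3].tolist()

# Pad with NaN so there are always exactly three entries, ranked 1 to 3.
padded = pd.Series((top + [np.nan, np.nan])[:3], index=range(1, 4))

# Forward-fill repeats the last real value into the padded slots.
repeated = padded.ffill()

print(padded)
print(repeated)
```

Leaving out the .ffill() call in the answer should therefore give the NaN variant of the requested output instead of the repeated one. Note that the order of equally common values may differ between pandas versions.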
Not pretty, but I got it done:
def analyze_col(df, col, grpby):
    top3: pd.Series = df.groupby(grpby)[col].value_counts().groupby(level=0).head(3)
    gg = pd.DataFrame({
        g[0]: g[1].index.get_level_values(1).to_series(index=range(1, len(g[1]) + 1)).reindex(range(1, 4))
        for g in top3.groupby(level=0)
    })
    return pd.concat({col: gg}, names=[grpby])

def analyze_df(df, grpby):
    return pd.concat([analyze_col(df, col, grpby) for col in df.columns if col != grpby])

print(analyze_df(df, 'colour'))
blue green red
colour
number 1 two one two
2 NaN three one
3 NaN two NaN
letter 1 a a a
2 b b b
3 NaN c c
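The same result can probably be reached in one pass by reshaping to long format first. The following is only a sketch of that idea, not the method used above; top_n_per_group and the helper column names "column", "value", "count" and "rank" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ("red", "one", "a"), ("red", "two", "b"), ("red", "two", "c"),
        ("blue", "two", "a"), ("blue", "two", "b"),
        ("green", "one", "a"), ("green", "two", "b"), ("green", "three", "c"),
    ],
    columns=["colour", "number", "letter"],
)


def top_n_per_group(frame, by, n=3):
    """Hypothetical helper: top-n values of every other column, per group."""
    value_cols = [c for c in frame.columns if c != by]
    # Long format: one row per (group, original column, value).
    long = frame.melt(id_vars=by, var_name="column", value_name="value")
    # Count how often each value occurs within its (group, column) pair.
    counts = (
        long.groupby([by, "column", "value"], sort=False)
        .size()
        .reset_index(name="count")
    )
    # Most frequent first, then number the values 1, 2, 3, ... per pair.
    counts = counts.sort_values("count", ascending=False, kind="stable")
    counts["rank"] = counts.groupby([by, "column"]).cumcount() + 1
    top = counts[counts["rank"] <= n]
    # Wide format: one column per group, rows indexed by (column, rank).
    wide = top.pivot(index=["column", "rank"], columns=by, values="value")
    # Make sure every rank 1..n is present, with NaN where a group runs out.
    full_index = pd.MultiIndex.from_product(
        [value_cols, range(1, n + 1)], names=["column", "rank"]
    )
    return wide.reindex(full_index)


with_nan = top_n_per_group(df, "colour")
# Optional: repeat the last value instead of NaN, separately per column block.
repeated = with_nan.groupby(level="column").ffill()
print(with_nan)
print(repeated)
```

How ties between equally common values are ordered is an assumption of this sketch (first appearance), so the rows for green may come out in a different order than in the outputs above.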
k = df.groupby(['colour', 'letter']).number.value_counts(normalize=True).groupby(level=0).head(3)
Output:
colour letter number
blue a two 1.0
b two 1.0
green a one 1.0
b two 1.0
c three 1.0
red a one 1.0
b two 1.0
c two 1.0
Name: number, dtype: float64
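The values above are floats because the first positional parameter of value_counts on a grouped Series is normalize, so any truthy argument there yields proportions rather than counts. A small sketch of the difference, grouping by colour only (the names counts and shares are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [
        ("red", "one", "a"), ("red", "two", "b"), ("red", "two", "c"),
        ("blue", "two", "a"), ("blue", "two", "b"),
        ("green", "one", "a"), ("green", "two", "b"), ("green", "three", "c"),
    ],
    columns=["colour", "number", "letter"],
)

# Raw occurrence counts per colour.
counts = df.groupby("colour")["number"].value_counts()

# Share of each value within its colour group (sums to 1 per group).
shares = df.groupby("colour")["number"].value_counts(normalize=True)

print(counts)
print(shares)
```

Grouping by both colour and letter, as in the line above, leaves exactly one row per group in this data, which is why every proportion there is 1.0.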