dataframe in pandas with certain conditions
I am trying to combine features in a dataframe to derive new columns in the dataframe.
I have this dataframe:
Id Author News_post Label
1 Jessica xxxxxxxxx 1
2 Adams xxxxxxxxx 1
3 Adams xxxxxxxxx 1
4 Mike xxxxxxxxx 0
5 James xxxxxxxxx 1
6 Mike xxxxxxxxx 1
7 Mike xxxxxxxxx 0
8 Paul xxxxxxxxx 0
9 Jessica xxxxxxxxx 0
10 Adams xxxxxxxxx 0
NB: in the Label column, 1 = TRUE and 0 = FALSE.
I want to derive this output:
Id Author Num_Post Num_True_Label Num_False_Label Mean
1 Adams 3 2 1 x
2 James 1 1 0 x
3 Jessica 2 1 1 x
4 Mike 3 1 2 x
5 Paul 1 0 1 x
This may give you several of the statistics you are looking for:
import pandas as pd
df = pd.read_clipboard() # just copied your dataframe
df = df.groupby('Author').describe()
Output:
Id Label
count mean std min 25% 50% 75% max count mean std min 25% 50% 75% max
Author
Adams 3.0 5.000000 4.358899 2.0 2.5 3.0 6.5 10.0 3.0 0.666667 0.577350 0.0 0.50 1.0 1.00 1.0
James 1.0 5.000000 NaN 5.0 5.0 5.0 5.0 5.0 1.0 1.000000 NaN 1.0 1.00 1.0 1.00 1.0
Jessica 2.0 5.000000 5.656854 1.0 3.0 5.0 7.0 9.0 2.0 0.500000 0.707107 0.0 0.25 0.5 0.75 1.0
Mike 3.0 5.666667 1.527525 4.0 5.0 6.0 6.5 7.0 3.0 0.333333 0.577350 0.0 0.00 0.0 0.50 1.0
Paul 1.0 8.000000 NaN 8.0 8.0 8.0 8.0 8.0 1.0 0.000000 NaN 0.0 0.00 0.0 0.00 0.0
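If only a couple of these statistics are needed, a minimal sketch along the following lines could pull them out of the describe() result. It assumes the dataframe from the question is rebuilt by hand rather than read from the clipboard, and the names stats and summary are only illustrative.

```python
import pandas as pd

# Rebuild the question's dataframe by hand (assumption: same values as posted).
df = pd.DataFrame({
    "Id": range(1, 11),
    "Author": ["Jessica", "Adams", "Adams", "Mike", "James",
               "Mike", "Mike", "Paul", "Jessica", "Adams"],
    "News_post": ["xxxxxxxxx"] * 10,
    "Label": [1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})

# describe() on a single column gives flat column names (count, mean, std, ...).
stats = df.groupby("Author")["Label"].describe()

# Keep only the post count and the mean label, renamed to match the question.
summary = stats[["count", "mean"]].rename(
    columns={"count": "Num_Post", "mean": "Mean"}
)
summary["Num_Post"] = summary["Num_Post"].astype(int)  # describe() returns floats
print(summary)
```

Describing only the Label column avoids the two-level column header that appears when the whole frame is described.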
The following will get you what you need:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'Author': ['Jessica', 'Adams', 'Adams', 'Mike', 'James', 'Mike', 'Mike', 'Paul', 'Jessica', 'Adams'],
   ...:                    'News_post': ['xxxxxxxxx'] * 10,
   ...:                    'Label': [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]})
In [3]: # select only Label so that the text column News_post is not summed
   ...: num_true_label_df = df.groupby(by=['Author'])[['Label']].sum().rename(columns={'Label': 'Num_True_Label'}).reset_index()
In [4]: num_post_df = df.groupby(by=['Author']).count().rename(columns={'News_post': 'Num_Post'})[['Num_Post']].reset_index()
In [5]: df = pd.merge(num_post_df, num_true_label_df, how='left', on='Author').reset_index().rename(columns={'index': 'Id'})
In [6]: df['Id'] = df['Id'] + 1
In [7]: df['Num_False_Label'] = df['Num_Post'] - df['Num_True_Label']
In [8]: df
Out[8]:
Id Author Num_Post Num_True_Label Num_False_Label
0 1 Adams 3 2 1
1 2 James 1 1 0
2 3 Jessica 2 1 1
3 4 Mike 3 1 2
4 5 Paul 1 0 1
Please specify what your Mean column should represent.
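One plausible reading, which is only an assumption since the question does not say, is that Mean is the average of Label per author. For a 0/1 column that average equals Num_True_Label divided by Num_Post, which the sketch below checks on the question's data (the variable names are illustrative).

```python
import pandas as pd

# Same data as in the question, rebuilt by hand.
df = pd.DataFrame({
    "Author": ["Jessica", "Adams", "Adams", "Mike", "James",
               "Mike", "Mike", "Paul", "Jessica", "Adams"],
    "Label": [1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})

grouped = df.groupby("Author")["Label"]

# Mean of a 0/1 column computed directly ...
mean_direct = grouped.mean()
# ... and as (number of true labels) / (number of posts).
mean_ratio = grouped.sum() / grouped.count()

# Largest absolute difference between the two definitions.
max_diff = (mean_direct - mean_ratio).abs().max()
print(mean_direct)
print(max_diff)
```

If Mean is meant to be something else, only the mean_direct line would need to change.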
Some resources which might be helpful:
https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Using pandas 0.25 or later with named aggregation (aggregation relabeling):
df.groupby('Author')['Label'].agg(Num_Post = 'size',
Num_True = 'sum',
Num_False = lambda x: x.eq(0).sum(),
Mean = 'mean')
Output:
Num_Post Num_True Num_False Mean
Author
Adams 3 2 1 0.666667
James 1 1 0 1.000000
Jessica 2 1 1 0.500000
Mike 3 1 2 0.333333
Paul 1 0 1 0.000000
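To get the exact layout asked for in the question, including the Id and Author columns, the same named aggregation can be written on the whole frame with (column, function) tuples. This is a sketch under the assumption that Mean is the mean of Label and that Id is simply a running number over the sorted authors.

```python
import pandas as pd

# Same data as in the question, rebuilt by hand.
df = pd.DataFrame({
    "Author": ["Jessica", "Adams", "Adams", "Mike", "James",
               "Mike", "Mike", "Paul", "Jessica", "Adams"],
    "News_post": ["xxxxxxxxx"] * 10,
    "Label": [1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})

summary = (
    df.groupby("Author")
      .agg(
          Num_Post=("Label", "count"),        # posts per author
          Num_True_Label=("Label", "sum"),    # labels equal to 1
          Num_False_Label=("Label", lambda s: s.eq(0).sum()),  # labels equal to 0
          Mean=("Label", "mean"),             # assumed meaning of Mean
      )
      .reset_index()                          # turn Author back into a column
)

# Assumption: Id is a 1-based running number over the sorted authors.
summary.insert(0, "Id", range(1, len(summary) + 1))
print(summary)
```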
Use transform to compute the per-author statistics on every row, then remove the duplicate authors:
df['Num_Post'] = df.groupby(['Author'])['Label'].transform('count')
df['Num_True_Label'] = df.groupby(['Author'])['Label'].transform('sum')
df['Num_False_Label'] = df['Num_Post'] - df['Num_True_Label']
df['Mean'] = df['Num_True_Label'] / df['Num_Post']  # share of true labels per author
Finally, drop the News_post column and the duplicate authors:
df.drop(columns=['News_post'], inplace=True)
df.drop_duplicates(subset='Author', keep='first').sort_values(by=['Author'])
Result:
Id Author Label Num_Post Num_True_Label Num_False_Label Mean
1 2 Adams 1 3 2 1 0.666667
4 5 James 1 1 1 0 1.000000
0 1 Jessica 1 2 1 1 0.500000
3 4 Mike 0 3 1 2 0.333333
7 8 Paul 0 1 0 1 0.000000
Note: Mean here is the share of true labels per author; change it to match your definition.
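The reason duplicates have to be dropped afterwards is that transform returns one value per original row, whereas agg returns one value per group. A small sketch of the difference, using the question's data rebuilt by hand and illustrative variable names:

```python
import pandas as pd

# Same data as in the question, rebuilt by hand.
df = pd.DataFrame({
    "Author": ["Jessica", "Adams", "Adams", "Mike", "James",
               "Mike", "Mike", "Paul", "Jessica", "Adams"],
    "Label": [1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})

# agg collapses each group to a single value: one entry per author.
per_author = df.groupby("Author")["Label"].agg("sum")

# transform broadcasts the group result back to every row of that group.
per_row = df.groupby("Author")["Label"].transform("sum")

print(len(per_author), len(per_row))
print(per_row.tolist())
```

Because every row of an author carries the same broadcast value, keeping the first row per author loses no information.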
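You could try:
# passing a renaming dict to a Series groupby .agg() was removed in pandas 1.0,
# so the output names are given as keyword arguments (named aggregation)
agg_df = df.groupby('Author')['Label'].agg(
    Num_Post='count',
    Num_True_Label=lambda x: x.eq(1).sum(),
    Num_False_Label=lambda x: x.eq(0).sum(),
    Mean='mean').reset_index()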