[英]sorting DataFrame based on multiple conditions in pandas
I have a DataFrame similar to the following我有一个类似于以下的 DataFrame
date mood score count avg abs
23/12/18 negative -50.893 137 -0.371 50.893
23/12/18 neutral 0.2193 10 0.0219 0.2193
23/12/18 positive 336.5098 673 0.5000 336.5098
24/12/18 positive 91.2414 232 -0.393 91.2414
24/12/18 neutral 0.063 14 0.0045 0.063
24/12/18 negative -649.697 1184 0.5487 649.697
25/12/18 negative -72.4142 8 -9.0517 72.4142
25/12/18 positive 0 0 0 0
25/12/18 neutral 323.0056 173 1.86708 323.0056
26/12/18 negative -12.0467 15 -.8031 12.0467
I want to apply the following conditions to this dataset.我想对这个数据集应用以下条件。
Con: if the absolute value(abs) score on a date is the greatest (of 3 moods), keep that date only together with its other attributes.
Con: No duplicate date is to be kept. So the dataset will be reduced quite a lot compared to its original size.
Expected Output预期产出
date mood_corrected score count avg abs
23/12/18 positive 336.5098 673 0.50001456 336.5098
24/12/18 negative 649.697 1184 0.54873057 649.697
25/12/18 neutral 323.0056 173 1.86708439 323.0056
26/12/18 negative -12.0467 15 -0.8031 12.0467
My code我的代码
import pandas as pd
df =pd.read_csv('file.csv')
new_df= df.sort_values('abs', ascending=False).drop_duplicates(['date','mood'])
While I get result that sort the dataset based on absolute value**(abs)**, I still have the full dataset.虽然我得到的结果是根据绝对值 **(abs)** 对数据集进行排序,但我仍然拥有完整的数据集。 It is not reduced.
它没有减少。 Any help is appreciated.
任何帮助表示赞赏。 Many thanks.
非常感谢。
Note : I looked at stackoverflow but have not found a quite similar problem.注意:我查看了 stackoverflow,但没有发现非常相似的问题。
以下将完成这项工作!
new_df = df.sort_values('abs', ascending=False).drop_duplicates(['date']).sort_values('date')
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.