How to calculate the difference between grouped rows in pandas
I have a dataset with the number of views per article.
I'm trying to calculate the additional number of views per day for each story, so I can graph it.
I managed to do it for one story only:
# stats is the DataFrame shown under DATASET below
# Keep only one story and sort its rows by cumulative views
storyviews = stats[stats["title"] == "Getting Started with TDD"]
storyviews = storyviews[["title", "views"]].sort_values(by=["title", "views"])
# Day-over-day increase; the first row has no predecessor, so drop its NaN
difference = storyviews.set_index("title").diff()
difference = difference.dropna(subset=["views"])
difference
and I got the correct result.
Is there a way to do it in one pass for all the stories?
DATASET
y,m,d,mediumID,title,link,publication,mins,views,reads,readRatio,fans,pubDate,liveDate
2020,06,30,a1777d8bf7e,Swift — Filtering: A Real Example,https://levelup.gitconnected.com/swift-filtering-a-real-example-a1777d8bf7e,Level Up Coding,4 min read,35,13,37.142857142857146,1,2020-06-17,2020-06-26
2020,06,30,6f5fc68b0b43,SwiftUI 2: an overview,https://levelup.gitconnected.com/swiftui-2-an-overview-6f5fc68b0b43,Level Up Coding,3 min read,43,22,51.16279069767442,2,2020-06-24,2020-06-24
2020,07,01,a1777d8bf7e,Swift — Filtering: A Real Example,https://levelup.gitconnected.com/swift-filtering-a-real-example-a1777d8bf7e,Level Up Coding,4 min read,37,13,35.13513513513514,1,2020-06-17,2020-06-26
2020,07,01,6f5fc68b0b43,SwiftUI 2: an overview,https://levelup.gitconnected.com/swiftui-2-an-overview-6f5fc68b0b43,Level Up Coding,3 min read,57,29,50.87719298245614,10,2020-06-24,2020-06-24
2020,07,02,a1777d8bf7e,Swift — Filtering: A Real Example,https://levelup.gitconnected.com/swift-filtering-a-real-example-a1777d8bf7e,Level Up Coding,4 min read,37,13,35.13513513513514,1,2020-06-17,2020-06-26
2020,07,02,6f5fc68b0b43,SwiftUI 2: an overview,https://levelup.gitconnected.com/swiftui-2-an-overview-6f5fc68b0b43,Level Up Coding,3 min read,76,43,56.578947368421055,15,2020-06-24,2020-06-24
2020,07,03,a1777d8bf7e,Swift — Filtering: A Real Example,https://levelup.gitconnected.com/swift-filtering-a-real-example-a1777d8bf7e,Level Up Coding,4 min read,40,13,34.21052631578947,1,2020-06-17,2020-06-26
2020,07,03,6f5fc68b0b43,SwiftUI 2: an overview,https://levelup.gitconnected.com/swiftui-2-an-overview-6f5fc68b0b43,Level Up Coding,3 min read,152,70,46.05263157894737,20,2020-06-24,2020-06-24
Thanks, Nicolas
Could you give this a shot?
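import pandas as pd
cols = ['title', 'views']
storyviews = stats[cols].sort_values(by=cols)
# diff() runs inside each title group, so one story's rows are never subtracted from another's
res = storyviews.set_index('title').groupby('title', sort=False).diff().dropna()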
Output:
views
title
SwiftUI 2: an overview 14.0
SwiftUI 2: an overview 19.0
SwiftUI 2: an overview 76.0
Swift — Filtering: A Real Example 2.0
Swift — Filtering: A Real Example 0.0
Swift — Filtering: A Real Example 3.0
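A self-contained sketch of the same grouped diff, built from the y, m, d, title and views values in the DATASET above. It assumes the y/m/d columns mark the day each snapshot was taken and sorts by that date rather than by views, which stays correct even if rows arrive out of order. The `date` and `new_views` column names are illustrative only.

```python
import pandas as pd

# A subset of the DATASET columns from the question (y, m, d, title, views)
stats = pd.DataFrame({
    "y": [2020] * 8,
    "m": [6, 6, 7, 7, 7, 7, 7, 7],
    "d": [30, 30, 1, 1, 2, 2, 3, 3],
    "title": ["Swift — Filtering: A Real Example", "SwiftUI 2: an overview"] * 4,
    "views": [35, 43, 37, 57, 37, 76, 40, 152],
})

# Build a real date from y/m/d (to_datetime expects year/month/day column names)
stats["date"] = pd.to_datetime(
    stats[["y", "m", "d"]].rename(columns={"y": "year", "m": "month", "d": "day"})
)

# Order each story's snapshots chronologically, then difference within each title
stats = stats.sort_values(["title", "date"])
stats["new_views"] = stats.groupby("title")["views"].diff()

# Drop each story's first day, which has no previous day to compare against
daily = stats.dropna(subset=["new_views"])[["date", "title", "new_views"]]
print(daily)
```

Because `groupby(...)["views"].diff()` returns a Series aligned to the original index, it can be assigned straight back as a new column instead of setting `title` as the index first.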
For plotting with a legend, title, etc., you might want to ask another question; I don't have an answer for that. To get you started on the plot, try this:
res.reset_index().groupby('title', sort=False).plot()
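If you want every story on a single chart, one option (a sketch, not part of the answer above) is to pivot to one column per title before differencing. The values below are the title/views pairs from the DATASET, and the final `.plot()` call assumes matplotlib is installed, so it is left commented out.

```python
import pandas as pd

# Cumulative views per story and day, taken from the DATASET in the question
stats = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-30", "2020-07-01", "2020-07-02", "2020-07-03"] * 2),
    "title": ["Swift — Filtering: A Real Example"] * 4 + ["SwiftUI 2: an overview"] * 4,
    "views": [35, 37, 37, 40, 43, 57, 76, 152],
})

# One row per date, one column per story; cells hold cumulative views
wide = stats.pivot(index="date", columns="title", values="views")

# Day-over-day increase for every story at once; how="all" drops only the
# first date, where every column is NaN, and keeps rows where just one story is missing
daily = wide.diff().dropna(how="all")
print(daily)

# With matplotlib installed, this draws one line per story, labelled in the legend:
# daily.plot(title="New views per day")
```

Since each story is its own column, `DataFrame.plot()` uses the titles as legend labels automatically, which avoids one figure per group.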
So you're just trying to get a sum of all views per title? Here's one way, using dictionary conversion from a pandas DataFrame:
...
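import numpy as np
# .groups maps each title to the index labels of its rows, not to its views,
# so look the views up through those labels before taking the differences
groups = df.groupby('title').groups  # avoid shadowing the built-in dict
for key in groups:
    numpyarray_diff_between = np.diff(df.loc[groups[key], 'views'].to_numpy())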
...