I want to count the duplicate rows per hour.
My data frame:
hour      index  name
08:00:00  1442   x
08:45:00  3434   y
08:30:00  1442   x
08:00:00  1442   x
08:45:00  3434   y
08:00:00  1442   x
My code: I tried grouping the data by hour and counting. Using transform didn't help either.
df_count = df.groupby('hour')[['index', 'name']].count()
This is the error:
TypeError: only integer scalar arrays can be converted to a scalar index
This is the output I want:
hour      index  name  count
08:00:00  1442   x     3
08:30:00  1442   x     1
08:45:00  3434   y     2
I'm not sure what's going on with your data. When I set up a DataFrame like this:
import pandas as pd

df = pd.DataFrame({
    'hour': ['08:00:00', '08:45:00', '08:30:00', '08:00:00', '08:45:00', '08:00:00'],
    'index': [1442, 3434, 1442, 1442, 3434, 1442],
    'name': ['x', 'y', 'x', 'x', 'y', 'x'],
})
Then your code works fine (it doesn't do what you want, but it runs without issues):
>>> df.groupby('hour')[['index', 'name']].count()
          index  name
hour
08:00:00      3     3
08:30:00      1     1
08:45:00      2     2
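To illustrate why grouping by hour alone is not what you want: count() reports the number of non-null values in each column per hour, so different (index, name) pairs within the same hour get lumped together. Here is a minimal sketch using made-up data (df_demo is hypothetical and not from your question) where one hour contains two different pairs:

```python
import pandas as pd

# Hypothetical data for illustration only: 08:00:00 holds two distinct (index, name) pairs.
df_demo = pd.DataFrame({
    'hour': ['08:00:00', '08:00:00', '08:00:00'],
    'index': [1442, 1442, 3434],
    'name': ['x', 'x', 'y'],
})

# Grouping by hour only: one row per hour, counting non-null values in each column.
per_hour = df_demo.groupby('hour')[['index', 'name']].count()

# Grouping by all three columns: one entry per distinct row, with its number of occurrences.
per_row = df_demo.groupby(['hour', 'index', 'name']).size()

print(per_hour)
print(per_row)
```

With your sample data the two approaches happen to give the same numbers, because each hour contains only one distinct (index, name) pair.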
In any case, once you fix your DataFrame content, the following should get the expected result:
>>> df.groupby(['hour', 'index', 'name']).size()
hour      index  name
08:00:00  1442   x       3
08:30:00  1442   x       1
08:45:00  3434   y       2
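The result is a Series with a three-level index. If you would rather have a regular DataFrame with a count column, you can also append .to_frame('count').reset_index() to that expression.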
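Putting the pieces above together, the whole thing might look like the sketch below. It assumes the DataFrame is built as in this answer; the names result and alt are just illustrative. The second form, reset_index(name='count'), is an equivalent shortcut I would expect to give the same frame.

```python
import pandas as pd

df = pd.DataFrame({
    'hour': ['08:00:00', '08:45:00', '08:30:00', '08:00:00', '08:45:00', '08:00:00'],
    'index': [1442, 3434, 1442, 1442, 3434, 1442],
    'name': ['x', 'y', 'x', 'x', 'y', 'x'],
})

# Count occurrences of each distinct (hour, index, name) row,
# then turn the resulting Series into a flat DataFrame with a 'count' column.
result = (
    df.groupby(['hour', 'index', 'name'])
      .size()
      .to_frame('count')
      .reset_index()
)

# Equivalent shortcut: name the counts column while resetting the index.
alt = df.groupby(['hour', 'index', 'name']).size().reset_index(name='count')

print(result)
```

groupby sorts the group keys by default, so the rows come out ordered by hour rather than in their original order.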