
Error when counting group-by columns in Python: TypeError: only integer scalar arrays can be converted to a scalar index

I want to count the duplicate rows per hour.

My data frame:

 hour         index    name    
08:00:00      1442       x
08:45:00      3434       y
08:30:00      1442       x
08:00:00      1442       x
08:45:00      3434       y
08:00:00      1442       x

My code: I tried to group the data by hour and count the rows. Using transform didn't help either.

df_count = df.groupby('hour')[['index', 'name']].count()

This is the error:

TypeError: only integer scalar arrays can be converted to a scalar index

This is the output I want:

 hour         index    name   count  
08:00:00      1442       x       3
08:30:00      1442       x       1
08:45:00      3434       y       2

I'm not sure what's going on with your data. When I set one up like this:

import pandas as pd

df = pd.DataFrame({
    'hour': ['08:00:00', '08:45:00', '08:30:00', '08:00:00', '08:45:00', '08:00:00'],
    'index': [1442, 3434, 1442, 1442, 3434, 1442],
    'name': ['x', 'y', 'x', 'x', 'y', 'x'],
})

Then your code works fine (it doesn't do what you want, but it runs without issues):

>>> df.groupby('hour')[['index','name']].count()
          index  name
hour                 
08:00:00      3     3
08:30:00      1     1
08:45:00      2     2
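
To illustrate why count() is not quite the right tool here, a small sketch follows. The frame `demo` and its missing value are made up for illustration and are not part of the question's data. As far as I know, count() reports the number of non-missing values in each selected column, while size() reports the number of rows in each group:

```python
import pandas as pd

# Hypothetical data (not from the question): one 'name' value is missing.
demo = pd.DataFrame({
    'hour': ['08:00:00', '08:00:00', '08:30:00'],
    'name': ['x', None, 'x'],
})

# count() works column by column and skips missing values.
per_column = demo.groupby('hour')[['name']].count()

# size() counts rows per group, whether or not values are missing.
per_group = demo.groupby('hour').size()

print(per_column)
print(per_group)
```

With complete data, as in the question, the two agree, but count() still repeats the same number under every selected column instead of giving a single count per group.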

In any case, once you fix your DataFrame content, the following should get the expected result:

>>> df.groupby(['hour', 'index', 'name']).size()
hour      index  name
08:00:00  1442   x       3
08:30:00  1442   x       1
08:45:00  3434   y       2

You can also append .to_frame('count').reset_index() to turn the result into a regular DataFrame with a count column, if you like.
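
Putting the pieces together, a minimal end-to-end sketch might look like this. It assumes the data matches the frame built above; the variable name `counts` is only illustrative:

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'hour': ['08:00:00', '08:45:00', '08:30:00', '08:00:00', '08:45:00', '08:00:00'],
    'index': [1442, 3434, 1442, 1442, 3434, 1442],
    'name': ['x', 'y', 'x', 'x', 'y', 'x'],
})

# Group on all three columns so that identical rows fall into one group,
# count the rows per group, then flatten back to ordinary columns.
counts = (
    df.groupby(['hour', 'index', 'name'])
      .size()
      .to_frame('count')
      .reset_index()
)

print(counts)
```

This should give one row per distinct (hour, index, name) combination, with the number of duplicates in the count column.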

