
Group a pandas DataFrame by a column and cut another column into ranges (Python)

I have a dataframe and I'm trying to answer this query:

"In which regions is the percentage of infected old people (55 and over) the highest, in which that of young people (up to 30), and in which that of adults (31 to 54)?"

To clarify: the first question means finding the regions where people aged 55 and over make up a larger percentage than the other age groups, so the result should be a list of regions. The other two questions are analogous, just for different age ranges.

My dataframe looks like this:

       Unnamed: 0  state sex   diag  death status T.categ  age
0              1    NSW   M  10905  11081      D      hs   35
1              2    NSW   M  11029  11096      D      hs   53
2              3    NSW   M   9551   9983      D      hs   42
3              4    NSW   M   9577   9654      D    haem   44
4              5    NSW   M  10015  10290      D      hs   39
          ...    ...  ..    ...    ...    ...     ...  ...
2838        2839  Other   M  11475  11504      A     het   46
2839        2840  Other   F  11420  11504      A     het   34
2840        2841  Other   M  11496  11504      A    haem   49
2841        2842  Other   M  11460  11504      A      hs   55
2842        2843  Other   M  11448  11504      A      hs   37
[2843 rows x 8 columns]

My approach is to generate a dataframe that looks like this:

      (0, 30]       (30, 54]     (54, 200]
NSW     45                          ...
VCI     234            ... 
...                    535
Other                               56

With this, it would be easier to compare which state has the largest count in each range.

So far my code can calculate the counts per age range, but I don't know how to also group by region. Here is my code and its result:

data.groupby(pd.cut(data['age'], bins=[0, 30, 54, 200])).size()

(0, 30]       736
(30, 54]     1937
(54, 200]     166

Please feel free to recommend any other approach, or help me out with this query.

You can group by state as well, by passing both keys to groupby:

data.groupby(['state', pd.cut(data['age'], bins=[0, 30, 54, 200])]).size()

For the sample of your dataframe that you included, this returned:

state  age
NSW    (0, 30]      0
       (30, 54]     5
       (54, 200]    0
Other  (0, 30]      0
       (30, 54]     4
       (54, 200]    1
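
To get from that stacked result to the wide table sketched in the question, and from there to percentages, something along these lines might work. This is only a sketch under assumptions: the small `data` frame below is rebuilt by hand from the ten rows visible in the question, the names `labels`, `age_group`, `counts`, `percent` and `dominant` are made up for illustration, and "dominant" is just one reading of the query (the age group with the largest share inside each state).

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe: only the ten rows
# printed in the question, and only the two columns that matter here.
data = pd.DataFrame({
    "state": ["NSW"] * 5 + ["Other"] * 5,
    "age": [35, 53, 42, 44, 39, 46, 34, 49, 55, 37],
})

# Same bins as in the question; the labels are an assumption added
# for readability: (0, 30], (30, 54], (54, 200].
labels = ["young (<=30)", "adult (31-54)", "old (55+)"]
age_group = pd.cut(data["age"], bins=[0, 30, 54, 200], labels=labels)

# One row per state, one column per age range.
# observed=False keeps age ranges that have no rows for a state.
counts = (
    data.groupby(["state", age_group], observed=False)
        .size()
        .unstack(fill_value=0)
)
counts.columns = counts.columns.astype(str)  # plain string column names

# Share of each age range within its state, in percent.
percent = counts.mul(100).div(counts.sum(axis=1), axis=0)

# For each state, the age range with the largest share.
dominant = percent.idxmax(axis=1)

print(counts)
print(percent.round(1))
print(dominant)
```

If the query is instead read as "which state has the highest share of old people", `percent["old (55+)"].idxmax()` would answer that for one column, and `percent.idxmax()` for all three columns at once. Note that ties (for example several states with 0%) resolve to the first state in index order.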

