I have a dataframe and I'm trying to answer this query:
"In which regions is the percentage of infected old people (55 and over) highest, in which that of young people (up to 30), and in which that of adults (31 to 54)?"
To clarify: the first part means finding the regions where people aged 55 and over make up a larger percentage than the other age groups, so the result should be a list of regions. The other two parts are analogous, just for different age ranges.
My dataframe looks like this:
Unnamed: 0 state sex diag death status T.categ age
0 1 NSW M 10905 11081 D hs 35
1 2 NSW M 11029 11096 D hs 53
2 3 NSW M 9551 9983 D hs 42
3 4 NSW M 9577 9654 D haem 44
4 5 NSW M 10015 10290 D hs 39
... ... .. ... ... ... ... ...
2838 2839 Other M 11475 11504 A het 46
2839 2840 Other F 11420 11504 A het 34
2840 2841 Other M 11496 11504 A haem 49
2841 2842 Other M 11460 11504 A hs 55
2842 2843 Other M 11448 11504 A hs 37
[2843 rows x 8 columns]
and my approach to the solution is to generate a dataframe that looks like this:
(0, 30] (30, 54] (54, 200]
NSW 45 ...
VCI 234 ...
... 535
Other 56
With this, it would be easier to compare which state has the largest count in each range.
So far my code can calculate the counts per age range, but I don't know how to also group by region. Here is my code and its result:
data.groupby(pd.cut(data['age'], bins=[0, 30, 54, 200])).size()
(0, 30] 736
(30, 54] 1937
(54, 200] 166
Please feel free to recommend any other approach or help me out with this query!
You can group by state as well:
data.groupby(['state', pd.cut(data['age'], bins=[0, 30, 54, 200])]).size()
Run on the sample of your dataframe that you included, this returns:
state 0 age
NSW (0, 30] 0
(30, 54] 5
(54, 200] 0
Other (0, 30] 0
(30, 54] 4
(54, 200] 1
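To get from these counts to the wide table described in the question, and then to percentages, one possible sketch is below. The sample data is made up for illustration (only the column names state and age come from the question), and taking "percent" to mean the share of each age range within its own state is my assumption about what is wanted.

```python
import pandas as pd

# Made-up sample for illustration; only the column names come from the question.
data = pd.DataFrame({
    "state": ["NSW", "NSW", "NSW", "VIC", "VIC", "Other", "Other", "Other"],
    "age":   [40,    45,    60,    28,    29,    35,      58,      70],
})

# Same bins as in the question: (0, 30], (30, 54], (54, 200]
age_group = pd.cut(data["age"], bins=[0, 30, 54, 200])

# Count rows per (state, age range), then move the age ranges into columns.
# observed=False keeps empty age ranges as zero counts.
counts = (
    data.groupby(["state", age_group], observed=False)
        .size()
        .unstack(fill_value=0)
)

# Assumption: "percent" means the share of each age range within its state,
# so every row of this table sums to 100.
percent = counts.div(counts.sum(axis=1), axis=0) * 100

# For each age range, the state where that range has the largest share.
top_state = percent.idxmax()

print(counts)
print(percent.round(1))
print(top_state)
```

If you want every state above some threshold rather than only the single top one, you could filter percent directly, for example percent[percent.iloc[:, 2] > 10].index for the oldest range (the 10 here is just a placeholder value).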