I think this is a pretty simple question but I can't find another entry in which a similar case is solved.
I have a Pandas dataframe that looks like this:
         group1       group2    meandiff       lower       upper  reject
0          bacc      dry_sed   2575.1697   2033.6713   3116.6681    True
1          bacc     junc_hal    -81.8513   -555.8132    392.1106   False
2          bacc  other_trees     -1.2333   -512.6246    510.1579   False
3          bacc        phrag    613.2256      0.4309   1226.0204    True
4          bacc        water  -1074.4667  -1687.2614   -461.6719    True
5          bacc      wet_sed   -437.1854   -943.2217     68.8508   False
6       dry_sed     junc_hal  -2657.0210  -3068.3186  -2245.7234    True
7       dry_sed  other_trees  -2576.4030  -3030.3269  -2122.4792    True
8       dry_sed        phrag  -1961.9441  -2527.6677  -1396.2204    True
9       dry_sed        water  -3649.6364  -4215.3600  -3083.9127    True
10      dry_sed      wet_sed  -3012.3551  -3460.2374  -2564.4728    True
11     junc_hal  other_trees     80.6179   -290.1464    451.3823   False
12     junc_hal        phrag    695.0769    193.6165   1196.5373    True
13     junc_hal        water   -992.6154  -1494.0758   -491.1550    True
14     junc_hal      wet_sed   -355.3341   -718.6767      8.0084   False
15  other_trees        phrag    614.4590     77.4825   1151.4354    True
16  other_trees        water  -1073.2333  -1610.2098   -536.2569    True
17  other_trees      wet_sed   -435.9521   -846.9253    -24.9788    True
18        phrag        water  -1687.6923  -2321.9951  -1053.3895    True
19        phrag      wet_sed  -1050.4111  -1582.2901   -518.5320    True
20        water      wet_sed    637.2812    105.4022   1169.1603    True
I want to create a sort of contingency table between group1 and group2, but putting in each cell the value from the reject column.
It should look something like this:
             bacc  dry_sed  junc_hal  other_trees  phrag  water  wet_sed
bacc           NA        1         0            0      1      1        0
dry_sed         1       NA         1            1      1      1        1
junc_hal        0        1        NA            0      1      1        0
other_trees     0        1         0           NA      1      1        1
phrag           1        1         1            1     NA      1        1
water           1        1         1            1      1     NA        1
wet_sed         0        1         0            1      1      1       NA
The NAs are just placeholders; any value could go on the diagonal.
Is there a direct way to summarize the data in this fashion? Before jumping to analyzing the tables using loops I would like to be sure there's no easy direct way of achieving this.
Thanks in advance.
You can pivot the dataframe.
df.pivot(index='group1', columns='group2', values='reject')
group2       dry_sed  junc_hal  other_trees  phrag  water  wet_sed
group1
bacc            True     False        False   True   True    False
dry_sed         None      True         True   True   True     True
junc_hal        None      None        False   True   True    False
other_trees     None      None         None   True   True     True
phrag           None      None         None   None   True     True
water           None      None         None   None   None     True
Assuming your dataframe is called df, you can do:
df['reject_flag'] = df['reject'].astype(int)
output = df.pivot_table(index='group1', columns='group2', values='reject_flag')
which gives you the following:
group2       dry_sed  junc_hal  other_trees  phrag  water  wet_sed
group1
bacc             1.0       0.0          0.0    1.0    1.0      0.0
dry_sed          NaN       1.0          1.0    1.0    1.0      1.0
junc_hal         NaN       NaN          0.0    1.0    1.0      0.0
other_trees      NaN       NaN          NaN    1.0    1.0      1.0
phrag            NaN       NaN          NaN    NaN    1.0      1.0
water            NaN       NaN          NaN    NaN    NaN      1.0
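Both pivots above fill only the upper triangle, because each pair appears once in the data, whereas the table in the question is symmetric. One possible way to get the symmetric version is to append a copy of the rows with group1 and group2 swapped before pivoting. The snippet below is only a sketch on a small made-up subset of the data (three groups); the names mirrored, both and symmetric are illustrative and not from the original answers.

```python
import pandas as pd

# Hypothetical three-group subset of the question's dataframe.
df = pd.DataFrame({
    "group1": ["bacc", "bacc", "dry_sed"],
    "group2": ["dry_sed", "junc_hal", "junc_hal"],
    "reject": [True, False, True],
})

# Swap the two group columns so every pair (a, b) also appears as (b, a).
mirrored = df.rename(columns={"group1": "group2", "group2": "group1"})
both = pd.concat([df, mirrored], ignore_index=True)

# Store the flag as 0/1, as in the second answer.
both["reject_flag"] = both["reject"].astype(int)

# Each (group1, group2) pair is now unique, so a plain pivot works.
# The diagonal has no data and is left as NaN.
symmetric = both.pivot(index="group1", columns="group2", values="reject_flag")
print(symmetric)
```

If a specific value is wanted on the diagonal, symmetric.fillna(value) should fill it in, since the diagonal cells are the only missing ones once every pair is present in both directions.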