In my data, I need to group by columns X, Y, and Z and fill in the result_code column. The value comes from the code column of the row with the maximum area in each group; if several rows tie on area, the row with the maximum new_area among them is used.
For the first group, code C has the maximum area, so every row in that group should get C. In the second group, the maximum area is shared by two rows, so new_area is checked as the tiebreaker, and the result should be code B.
I need these results in a separate column, alongside the other columns.
The table in the pic will help clarify.
This is a simple case of sorting and then taking the first row of each group:
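import io
import pandas as pd

# Sample data; the result_code column here already holds the expected output.
df = pd.read_csv(io.StringIO("""X,Y,Z,code,area,new_area,result_code
222 North St,Seattle,WA,A,200,600,C
222 North St,Seattle,WA,B,300,700,C
222 North St,Seattle,WA,C,400,750,C
222 North St,Seattle,WA,D,300,600,C
115 John St,Chicago,IL,A,200,250,B
115 John St,Chicago,IL,B,200,300,B
115 John St,Chicago,IL,C,50,100,B"""))
# Sort so that, within each group, the row with the largest area (ties broken
# by the largest new_area) comes first; broadcast that row's code to the whole
# group; then restore the original row order.
df = (df.sort_values(["X","Y","Z","area","new_area"], ascending=[True,True,True,False,False])
      .assign(result_code=lambda dfa: dfa.groupby(["X","Y","Z"])["code"].transform("first"))
      .sort_index()
)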
| | X | Y | Z | code | area | new_area | result_code |
|---|---|---|---|---|---|---|---|
| 0 | 222 North St | Seattle | WA | A | 200 | 600 | C |
| 1 | 222 North St | Seattle | WA | B | 300 | 700 | C |
| 2 | 222 North St | Seattle | WA | C | 400 | 750 | C |
| 3 | 222 North St | Seattle | WA | D | 300 | 600 | C |
| 4 | 115 John St | Chicago | IL | A | 200 | 250 | B |
| 5 | 115 John St | Chicago | IL | B | 200 | 300 | B |
| 6 | 115 John St | Chicago | IL | C | 50 | 100 | B |
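The same rule can also be written as "pick one winning row per group, then merge it back". The sketch below is only one possible variant, not the method used in the answer: the function name `add_result_code` and its `keys` parameter are hypothetical, and the sample frame is rebuilt without the expected result_code column so that the output is clearly computed.

```python
import io
import pandas as pd


def add_result_code(df, keys=("X", "Y", "Z")):
    """Hypothetical helper: add result_code from the row with the largest
    area in each group, using the largest new_area as the tiebreaker."""
    keys = list(keys)
    # Keep one winning row per group: after sorting both columns in
    # descending order, the first row seen for each key combination wins.
    winners = (
        df.sort_values(["area", "new_area"], ascending=False)
          .drop_duplicates(subset=keys, keep="first")
          .loc[:, keys + ["code"]]
          .rename(columns={"code": "result_code"})
    )
    # A left merge copies the winning code onto every row of its group and
    # keeps the rows in their original order.
    base = df.drop(columns="result_code", errors="ignore")
    return base.merge(winners, on=keys, how="left")


sample = pd.read_csv(io.StringIO("""X,Y,Z,code,area,new_area
222 North St,Seattle,WA,A,200,600
222 North St,Seattle,WA,B,300,700
222 North St,Seattle,WA,C,400,750
222 North St,Seattle,WA,D,300,600
115 John St,Chicago,IL,A,200,250
115 John St,Chicago,IL,B,200,300
115 John St,Chicago,IL,C,50,100"""))

result = add_result_code(sample)
print(result)
```

If two rows in a group tie on both area and new_area, this sketch keeps whichever of them the sort places first, so an extra tiebreaker column would be needed if that case matters.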