
Divide a column depending on a row value in pandas

I am trying to do a calculation in Pandas that seems straightforward, but after several attempts I have not found how to do it correctly.

I have a dataframe that looks like this:

df = pd.DataFrame([["A", "a", 10.0],
                   ["A", "b", 12.0],
                   ["A", "c", 13.0],
                   ["B", "a", 5.0 ],
                   ["B", "b", 6.0 ],
                   ["B", "c", 7.0 ]])

The first column is a test name, the second column is a class, and the third column gives a time. Each test normally appears in the table once for each of the 3 classes.

This is the correct format to plot it like this:

sns.factorplot(x="2", y="0", hue="1", data=df,
               kind="bar")

So that for each test, I get a group of 3 bars, one for each class.

However, I would like to change the dataframe so that each value in column 2 is not an absolute time, but a ratio relative to the class "a" value of the same test.

So I would like to transform it to this:

df = pd.DataFrame([["A", "a", 1.0],
                   ["A", "b", 1.2],
                   ["A", "c", 1.3],
                   ["B", "a", 1.0],
                   ["B", "b", 1.2],
                   ["B", "c", 1.4]])

I am able to extract the series, change their indexes so that they match, and do the computation, for example:

df_a = df[df[1] == "a"].set_index(0)
df_b = df[df[1] == "b"].set_index(0)
df_b["ratio_a"] = df_b[2] / df_a[2]

But this is certainly very inefficient, and I would still need to combine the results back into the original format.

What is the correct way to do it?

You could use groupby/transform('first') to find the first value in each group. Once the rows are sorted by test and class, the first row of each group is the class "a" row:

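import pandas as pd
df = pd.DataFrame([["A", "a", 10.0],
                   ["A", "b", 12.0],
                   ["A", "c", 13.0],
                   ["B", "b", 6.0 ],
                   ["B", "a", 5.0 ],
                   ["B", "c", 7.0 ]])
# Sort by test, then class, so the class "a" row comes first within each test
df = df.sort_values(by=[0,1])
# Divide each time by the first time (class "a") of its test
df[2] /= df.groupby(0)[2].transform('first')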

yields

   0  1    2
0  A  a  1.0
1  A  b  1.2
2  A  c  1.3
4  B  a  1.0
3  B  b  1.2
5  B  c  1.4
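
The sort step matters here because transform('first') takes whichever row happens to come first in each group. As a minimal sketch of a variant that picks the class "a" row explicitly, and so does not depend on row order, one could look up each test's baseline with map. The names baseline and ratio are illustrative and not part of the original answer, and the sketch assumes every test has exactly one class "a" row:

```python
import pandas as pd

# Same data as above; columns keep the default integer labels 0, 1, 2.
df = pd.DataFrame([["A", "a", 10.0],
                   ["A", "b", 12.0],
                   ["A", "c", 13.0],
                   ["B", "b", 6.0],
                   ["B", "a", 5.0],
                   ["B", "c", 7.0]])

# Baseline time per test: keep only the class "a" rows, indexed by test name.
# Assumption: each test has exactly one class "a" row.
baseline = df.loc[df[1] == "a"].set_index(0)[2]

# Look up each row's baseline by its test name, then divide.
# Row order and the original index are left untouched.
ratio = df.copy()
ratio[2] = df[2] / df[0].map(baseline)
print(ratio)
```

Because nothing is sorted, the rows come back in their original order, which may be convenient if the dataframe is later joined with other data on its index.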

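You can also do this with some index alignment. This assumes the columns are named 'test', 'class' and 'time'; the question's dataframe uses the default labels 0, 1, 2, so it is renamed first:

# 'test', 'class' and 'time' are assumed names for columns 0, 1, 2
df1 = df.rename(columns={0: 'test', 1: 'class', 2: 'time'}).set_index(['test', 'class'])
df1 / df1.xs('a', level='class')  # aligns on the shared 'test' level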

That said, transform is the better choice here.
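
For reference, here is a self-contained sketch of the index-alignment approach that also brings the result back to the long format used for plotting. The column names test, class and time are assumed labels (the question's dataframe has none), and ratios and long_form are illustrative names:

```python
import pandas as pd

# Column names are assumptions; the question's dataframe uses 0, 1, 2.
df = pd.DataFrame([["A", "a", 10.0],
                   ["A", "b", 12.0],
                   ["A", "c", 13.0],
                   ["B", "a", 5.0],
                   ["B", "b", 6.0],
                   ["B", "c", 7.0]],
                  columns=["test", "class", "time"])

# Move test and class into a two-level index.
df1 = df.set_index(["test", "class"])

# xs drops the "class" level, leaving the class "a" times indexed by test.
# The division then aligns the two frames on the shared "test" level.
ratios = df1 / df1.xs("a", level="class")

# Back to one row per (test, class) with ordinary columns.
long_form = ratios.reset_index()
print(long_form)
```

The extra set_index and reset_index round trip is the main cost of this route compared with groupby/transform, which works on the long format directly.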

