Say I've got the dataframe:
  Code  Value
1    X    135
2    D    298
3    F    301
4    G     12
5    D    203
6    X    212
7    D    401
8    D    125
I want to add a new column to this dataframe that holds, for each row, the mean of 'Value' over all rows that share that row's 'Code'.
For instance, in row 1, the 'Mean' column would hold the mean of all rows where Code is 'X'.
You can use pd.Series.map() this way:
df['Code_mean'] = df.Code.map(df.groupby(['Code']).Value.mean())
>>> df
Out[]:
  Code  Value  Code_mean
1    X    135     173.50
2    D    298     256.75
3    F    301     301.00
4    G     12      12.00
5    D    203     256.75
6    X    212     173.50
7    D    401     256.75
8    D    125     256.75
This seems to be faster than the transform approach.
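To make the mechanism explicit, here is a minimal self-contained sketch that splits the one-liner into its two steps. The frame is rebuilt from the question's data; the intermediate name code_means is only illustrative.

```python
import pandas as pd

# The example frame from the question (index 1..8, as shown there).
df = pd.DataFrame(
    {
        "Code": list("XDFGDXDD"),
        "Value": [135, 298, 301, 12, 203, 212, 401, 125],
    },
    index=range(1, 9),
)

# Step 1: one mean per distinct code; the result is a Series indexed by Code.
code_means = df.groupby("Code")["Value"].mean()

# Step 2: look up each row's code in that Series.
df["Code_mean"] = df["Code"].map(code_means)

print(df)
```

Because map does a lookup by index label, the grouped Series effectively acts as a dictionary from code to mean.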
EDIT: benchmark to answer comments
import numpy as np
import pandas as pd
from string import ascii_letters
df = pd.DataFrame(columns=['Code', 'Value'])
df.Code = [ascii_letters[26:][i] for i in np.random.randint(0, 26, 10000)]
df.Value = np.random.randint(0, 1024, 10000)
>>> %%timeit
... df['Code_mean'] = df.Code.map(df.groupby(['Code']).Value.mean())
1000 loops, best of 3: 1.45 ms per loop
# Reinit df before next timeit
>>> %%timeit
... df.assign(Code_mean=df.groupby('Code').transform('mean'))
100 loops, best of 3: 2.31 ms per loop
However, further testing shows that the results favour transform for larger dataframes (10^6 rows):
import numpy as np
import pandas as pd
from string import ascii_letters
df = pd.DataFrame(columns=['Code', 'Value'])
df.Code = [ascii_letters[26:][i] for i in np.random.randint(0, 26, 1000000)]
df.Value = np.random.randint(0, 1024, 1000000)
>>> %%timeit
... df['Code_mean'] = df.Code.map(df.groupby(['Code']).Value.mean())
10 loops, best of 3: 95.2 ms per loop
# Reinit df before next timeit
>>> %%timeit
... df.assign(Code_mean=df.groupby('Code').transform('mean'))
10 loops, best of 3: 68.2 ms per loop
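For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. It is a variation on the benchmark above, not the same code: the helper names (make_df, with_map, with_transform) and the seed are assumptions, and 'Value' is selected before transform so the frame does not need to be reinitialised between runs. Timings will depend on the machine and the pandas version, so no expected numbers are given.

```python
import timeit
from string import ascii_uppercase

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily


def make_df(n):
    """Build a frame with n rows: random uppercase codes and integer values."""
    return pd.DataFrame(
        {
            "Code": rng.choice(list(ascii_uppercase), size=n),
            "Value": rng.integers(0, 1024, size=n),
        }
    )


def with_map(df):
    # Aggregate once per code, then look the result up for every row.
    return df["Code"].map(df.groupby("Code")["Value"].mean())


def with_transform(df):
    # Let groupby broadcast the per-group mean back to the original rows.
    return df.groupby("Code")["Value"].transform("mean")


for n in (10_000, 1_000_000):
    df = make_df(n)
    # Sanity check: both approaches should yield the same numbers.
    assert np.allclose(with_map(df), with_transform(df))
    t_map = min(timeit.repeat(lambda: with_map(df), number=3, repeat=3)) / 3
    t_tr = min(timeit.repeat(lambda: with_transform(df), number=3, repeat=3)) / 3
    print(f"n={n}: map {t_map * 1e3:.2f} ms, transform {t_tr * 1e3:.2f} ms")
```

Neither function mutates df, so the two measurements do not interfere with each other.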
This is a good application for the transform method after grouping by the codes.
>>> df['Group_means'] = df.groupby('Code').transform('mean')
>>> df
  Code  Value  Group_means
0    X    135       173.50
1    D    298       256.75
2    F    301       301.00
3    G     12        12.00
4    D    203       256.75
5    X    212       173.50
6    D    401       256.75
7    D    125       256.75
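One caveat: without a column selection, transform is applied to every non-grouping column, which is fine here because 'Value' is the only one. A slightly more defensive sketch, assuming the frame may later gain other columns, selects 'Value' explicitly so the result is always a single Series. The Group_max column is added purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Code": list("XDFGDXDD"),
        "Value": [135, 298, 301, 12, 203, 212, 401, 125],
    }
)

# Selecting 'Value' first makes transform return a Series aligned to df's index.
df["Group_means"] = df.groupby("Code")["Value"].transform("mean")

# A second statistic can be added without the new Group_means column interfering.
df["Group_max"] = df.groupby("Code")["Value"].transform("max")

print(df)
```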