How to fill non-numeric values with the mean of their group, and replace all-NaN groups with 0

Question

I want to do a special fillna() on the following data set:

name,spend,received
A,1012,1200
A,?,1500
B,1300,?
B,2000,2500
B,?,?
C,?,?
C,?,?

In this dataset, ? stands for any non-numeric value, such as na or ???.
Each ? in the spend column has to be replaced with the mean of the valid spend values in the same group, i.e. with the mean for A, B, or C respectively.
Group C has no valid values at all, so its ? entries have to become 0.

We can't directly apply fillna(np.mean) in this case.
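To make the problem concrete, here is a minimal sketch of what a plain column-wide fill does with the sample data. It assumes the data is loaded with pd.read_csv and na_values="?"; the variable names raw and global_fill are only illustrative.

```python
from io import StringIO

import pandas as pd

# The question's sample data; "?" marks a missing value.
raw = """name,spend,received
A,1012,1200
A,?,1500
B,1300,?
B,2000,2500
B,?,?
C,?,?
C,?,?"""

df = pd.read_csv(StringIO(raw), na_values="?")

# A plain fillna with the column mean ignores the groups entirely:
# every gap, including both C rows, receives the same overall value.
global_fill = df["spend"].fillna(df["spend"].mean())
print(global_fill.round(2).tolist())
```

Every missing spend value ends up with the overall mean of 1012, 1300 and 2000, whereas the question wants 1012 for A, 1650 for B and 0 for C.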

Answer 1

Here's a solution:

import numpy as np
import pandas as pd

# Turn the "?" placeholders into NaN, then make both columns numeric
df = df.replace("?", np.nan)
df["spend"] = pd.to_numeric(df["spend"])
df["received"] = pd.to_numeric(df["received"])
# Fill gaps with the group mean; all-NaN groups (C) stay NaN, so finish with 0
group_mean = df.groupby("name")["spend"].transform("mean")
df["spend"] = df["spend"].fillna(group_mean).fillna(0)

Result:

  name   spend  received
0    A  1012.0    1200.0
1    A  1012.0    1500.0
2    B  1300.0       NaN
3    B  2000.0    2500.0
4    B  1650.0       NaN
5    C     0.0       NaN
6    C     0.0       NaN
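The result above still has NaN in the second value column, because only spend is filled. If both columns should be treated the same way, one possible sketch is to loop over them. The list value_cols is an assumption for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# The question's data, with "?" still present as a string placeholder.
df = pd.DataFrame({
    "name": list("AABBBCC"),
    "spend": ["1012", "?", "1300", "2000", "?", "?", "?"],
    "received": ["1200", "1500", "?", "2500", "?", "?", "?"],
})

value_cols = ["spend", "received"]  # assumed: the columns that need cleaning

df = df.replace("?", np.nan)
for col in value_cols:
    df[col] = pd.to_numeric(df[col])
    # Per-row mean of the row's group; NaN where the whole group is NaN.
    group_mean = df.groupby("name")[col].transform("mean")
    df[col] = df[col].fillna(group_mean).fillna(0)

print(df)
```

With this data, the two missing B values in received become 2500 (the only valid B value) and both C rows become 0.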

Answer 2

Solution:

Use pd.read_csv(..., na_values='?') to turn the ? placeholders into NaN at read time.
Adapt the standard approach of replacing NaNs within a group with that group's mean.
The twist here is that an all-NaN group has a NaN mean, so the result still needs a final fillna(0).
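The last point can be checked directly. The following small sketch builds the spend column by hand (rather than reading it from CSV) and prints the per-group means.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": list("AABBBCC"),
    "spend": [1012, np.nan, 1300, 2000, np.nan, np.nan, np.nan],
})

# mean() skips NaN, so A and B average their valid values.
# C has no valid values at all, so its mean is NaN.
group_means = df.groupby("name")["spend"].mean()
print(group_means)
```

Filling C's gaps with its own mean therefore changes nothing, which is why the extra fillna(0) is needed.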

So the key line is:

df['spend'] = df.groupby('name')['spend'].transform(lambda s: s.fillna(s.mean())).fillna(0)

Code:

import pandas as pd
from io import StringIO

dat = """name,spend,received
A,1012,1200
A,?,1500
B,1300,?
B,2000,2500
B,?,?
C,?,?
C,?,?"""

df = pd.read_csv(StringIO(dat), na_values='?')

  name   spend  received
0    A  1012.0    1200.0
1    A     NaN    1500.0
2    B  1300.0       NaN
3    B  2000.0    2500.0
4    B     NaN       NaN
5    C     NaN       NaN
6    C     NaN       NaN

df['spend'] = df.groupby('name')['spend'].transform(lambda s: s.fillna(s.mean())).fillna(0)

  name   spend  received
0    A  1012.0    1200.0
1    A  1012.0    1500.0
2    B  1300.0       NaN
3    B  2000.0    2500.0
4    B  1650.0       NaN
5    C     0.0       NaN
6    C     0.0       NaN
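The key line fills each group through a lambda. As a possible alternative, the same result can be reached without a Python-level function per group by using the built-in "mean" transform, which tends to scale better on large frames. This is a sketch on hand-built data, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": list("AABBBCC"),
    "spend": [1012, np.nan, 1300, 2000, np.nan, np.nan, np.nan],
})

# Lambda-based fill, one call per group.
with_lambda = (
    df.groupby("name")["spend"]
    .transform(lambda s: s.fillna(s.mean()))
    .fillna(0)
)

# Built-in transform: broadcast each group's mean to its rows, then fill.
group_mean = df.groupby("name")["spend"].transform("mean")
vectorized = df["spend"].fillna(group_mean).fillna(0)

print(with_lambda.equals(vectorized))
```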

Answer 3

Assuming ? could also stand for arbitrary strings:

import pandas as pd
import numpy as np

idx = ['A'] * 3 + ['B'] * 3 + ['C'] * 3
data = np.random.random_sample((9, 2))  # random values, so the numbers differ on every run

# object dtype lets a column hold both floats and strings
df = pd.DataFrame(index=idx, data=data, columns=['spend', 'received'], dtype=object)
df.index.name = 'name'

df.iloc[2, 1] = np.nan
df.iloc[1, 0] = 'ABCD'
df.iloc[4:6, 0] = np.nan

df

name    spend       received
A       0.197366    0.467532
A       ABCD        0.256184
A       0.559562    NaN
B       0.59835     0.415382
B       NaN         0.163827
B       NaN         0.759888
C       0.897332    0.025344
C       0.782683    0.428465
C       0.201591    0.601339

Then

# coerce anything non-numeric (e.g. 'ABCD') to NaN
df = df.apply(pd.to_numeric, errors='coerce')

# fill with the group mean, then with 0 for groups that are entirely NaN
df['spend'] = df['spend'].groupby(level=0).transform(lambda x: x.fillna(x.mean()).fillna(0))
df['received'] = df['received'].groupby(level=0).transform(lambda x: x.fillna(x.mean()).fillna(0))

Which yields:

name spend      received
A    0.197366   0.467532
A    0.378464   0.256184
A    0.559562   0.361858
B    0.598350   0.415382
B    0.598350   0.163827
B    0.598350   0.759888
C    0.897332   0.025344
C    0.782683   0.428465
C    0.201591   0.601339
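The random sample above does not contain a group that is entirely missing. As a deterministic check, here is a sketch that applies the same coerce-then-fill steps to the question's data, with several different junk strings mixed in. The particular placeholder strings are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Question's data, indexed by group name, with assorted non-numeric entries.
df = pd.DataFrame(
    {
        "spend": [1012, "ABCD", 1300, 2000, "na", "???", np.nan],
        "received": [1200, 1500, "?", 2500, "?", "?", "?"],
    },
    index=pd.Index(list("AABBBCC"), name="name"),
)

# Anything that cannot be parsed as a number becomes NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# Group on the index level; the inner fillna(0) catches all-NaN groups.
for col in df.columns:
    df[col] = df[col].groupby(level=0).transform(
        lambda x: x.fillna(x.mean()).fillna(0)
    )

print(df)
```

Group C, which has no valid values in either column, ends up as 0 throughout, while A and B get their group means.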

3 answers

solution1
0 ACCEPTED 2020-06-10 05:47:14

solution2
0 2020-06-10 06:05:28

solution3
0 2020-06-10 06:18:41
