
How to fill in missing values based on a column in pandas?

I have this DataFrame in pandas:

import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, 60]
})

How can I fill in the missing rows so that every value of column t has the same entries in column n? That is, every t value should have entries for a, b, c and x, recorded as NaN where they are missing:

   n  t   v
   a  0  10
   b  0  20
   c  0  30
   x  NaN NaN
   a  1  40
   b  1  50
   c  NaN NaN
   x  1  60

plan

  • Get the unique values of column 'n'. We'll reindex by these.
  • Set 'n' as the index so that we can reindex on it.
  • Apply f within each group of column 't'. Reindexing by idx ensures that every element of idx is represented in each group of unique 't'.

idx = df.n.unique()
f = lambda x: x.reindex(idx)
df.set_index('n').groupby('t', group_keys=False).apply(f).reset_index()

   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0  60.0
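
The per-group reindex above can also be written as an explicit loop over the groups. This is only a sketch of the same idea: it sidesteps groupby.apply, whose handling of the grouping column 't' appears to differ between pandas versions (that version difference is an assumption here, not something the answer states). The names parts and out are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, 60],
})

# Unique labels of 'n', in order of first appearance
idx = df["n"].unique()

# Reindex each 't' group on 'n' so that every label in idx gets a row;
# labels missing from a group come back as all-NaN rows
parts = [g.set_index("n").reindex(idx) for _, g in df.groupby("t")]

# Stitch the groups back together and restore 'n' as a column
# (rename_axis guards against the index name being lost along the way)
out = pd.concat(parts).rename_axis("n").reset_index()
print(out)
```

Because each group is reindexed separately, the 't' cell of an added row is NaN, which matches the layout asked for in the question.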

If df contains no NaN values beforehand, you can create a MultiIndex and then reindex. The NaN values in t are then set wherever column v is NaN:

cols = ["n", "t"]
df1 = df.set_index(cols)
mux = pd.MultiIndex.from_product(df1.index.levels, names=cols)
df1 = df1.reindex(mux).sort_index(level=[1,0]).reset_index()
df1['t'] = df1['t'].mask(df1['v'].isnull())
print (df1)
   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0  60.0

Another solution for adding the NaN rows is to unstack and then stack:

cols = ["n", "t"]
df1 = df.set_index(cols)['v'].unstack().stack(dropna=False)
df1 = df1.sort_index(level=[1,0]).reset_index(name='v')
df1['t'] = df1['t'].mask(df1['v'].isnull())
print (df1)
   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0  60.0

But if column v already contains some NaN values, that mask no longer works. Instead, group by t and select the unique values of column n within each group:

import numpy as np

df = pd.DataFrame({"n": ["a", "b", "c", "a", "b", "x"],
                   "t": [0, 0, 0, 1, 1, 1],
                   "v": [10, 20, 30, 40, 50, np.nan]})
print (df)
   n  t     v
0  a  0  10.0
1  b  0  20.0
2  c  0  30.0
3  a  1  40.0
4  b  1  50.0
5  x  1   NaN

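# .loc raises KeyError for missing labels in pandas 1.0+, so use reindex;
# the parentheses let the method chain span several lines
df1 = (df.set_index('n')
         .groupby('t', group_keys=False)
         .apply(lambda x: x.reindex(df.n.unique()))
         .reset_index())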

print (df1)
   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0   NaN   

# Same idea, setting the index inside each group; reindex again replaces .loc
df1 = (df.groupby('t', group_keys=False)
         .apply(lambda x: x.set_index('n').reindex(df.n.unique()))
         .reset_index())
print (df1)
   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0   NaN
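
The case with NaN already present in v can also be handled without groupby, by combining it with the MultiIndex idea. The sketch below is one possible way, not the answer's own method: it builds the full (t, n) grid and blanks t according to whether the pair existed in the original data, rather than according to whether v is NaN. The names mux, present and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, np.nan],  # v already holds a NaN
})

df1 = df.set_index(["t", "n"])

# Every (t, n) combination, ordered by t and then by n
mux = pd.MultiIndex.from_product(
    [df["t"].unique(), df["n"].unique()], names=["t", "n"]
)

# True where the (t, n) pair was in the original data
present = mux.isin(df1.index)

out = df1.reindex(mux).reset_index()
# Blank t only for rows added by the reindex, not for rows
# whose v happened to be NaN from the start
out["t"] = out["t"].where(present)
out = out[["n", "t", "v"]]
print(out)
```

Here the row for x at t = 1 keeps its t value even though its v is NaN, because the mask is derived from the index and not from v.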

From what I understand, you want every value in "n" to appear in each of the sub-groups grouped by "t". I'm also assuming that values of "n" are not duplicated within these sub-groups.

Considering these assumptions to be true, pd.pivot_table seems to be a good option for this use case. Here, the values under "n" become the column names, "t" becomes the grouped index, and the cells of the DataFrame are filled with the values under "v". Then stack the DataFrame while preserving the NaN entries, and set the corresponding cells in "t" to NaN with the .loc accessor.

df1 = pd.pivot_table(df, "v", "t", "n", "first").stack(dropna=False).reset_index(name="v")
df1.loc[df1['v'].isnull(), "t"] = np.nan
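
As a hedged variant of the pivot approach, the wide table can be turned back into long form with melt instead of stack(dropna=False). melt is swapped in here on the assumption that the dropna argument of stack behaves differently across pandas releases. mask is likewise used in place of the .loc assignment to avoid writing NaN into an integer column. The names wide and tidy are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, 60],
})

# One row per t, one column per n; absent (t, n) pairs become NaN
wide = df.pivot_table(values="v", index="t", columns="n", aggfunc="first")

# Back to long form; melt keeps the NaN cells as rows
tidy = wide.reset_index().melt(id_vars="t", var_name="n", value_name="v")
tidy = tidy.sort_values(["t", "n"], ignore_index=True)

# Blank t wherever v is missing (assumes v had no NaN to begin with)
tidy["t"] = tidy["t"].mask(tidy["v"].isna())
tidy = tidy[["n", "t", "v"]]
print(tidy)
```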


It seems like you're building it wrong. Normally NaN values are read in automatically, or you specify them yourself. You can put in NaN manually with np.nan if you have import numpy as np at the top. Older pandas versions also exposed NumPy as pandas.np, so pandas.np.nan worked there, but that alias was deprecated and later removed, so prefer np.nan.
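
For illustration, a frame with explicit NaN entries can be built as follows. The two-row frame is made up for this example and is not the asker's data.

```python
import numpy as np
import pandas as pd

# np.nan marks the missing cells; an integer column that
# contains NaN is stored as float64
df_nan = pd.DataFrame({
    "n": ["a", "x"],
    "t": [0, np.nan],
    "v": [10, np.nan],
})
print(df_nan)
print(df_nan.isna())
```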
