I have this DataFrame in pandas:
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, 60]
})
How can it be filled with missing values such that every value of column t has the same entries in column n? That is, every t value should have entries for a, b, c, x, recorded as NaN if they are missing:
n t v
a 0 10
b 0 20
c 0 30
x NaN NaN
a 1 40
b 1 50
c NaN NaN
x 1 60
Plan:
Get the unique values of column 'n'. We'll use this to reindex by.
Apply f to our groups within each group of column 't'. Reindexing by idx will ensure we get all elements of idx represented for each group of unique 't'.
Set the index to 'n' first so we can reindex in a bit.
idx = df.n.unique()
f = lambda x: x.reindex(idx)
df.set_index('n').groupby('t', group_keys=False).apply(f).reset_index()
n t v
0 a 0.0 10.0
1 b 0.0 20.0
2 c 0.0 30.0
3 x NaN NaN
4 a 1.0 40.0
5 b 1.0 50.0
6 c NaN NaN
7 x 1.0 60.0
If df contains no NaN values beforehand, you can create a MultiIndex and then reindex; the NaN values in t are then set from column v:
cols = ["n", "t"]
df1 = df.set_index(cols)
mux = pd.MultiIndex.from_product(df1.index.levels, names=cols)
df1 = df1.reindex(mux).sort_index(level=[1,0]).reset_index()
df1['t'] = df1['t'].mask(df1['v'].isnull())
print (df1)
n t v
0 a 0.0 10.0
1 b 0.0 20.0
2 c 0.0 30.0
3 x NaN NaN
4 a 1.0 40.0
5 b 1.0 50.0
6 c NaN NaN
7 x 1.0 60.0
Another solution for adding NaN is the unstack / stack approach:
cols = ["n", "t"]
df1 = df.set_index(cols)['v'].unstack().stack(dropna=False)
df1 = df1.sort_index(level=[1,0]).reset_index(name='v')
df1['t'] = df1['t'].mask(df1['v'].isnull())
print (df1)
n t v
0 a 0.0 10.0
1 b 0.0 20.0
2 c 0.0 30.0
3 x NaN NaN
4 a 1.0 40.0
5 b 1.0 50.0
6 c NaN NaN
7 x 1.0 60.0
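But if df already contains some NaN values, masking t by v no longer works, and you need groupby with a label lookup by the unique values of the n column (loc in old pandas versions; reindex in pandas 1.0 and later, where loc raises a KeyError for missing labels):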
import numpy as np

df = pd.DataFrame({"n": ["a", "b", "c", "a", "b", "x"],
                   "t": [0, 0, 0, 1, 1, 1],
                   "v": [10, 20, 30, 40, 50, np.nan]})
print (df)
n t v
0 a 0 10.0
1 b 0 20.0
2 c 0 30.0
3 a 1 40.0
4 b 1 50.0
5 x 1 NaN
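# Parentheses make the multi-line chain valid Python; reindex replaces .loc,
# which raises KeyError for missing labels in pandas >= 1.0.
df1 = (df.set_index('n')
         .groupby('t', group_keys=False)
         .apply(lambda x: x.reindex(df.n.unique()))
         .reset_index())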
print (df1)
n t v
0 a 0.0 10.0
1 b 0.0 20.0
2 c 0.0 30.0
3 x NaN NaN
4 a 1.0 40.0
5 b 1.0 50.0
6 c NaN NaN
7 x 1.0 NaN
# Same idea, setting the index inside each group (reindex instead of .loc, as above).
df1 = (df.groupby('t', group_keys=False)
         .apply(lambda x: x.set_index('n').reindex(df.n.unique()))
         .reset_index())
print (df1)
n t v
0 a 0.0 10.0
1 b 0.0 20.0
2 c 0.0 30.0
3 x NaN NaN
4 a 1.0 40.0
5 b 1.0 50.0
6 c NaN NaN
7 x 1.0 NaN
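A possible variant for the case where v already contains NaN, sketched here as an alternative and not taken from the answer above: reindex against the full product of n and t values, and decide which t entries to blank out from the set of (n, t) pairs that existed originally, instead of from v. The names present, full and was_present are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, np.nan],
})

cols = ["n", "t"]

# (n, t) pairs that exist in the original data
present = pd.MultiIndex.from_frame(df[cols])

# every combination of the unique n and t values
full = pd.MultiIndex.from_product(
    [df["n"].unique(), df["t"].unique()], names=cols
)

# add the missing combinations, ordered by t and then n
out = df.set_index(cols).reindex(full).sort_index(level=[1, 0])

# remember which rows were in the original data before dropping the index
was_present = out.index.isin(present)
out = out.reset_index()

# blank out t only for rows that were added by the reindex
out["t"] = out["t"].where(was_present)
print(out)
```

Because the mask comes from the original index and not from v, the existing row (x, 1) keeps t = 1 even though its v is NaN.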
From what I understand, you want every value in "n" to be equally distributed among the sub-groups grouped by "t". I'm also assuming that those "n" values cannot be duplicated within these sub-groups.
Considering these assumptions to be true, pd.pivot_table seems to be a good option for this use case. Here, the values under "n" would constitute the column names, "t" would be the grouped index, and the contents of the DF get filled by the values under "v". Later, stack the DF while preserving NaN entries and fill its corresponding cells in "t" with the .loc accessor.
df1 = (pd.pivot_table(df, values="v", index="t", columns="n", aggfunc="first")
         .stack(dropna=False)
         .reset_index(name="v"))
df1.loc[df1['v'].isnull(), "t"] = np.nan
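The no-duplicates assumption stated in this answer can be checked before pivoting. A minimal sketch, assuming the question's sample data (the name has_dupes is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, 60],
})

# True if any (n, t) pair occurs more than once; with aggfunc="first",
# pivot_table would then silently keep only the first v of each pair.
has_dupes = df.duplicated(subset=["n", "t"]).any()
print(has_dupes)
```

If this prints True, the pivot-based approach would drop rows, so the duplicates need to be resolved first.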
It seems you are building it incorrectly. Normally NaN values are read in automatically, or you specify them yourself. You can insert NaN manually with np.nan if you have import numpy as np at the top. In older pandas versions (before 2.0) NumPy was also exposed as pandas.np, so pandas.np.nan worked too; that alias has since been removed.
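As a small illustration of writing the NaN entries by hand, here is a sketch that builds the desired frame directly with np.nan, assuming the row order from the question's expected output:

```python
import numpy as np
import pandas as pd

# The missing (n, t) combinations are written out explicitly as np.nan.
df_manual = pd.DataFrame({
    "n": ["a", "b", "c", "x", "a", "b", "c", "x"],
    "t": [0, 0, 0, np.nan, 1, 1, np.nan, 1],
    "v": [10, 20, 30, np.nan, 40, 50, np.nan, 60],
})
print(df_manual)
```

Note that a column containing np.nan is stored as float, which is why t and v show up as 0.0, 10.0 and so on in the outputs above.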