I have a dataframe like below:
df = pd.DataFrame({'one' : pd.Series(['a', 'b', 'c', 'd','aa','bb',np.nan,'b','c',np.nan, np.nan] ),
'two' : pd.Series([10, 20, 30, 40,50,60,10,20,30,40,50])} )
The first column holds the variables and the second column holds their values. Each variable's value is constant and never changes.
For example, the value of 'a' is 10; whenever 'a' appears, the corresponding value will be 10.
Some entries are missing in the first column, e.g. NaN 10 should be 'a' and NaN 40 should be 'd'. The real dataframe contains 200 variables like this.
The values are not continuous; they are discrete and unsortable.
In this case, how can we impute the missing values? Expected output should be:
Please help me with this.
Regards, Venkat.
I think in general it would be better to group and fill. We use DataFrame.groupby:
df.groupby('two').apply(lambda x: x.ffill().bfill())
It can be done without using groupby but you have to sort by both columns:
df.sort_values(['two','one']).ffill().sort_index()
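Depending on the pandas version, groupby(...).apply(...) may return the group key as an extra index level. As a minimal sketch (my own variant, assuming only the 'one' column needs filling), transform keeps the original index; the name `filled` is just for illustration:

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'one': ['a', 'b', 'c', 'd', 'aa', 'bb', np.nan, 'b', 'c', np.nan, np.nan],
    'two': [10, 20, 30, 40, 50, 60, 10, 20, 30, 40, 50],
})

# Fill 'one' forwards and then backwards, but only inside each group
# of rows that share the same value in 'two'.
filled = df.copy()
filled['one'] = df.groupby('two')['one'].transform(lambda s: s.ffill().bfill())
print(filled)
```

The result has the same row order and index as df, so no sort_index() is needed afterwards.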
Below I show how the method proposed in another answer may fail. Here is an example:
df=pd.DataFrame({'one':['a',np.nan,'c','d',np.nan,'c','b','b',np.nan,'a'],'two':[10,20,30,40,10,30,20,20,30,10]})
print(df)
one two
0 a 10
1 NaN 20
2 c 30
3 d 40
4 NaN 10
5 c 30
6 b 20
7 b 20
8 NaN 30
9 a 10
df.sort_values(['two']).ffill().sort_index()
one two
0 a 10
1 a 20
2 c 30
3 d 40
4 a 10
5 c 30
6 b 20
7 b 20
8 c 30
9 a 10
As you can see, the method proposed in the other answer fails here (see row 1). This happens because, after sorting, a NaN can be the first row for a given value of column 'two', in which case it is filled with the value from the preceding group.
This doesn't happen if we group first:
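To make the cause visible, here is a small sketch that prints the intermediate sorted frame. I pass kind='stable' (an addition of mine, not part of the other answer) only so that the order of rows with equal 'two' is reproducible; `by_two` and `wrong` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one': ['a', np.nan, 'c', 'd', np.nan, 'c', 'b', 'b', np.nan, 'a'],
    'two': [10, 20, 30, 40, 10, 30, 20, 20, 30, 10],
})

# A stable sort keeps the original row order among equal 'two' values.
by_two = df.sort_values('two', kind='stable')
print(by_two)  # row 1 (NaN, 20) directly follows row 9 ('a', 10)

# Forward fill copies the label of the previous row, which here
# belongs to the group with two == 10.
wrong = by_two.ffill().sort_index()
print(wrong.loc[1, 'one'])
```

Row 1 is the first row of the two == 20 block, so the only value above it comes from the two == 10 block.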
df.groupby('two').apply(lambda x: x.ffill().bfill())
one two
0 a 10
1 b 20
2 c 30
3 d 40
4 a 10
5 c 30
6 b 20
7 b 20
8 c 30
9 a 10
As I said, we can use DataFrame.sort_values, but we need to sort by both columns, so that the NaN values come last within each group of 'two'. I recommend this method.
df.sort_values(['two','one']).ffill().sort_index()
one two
0 a 10
1 b 20
2 c 30
3 d 40
4 a 10
5 c 30
6 b 20
7 b 20
8 c 30
9 a 10
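Since each value in 'two' belongs to exactly one variable, another option (not one of the methods above, just a sketch under that assumption) is to build a lookup from the rows where 'one' is present and map the missing rows through it. The names `lookup` and `result` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one': ['a', np.nan, 'c', 'd', np.nan, 'c', 'b', 'b', np.nan, 'a'],
    'two': [10, 20, 30, 40, 10, 30, 20, 20, 30, 10],
})

# One known label per value of 'two', taken from rows where 'one' is present.
lookup = (
    df.dropna(subset=['one'])
      .drop_duplicates(subset='two')
      .set_index('two')['one']
)

# Fill only the missing labels, looking each one up by its 'two' value.
result = df.copy()
result['one'] = df['one'].fillna(df['two'].map(lookup))
print(result)
```

This does not depend on row order at all. A value of 'two' that never appears with a label would simply stay NaN.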
Here it is:
df['one'] = df.groupby('two')['one'].ffill()
output:
one two
0 a 10
1 b 20
2 c 30
3 d 40
4 aa 50
5 bb 60
6 a 10
7 b 20
8 c 30
9 d 40
10 aa 50
Try this:
df = df.sort_values(['two']).ffill().sort_index()
Which will give you
one two
0 a 10
1 b 20
2 c 30
3 d 40
4 aa 50
5 bb 60
6 a 10
7 b 20
8 c 30
9 d 40
10 aa 50