I'm trying to aggregate string columns in my DataFrame, based on a target "group-by" column.
Imagine that I have the following DataFrame with 4 columns:
I want to group all the rows based on column "Col1" and, where a value is NaN, keep the value from the row that is not null.
The desired output is like this:

Col1 Col2 Col3 Col4
A    X    Y    V
B    Z    D    E

I also tried a plain groupby:
import pandas as pd
from tabulate import tabulate

df = pd.DataFrame({'Col1': ['A', 'B', 'A'],
                   'Col2': ['X', 'Z', 'X'],
                   'Col3': ['Y', 'D', ''],
                   'Col4': ['', 'E', 'V']})
print(tabulate(df, headers='keys', tablefmt='psql'))

df2 = df.groupby(['Col1'])
print(tabulate(df2, headers='keys', tablefmt='psql'))
But it doesn't group the NaN values...
How can I do this?
Thanks!
If you simply need the first non-missing value per group, use GroupBy.first:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'A'],
                   'Col2': ['X', 'Z', 'X'],
                   'Col3': ['Y', 'D', np.nan],
                   'Col4': [np.nan, 'E', 'V']})
df2 = df.groupby(['Col1'], as_index=False).first()
print(df2)
  Col1 Col2 Col3 Col4
0    A    X    Y    V
1    B    Z    D    E
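Worth noting: first() takes the first non-NaN value of each column independently, so it can combine values from different rows of the same group, whereas nth(0) returns the first row of each group as-is. A small sketch to illustrate the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'A'],
                   'Col2': ['X', 'Z', 'X'],
                   'Col3': ['Y', 'D', np.nan],
                   'Col4': [np.nan, 'E', 'V']})

# first() takes the first non-NaN value per column within each group,
# so group A combines Col3 from the first A-row with Col4 from the last.
first = df.groupby('Col1', as_index=False).first()
print(first)

# nth(0) returns the first row of each group unchanged,
# so group A keeps the NaN in Col4.
nth = df.groupby('Col1').nth(0)
print(nth)
```

That per-column behaviour is exactly why first() solves the question here.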
Using first() is more concise and neater. An alternative, less elegant approach would be:
(df.replace('', np.nan)
   .set_index('Col1')
   .groupby(level='Col1')
   .bfill()
   .groupby(level='Col1')
   .nth(0))
Output:

     Col2 Col3 Col4
Col1
A       X    Y    V
B       Z    D    E
or you may even use head() instead of nth():
(df.replace('', np.nan)
   .set_index('Col1')
   .groupby(level='Col1')
   .bfill()
   .groupby(level='Col1')
   .head(1))
Output:

     Col2 Col3 Col4
Col1
A       X    Y    V
B       Z    D    E
Note that your DataFrame contains empty strings rather than NaN, so just call df.replace() on the already initialised DataFrame to replace them with np.nan first:

df.replace('', np.nan)
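Putting it together for the question's original DataFrame (which uses empty strings instead of NaN), a minimal end-to-end sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'A'],
                   'Col2': ['X', 'Z', 'X'],
                   'Col3': ['Y', 'D', ''],
                   'Col4': ['', 'E', 'V']})

# Convert the empty strings to real missing values first,
# then take the first non-missing value per column within each group.
out = (df.replace('', np.nan)
         .groupby('Col1', as_index=False)
         .first())
print(out)
```

This produces the desired two-row result with no gaps in either group.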