I'm trying to aggregate string columns in my DataFrame, based on a target "group-by" column.
Imagine that I have the following DataFrame with 4 columns:
I want to group all the rows based on column "Col1" and, where a value is NaN, keep the value from the row that is not null.
The desired output is like this:

Col1 Col2 Col3 Col4
A    X    Y    V
B    Z    D    E

I also tried a plain groupby:
import pandas as pd
from tabulate import tabulate

df = pd.DataFrame({'Col1': ['A', 'B', 'A'],
                   'Col2': ['X', 'Z', 'X'],
                   'Col3': ['Y', 'D', ''],
                   'Col4': ['', 'E', 'V']})
print(tabulate(df, headers='keys', tablefmt='psql'))

df2 = df.groupby(['Col1'])
print(tabulate(df2, headers='keys', tablefmt='psql'))
But it doesn't group the NaN values...
How can I do this?
Thanks!
If you simply need the first non-missing value per group, use GroupBy.first:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'A'],
                   'Col2': ['X', 'Z', 'X'],
                   'Col3': ['Y', 'D', np.nan],
                   'Col4': [np.nan, 'E', 'V']})
df2 = df.groupby(['Col1'], as_index=False).first()
print(df2)
  Col1 Col2 Col3 Col4
0    A    X    Y    V
1    B    Z    D    E
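Worth noting: first() takes the first non-NaN value of each column independently, so it can combine values from different rows of the same group, whereas nth(0) returns the first row of each group as-is. A small sketch to illustrate the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'A'],
                   'Col2': ['X', 'Z', 'X'],
                   'Col3': ['Y', 'D', np.nan],
                   'Col4': [np.nan, 'E', 'V']})

# first() takes the first non-NaN value per column within each group,
# so group A combines Col3 from the first A-row with Col4 from the last.
first = df.groupby('Col1', as_index=False).first()
print(first)

# nth(0) returns the first row of each group unchanged,
# so group A keeps the NaN in Col4.
nth = df.groupby('Col1').nth(0)
print(nth)
```

That per-column behaviour is exactly why first() solves the question here.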
Using first() is more concise and neater. An alternative, less elegant approach would be:
(df.replace('', np.nan)
   .set_index('Col1')
   .groupby(level='Col1')
   .bfill()
   .groupby(level='Col1')
   .nth(0))
Output:

     Col2 Col3 Col4
Col1
A       X    Y    V
B       Z    D    E
or you may even use head() instead of nth():
(df.replace('', np.nan)
   .set_index('Col1')
   .groupby(level='Col1')
   .bfill()
   .groupby(level='Col1')
   .head(1))
Output:

     Col2 Col3 Col4
Col1
A       X    Y    V
B       Z    D    E
Note that your DataFrame contains empty strings rather than NaN, so just call df.replace() on the already initialised DataFrame to replace them with np.nan first:

df.replace('', np.nan)
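Putting it together for the question's original DataFrame (which uses empty strings instead of NaN), a minimal end-to-end sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'A'],
                   'Col2': ['X', 'Z', 'X'],
                   'Col3': ['Y', 'D', ''],
                   'Col4': ['', 'E', 'V']})

# Convert the empty strings to real missing values first,
# then take the first non-missing value per column within each group.
out = (df.replace('', np.nan)
         .groupby('Col1', as_index=False)
         .first())
print(out)
```

This produces the desired two-row result with no gaps in either group.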