Reformat weird DataFrame
| Name | place | pers_data |
|---|---|---|
| NaN | NaN | NaN |
| Smith John | NY | sjohn@gmail.com |
| NaN | NaN | 0987 4567 |
| NaN | NaN | 0653 6734 |
| Vic Stied | SA | 0986 5332 |
| NaN | NaN | vickie@hotmail.com |

I would like to delete the NaN values and reformat the file like the following:

| Name | Place | pers_data | other | other_2 |
|---|---|---|---|---|
| Smith John | NY | sjohn@gmail.com | 0987 4567 | 0653 6734 |
| Vic Stied | SA | vickie@hotmail.com | 0986 5332 | |

Can someone help me with that? I tried a few things but without understanding any of them, and I'd like to really get what I am doing.
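
For anyone who wants to follow along, here is a minimal sketch that rebuilds the frame from the table above. The variable name `df` and the use of `np.nan` for the empty cells are assumptions; the question does not show how the data was loaded.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; the first row is entirely NaN.
df = pd.DataFrame({
    "Name": [np.nan, "Smith John", np.nan, np.nan, "Vic Stied", np.nan],
    "place": [np.nan, "NY", np.nan, np.nan, "SA", np.nan],
    "pers_data": [np.nan, "sjohn@gmail.com", "0987 4567", "0653 6734",
                  "0986 5332", "vickie@hotmail.com"],
})
print(df)
```
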

This is a variation on a `pivot`:
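
    # Each non-null Name starts a new record; the running count labels it.
    idx = df['Name'].notna().cumsum()

    out = (df
           .assign(col=df.groupby(idx).cumcount(),          # position within the record
                   Name=df['Name'].groupby(idx).ffill(),    # carry Name down its record
                   place=df['place'].groupby(idx).ffill())  # carry place down its record
           .pivot(index=['Name', 'place'], columns='col', values='pers_data')
           .add_prefix('other_').rename(columns={'other_0': 'pers_data'})
           .reset_index().rename_axis(columns=None)
           .dropna(how='all')                               # drop the all-NaN leading row
          )
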
Output:

             Name  place        pers_data             other_1    other_2
    1  Smith John     NY  sjohn@gmail.com           0987 4567  0653 6734
    2   Vic Stied     SA        0986 5332  vickie@hotmail.com        NaN

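
Since the question asks for an understanding of what is going on, it may help to look at the helper values before the pivot. The sketch below rebuilds the sample frame (the construction is an assumption, as is the name `steps`) and prints the record label and the within-record position for each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": [np.nan, "Smith John", np.nan, np.nan, "Vic Stied", np.nan],
    "place": [np.nan, "NY", np.nan, np.nan, "SA", np.nan],
    "pers_data": [np.nan, "sjohn@gmail.com", "0987 4567", "0653 6734",
                  "0986 5332", "vickie@hotmail.com"],
})

# True where a new person starts; the cumulative sum turns that into a
# record label shared by the rows that follow: 0, 1, 1, 1, 2, 2
idx = df["Name"].notna().cumsum()

# Position of each row inside its record: 0, 0, 1, 2, 0, 1
# These positions become the column labels in the pivot.
col = df.groupby(idx).cumcount()

# Forward-fill inside each record only, so a name never leaks into
# another record.
name_filled = df["Name"].groupby(idx).ffill()

steps = df.assign(idx=idx, col=col, Name=name_filled)
print(steps)
```

Row 0 keeps label 0 and stays all-NaN, which is why the final `dropna(how='all')` is needed.
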
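Another option, using `groupby` with a string join and `str.split`:

    # Drop the all-NaN rows, forward-fill Name/place, then join each person's
    # values into one string and split it back out into columns.
    (df1.loc[~df1.isna().all(axis=1)].ffill()
        .groupby(['Name', 'place']).agg(','.join)
        .pers_data.str.split(',', expand=True).add_prefix('other_')
        .rename(columns={'other_0': 'pers_data'}).reset_index())
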
Output:

             Name  place        pers_data             other_1    other_2
    0  Smith John     NY  sjohn@gmail.com           0987 4567  0653 6734
    1   Vic Stied     SA        0986 5332  vickie@hotmail.com       None

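
In the outputs above the values keep their original row order, so Vic Stied's phone number ends up in `pers_data`, whereas the desired table has the e-mail address there. A possible sketch for that, assuming that any value containing `@` is the e-mail address and should come first (the helper column `not_mail` and the names `tidy` and `out` are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": [np.nan, "Smith John", np.nan, np.nan, "Vic Stied", np.nan],
    "place": [np.nan, "NY", np.nan, np.nan, "SA", np.nan],
    "pers_data": [np.nan, "sjohn@gmail.com", "0987 4567", "0653 6734",
                  "0986 5332", "vickie@hotmail.com"],
})

# Drop fully empty rows, then carry Name/place down to the rows below.
tidy = df.dropna(how="all").copy()
tidy[["Name", "place"]] = tidy[["Name", "place"]].ffill()

# Assumption: a value containing "@" is the e-mail address.
# False sorts before True, so e-mail rows move to the front; the stable
# sort keeps the remaining values in their original order.
tidy["not_mail"] = ~tidy["pers_data"].str.contains("@", regex=False, na=False)
tidy = tidy.sort_values("not_mail", kind="stable")

# Number the values within each person, then spread them across columns.
tidy["col"] = tidy.groupby(["Name", "place"]).cumcount()
out = (tidy
       .pivot(index=["Name", "place"], columns="col", values="pers_data")
       .add_prefix("other_")
       .rename(columns={"other_0": "pers_data"})
       .reset_index()
       .rename_axis(columns=None))
print(out)
```

Passing a list to `index=` in `pivot` needs a reasonably recent pandas (1.1 or later).
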