简体   繁体   中英

Combining specific rows that have NaN for a different column on python using pandas

I am hoping this question makes sense. I have a table I extracted from a PDF of chemical names that I am trying to format and I am having issues it looks like this: table

Some of the chemical names are split into multiple rows and I need each name in its won row. I did notice the chemicals whose names are split into multiple rows have an NaN in the first column.

EDIT: after running dt.head(15).to_dict()

{'Unnamed: 0': {6: '1', 7: nan, 8: '2', 9: '3', 10: nan, 11: nan, 12: '4', 13: '5', 14: nan, 15: nan, 16: '6', 17: '7', 18: '8', 19: '9', 20: nan}, 'Phenolics': {6: 'Dihydroquercetin', 7: '7,30-dimethyl ether', 8: 'Artelin', 9: 'Esculin 7-', 10: 'methylether', 11: '(methylesculin)', 12: 'Esculin', 13: 'Scopoletin (7-', 14: 'hydroxy-6-', 15: 'methoxycoumarin)', 16: 'Axillarin', 17: 'Esculetin', 18: 'Isoscopoletin', 19: '6-Beta-D-glucosyl-7-', 20: 'methoxycoumarin'}}

Can anyone help me? Thank you!

df["group"] = df["Unnamed: 0"].ffill()
df.groupby("group").agg({"Phenolics": "".join})

A one-line solution

df = df.fillna(method='ffill').groupby('Unnamed: 0')['Phenolics'].apply(' '.join).reset_index()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM