Combining specific rows that have NaN in a different column in Python using pandas
I hope this question makes sense. I have a table of chemical names that I extracted from a PDF and am trying to format, but I am having issues with it. It looks like this: [table image]
Some of the chemical names are split across multiple rows, and I need each name in its own row. I did notice that, for chemicals whose names are split across multiple rows, the continuation rows have NaN in the first column.
EDIT: after running dt.head(15).to_dict():
{'Unnamed: 0': {6: '1', 7: nan, 8: '2', 9: '3', 10: nan, 11: nan, 12: '4', 13: '5', 14: nan, 15: nan, 16: '6', 17: '7', 18: '8', 19: '9', 20: nan}, 'Phenolics': {6: 'Dihydroquercetin', 7: '7,30-dimethyl ether', 8: 'Artelin', 9: 'Esculin 7-', 10: 'methylether', 11: '(methylesculin)', 12: 'Esculin', 13: 'Scopoletin (7-', 14: 'hydroxy-6-', 15: 'methoxycoumarin)', 16: 'Axillarin', 17: 'Esculetin', 18: 'Isoscopoletin', 19: '6-Beta-D-glucosyl-7-', 20: 'methoxycoumarin'}}
Can anyone help me? Thank you!
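# Forward-fill the numbering column so each continuation row gets its parent's label
df["group"] = df["Unnamed: 0"].ffill()
# Concatenate the name fragments within each group (no separator between fragments)
df.groupby("group").agg({"Phenolics": "".join})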
A one-line solution
# fillna(method='ffill') is deprecated in recent pandas versions; ffill() is the equivalent call
df = df.ffill().groupby('Unnamed: 0')['Phenolics'].apply(' '.join).reset_index()
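For anyone who wants to try this on the sample data, here is a minimal, self-contained sketch that rebuilds the frame from the to_dict() output in the question and applies the same forward-fill and groupby idea. Two things in it are my own assumptions and not part of the answers above: the helper join_fragments (a hypothetical name) skips the space when the previous fragment ends with a hyphen, and sort=False keeps the groups in their original order, since the labels are strings and would otherwise sort lexicographically ('10' before '2').

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's dt.head(15).to_dict() output.
df = pd.DataFrame(
    {
        "Unnamed: 0": ["1", np.nan, "2", "3", np.nan, np.nan, "4", "5",
                       np.nan, np.nan, "6", "7", "8", "9", np.nan],
        "Phenolics": ["Dihydroquercetin", "7,30-dimethyl ether", "Artelin",
                      "Esculin 7-", "methylether", "(methylesculin)",
                      "Esculin", "Scopoletin (7-", "hydroxy-6-",
                      "methoxycoumarin)", "Axillarin", "Esculetin",
                      "Isoscopoletin", "6-Beta-D-glucosyl-7-",
                      "methoxycoumarin"],
    },
    index=range(6, 21),
)


def join_fragments(parts):
    """Hypothetical helper: join fragments with a space, except after a trailing '-'."""
    name = ""
    for part in parts:
        if name and not name.endswith("-"):
            name += " "
        name += part
    return name


merged = (
    # Continuation rows (NaN label) inherit the label of the row above them.
    df.assign(group=df["Unnamed: 0"].ffill())
      # sort=False keeps groups in order of first appearance.
      .groupby("group", sort=False)["Phenolics"]
      .agg(join_fragments)
      .reset_index()
)
print(merged)
```

If a plain separator is good enough for your names, replacing join_fragments with " ".join or "".join gives the behaviour of the two answers above.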