Combining specific rows that have NaN in another column using pandas in Python

Question

I hope this question makes sense. I have a table of chemical names that I extracted from a PDF and am trying to format, but I am having issues. It looks like this: (table)

Some of the chemical names are split across multiple rows, and I need each name in its own row. I did notice that the chemicals whose names are split across multiple rows have a NaN in the first column on the continuation rows.

EDIT: after running dt.head(15).to_dict()

{'Unnamed: 0': {6: '1', 7: nan, 8: '2', 9: '3', 10: nan, 11: nan, 12: '4', 13: '5', 14: nan, 15: nan, 16: '6', 17: '7', 18: '8', 19: '9', 20: nan}, 'Phenolics': {6: 'Dihydroquercetin', 7: '7,30-dimethyl ether', 8: 'Artelin', 9: 'Esculin 7-', 10: 'methylether', 11: '(methylesculin)', 12: 'Esculin', 13: 'Scopoletin (7-', 14: 'hydroxy-6-', 15: 'methoxycoumarin)', 16: 'Axillarin', 17: 'Esculetin', 18: 'Isoscopoletin', 19: '6-Beta-D-glucosyl-7-', 20: 'methoxycoumarin'}}

Can anyone help me? Thank you!

Answer 1

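# Forward-fill the row numbers so each continuation row inherits the number above it
df["group"] = df["Unnamed: 0"].ffill()
# Concatenate the name fragments that share a number
df.groupby("group").agg({"Phenolics": "".join})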
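
For anyone who wants to try this without the PDF, here is a minimal sketch that rebuilds the 15 rows from the question's to_dict() output and applies the same forward-fill and groupby idea. The variable names `numbers`, `names`, and `result` are my own, and I am assuming the first column holds strings, as the dict in the question suggests.

```python
import numpy as np
import pandas as pd

# Data copied from the to_dict() output in the question (index 6..20).
numbers = ["1", np.nan, "2", "3", np.nan, np.nan, "4", "5",
           np.nan, np.nan, "6", "7", "8", "9", np.nan]
names = ["Dihydroquercetin", "7,30-dimethyl ether", "Artelin", "Esculin 7-",
         "methylether", "(methylesculin)", "Esculin", "Scopoletin (7-",
         "hydroxy-6-", "methoxycoumarin)", "Axillarin", "Esculetin",
         "Isoscopoletin", "6-Beta-D-glucosyl-7-", "methoxycoumarin"]
df = pd.DataFrame({"Unnamed: 0": numbers, "Phenolics": names},
                  index=range(6, 21))

# Continuation rows have NaN in the first column; forward-filling gives
# them the number of the row that started the name.
df["group"] = df["Unnamed: 0"].ffill()

# Join the fragments of each name with no separator.
result = df.groupby("group").agg({"Phenolics": "".join})
print(result)
```

Joining with an empty string works well for fragments that end in a hyphen ("Scopoletin (7-" + "hydroxy-6-" + "methoxycoumarin)"), but it also glues "Dihydroquercetin" directly onto "7,30-dimethyl ether", so some names may still need a manual touch-up.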

Answer 2

A one-line solution

# Forward-fill the row numbers, then join the name fragments that share a number
df = df.ffill().groupby('Unnamed: 0')['Phenolics'].apply(' '.join).reset_index()
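
One thing to watch with grouping on the number column: groupby sorts its keys by default, and the keys here are strings, so in a longer table '10' would sort before '2'. The sketch below is a variation, not what either answer wrote. It groups on a running count of non-NaN rows and passes sort=False so the original order is kept. It uses rows 12 to 20 from the question, and `group_id` and `combined` are names I made up for illustration.

```python
import numpy as np
import pandas as pd

# Rows 12..20 from the question's to_dict() output.
df = pd.DataFrame(
    {
        "Unnamed: 0": ["4", "5", np.nan, np.nan, "6", "7", "8", "9", np.nan],
        "Phenolics": ["Esculin", "Scopoletin (7-", "hydroxy-6-",
                      "methoxycoumarin)", "Axillarin", "Esculetin",
                      "Isoscopoletin", "6-Beta-D-glucosyl-7-",
                      "methoxycoumarin"],
    },
    index=range(12, 21),
)

# A new name starts wherever the first column is not NaN, so the running
# count of such rows gives every name its own group id.
group_id = df["Unnamed: 0"].notna().cumsum().rename("group_id")

combined = (
    df.groupby(group_id, sort=False)
      .agg({"Unnamed: 0": "first", "Phenolics": " ".join})
      .reset_index(drop=True)
)
print(combined)
```

Because the group id comes from row position rather than from the number itself, this should also cope with a table where the same number appears twice. Note that joining with a space, as in the one-liner, leaves a space after each trailing hyphen.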

Answer 1
0 2022-08-08 22:37:30

Answer 2
0 Accepted 2022-08-08 22:47:24
