
Fill in empty space by merging columns using Pandas Python

I am a Python noob. I have an unstructured text file that I'm trying to load into a DataFrame and export to Excel. I need to merge column 38 into 36, 45 into 43, and 78 into 79, filling the empty cells with the data from the column being merged.

Dummy Dataset

0 5 36 38 43 45 78 79
1 A 01JUN2022 1.2 B 1.2
2 C 01JUN2022 1.4 D 1.4
3 E 01JUN2022 1.5 F 1.6
4 G 01JUN2022 1.7 H 1.7
5 I 01JUN2022 1.4 J 1.8
6 K 01JUN2022 1.7 L 1.3
1 A 01JUN2022 1.2 B 1.2
2 C 01JUN2022 1.4 D 1.4
3 E 01JUN2022 1.5 F 1.6
4 G 01JUN2022 1.7 H 1.7
5 I 01JUN2022 1.4 J 1.8
6 K 01JUN2022 1.7 L 1.3

Required output

   0          5   36 43   79
1  A  01JUN2022  1.2  B  1.2
2  C  01JUN2022  1.4  D  1.4
3  E  01JUN2022  1.5  F  1.6
4  G  01JUN2022  1.7  H  1.7
5  I  01JUN2022  1.4  J  1.8
6  K  01JUN2022  1.7  L  1.3
1  A  01JUN2022  1.2  B  1.2
2  C  01JUN2022  1.4  D  1.4
3  E  01JUN2022  1.5  F  1.6
4  G  01JUN2022  1.7  H  1.7
5  I  01JUN2022  1.4  J  1.8
6  K  01JUN2022  1.7  L  1.3

I would start by converting empty or whitespace-only strings to NaN, as follows:

import numpy as np
df = df.replace(r'^\s*$', np.nan, regex=True)
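
To see what this replacement does, here is a minimal sketch on a small made-up frame. The column names follow the question, but the values and the names `sample`, `cleaned`, and `missing_per_column` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: column names follow the question, values are made up.
sample = pd.DataFrame({
    "36": ["1.2", "", "1.5"],
    "38": ["", "1.4", "   "],
})

# Cells that are empty or contain only whitespace become NaN.
cleaned = sample.replace(r"^\s*$", np.nan, regex=True)

# Count the missing cells per column after the replacement.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)
```

Because the pattern is anchored with `^` and `$`, cells that contain any non-whitespace text are left untouched.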

Then one can use pandas.Series.combine_first, which keeps the values of the calling Series and fills its NaN entries from the other Series:

df['36'] = df['36'].combine_first(df['38'])
df['43'] = df['43'].combine_first(df['45'])
df['79'] = df['79'].combine_first(df['78'])

[Out]:
    id  0          5   36   38 43   45   78   79
0    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
1    2  C  01JUN2022  1.4  1.4  D    D  NaN  1.4
2    3  E  01JUN2022  1.5  NaN  F  NaN  1.6  1.6
3    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
4    5  I  01JUN2022  1.4  NaN  J  NaN  1.8  1.8
5    6  K  01JUN2022  1.7  NaN  L  NaN  1.3  1.3
6    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
7    2  C  01JUN2022  1.4  1.4  D  NaN  1.4  1.4
8    3  E  01JUN2022  1.5  1.5  F  NaN  1.6  1.6
9    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
10   5  I  01JUN2022  1.4  NaN  J    J  NaN  1.8
11   6  K  01JUN2022  1.7  NaN  L  NaN  NaN  1.3
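
If there are many column pairs, the same step can be written as a loop. The following is only a sketch under assumptions: the frame is made up, and the `pairs` mapping and the names `keep` and `filler` are hypothetical helpers, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; only the column names come from the question.
df = pd.DataFrame({
    "36": [1.2, np.nan, 1.5],
    "38": [np.nan, 1.4, 1.5],
    "43": ["B", np.nan, "F"],
    "45": [np.nan, "D", np.nan],
})

# Map each column to keep onto the column that fills its gaps.
pairs = {"36": "38", "43": "45"}

for keep, filler in pairs.items():
    # Values already in `keep` win; NaN entries are taken from `filler`.
    df[keep] = df[keep].combine_first(df[filler])

print(df[["36", "43"]])
```

Note that the order of the operands matters: `a.combine_first(b)` only uses `b` where `a` is missing, so existing values in the kept column are never overwritten.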

Finally, one can drop the columns that are not wanted, or select the ones to display, as follows:

df = df[['0', '5', '36', '43', '79']]

[Out]:

    0          5   36 43   79
0   A  01JUN2022  1.2  B  1.2
1   C  01JUN2022  1.4  D  1.4
2   E  01JUN2022  1.5  F  1.6
3   G  01JUN2022  1.7  H  1.7
4   I  01JUN2022  1.4  J  1.8
5   K  01JUN2022  1.7  L  1.3
6   A  01JUN2022  1.2  B  1.2
7   C  01JUN2022  1.4  D  1.4
8   E  01JUN2022  1.5  F  1.6
9   G  01JUN2022  1.7  H  1.7
10  I  01JUN2022  1.4  J  1.8
11  K  01JUN2022  1.7  L  1.3

This gives the desired output.
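
As an alternative to selecting the columns to keep, the filler columns can be dropped by name with `DataFrame.drop`. This is a minimal sketch on a made-up frame; the values and the name `result` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame after the merge step; values are made up.
df = pd.DataFrame({
    "0": ["A", "C"],
    "36": [1.2, 1.4],
    "38": [1.2, None],
    "43": ["B", "D"],
    "45": ["B", None],
})

# Drop the filler columns by name; the remaining columns keep their order.
result = df.drop(columns=["38", "45"])
print(result.columns.tolist())
```

Dropping is convenient when the frame has many columns and only a few need to go, while the selection form makes the final column order explicit.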

