pandas import strips leading zeros

Question

Why are the leading zeros not preserved in column_item and line_item?

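import pandas as pd

# read these two columns as strings so leading zeros are kept
dtype_dic = {'line_item': str,
             'column_item': str}

df_list = []

# loop over the list of csv files (csv_files is assumed to be defined earlier)
for f in csv_files:
    # read the csv file
    df = pd.read_csv(f, sep=";", dtype=dtype_dic)
    df_list.append(df)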

For example, line_item should be: 0010 0036 0230 1929

But after the import it becomes: 10 36 230 1929

This is the CSV file:

entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22

I hope you can point me in the right direction.

Answer 1

Your code works as expected. Here io.StringIO is used as a stand-in for the file:

import io
import pandas as pd

data='''entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22'''

dtype_dic = {'line_item': str,
             'column_item': str}

df = pd.read_csv(io.StringIO(data), sep=";", dtype=dtype_dic)

Data types (df.dtypes):

entity                   int64
business_line_group     object
conso_level_entity       int64
report_name             object
line_item               object
column_item             object
z_axis                  object
value_text              object
amount                 float64
approval_text           object
dtype: object

Output:

    entity business_line_group  conso_level_entity report_name line_item column_item     z_axis value_text        amount approval_text
0      456                test                 456   C_72_00_a      0070        0010  UNDEFINED    Value 1  5.198630e+06     28-feb-22
1      456                test                 456   C_72_00_a      0190        0010  UNDEFINED    Value 1  8.358922e+08     28-feb-22
2      456                test                 456   C_72_00_a      0260        0010  UNDEFINED    Value 1  4.745984e+09     28-feb-22
3      456                test                 456   C_73_00_a      0035        0010  UNDEFINED    Value 2  2.542482e+10     28-feb-22
4      456                test                 456   C_73_00_a      0070        0010  UNDEFINED    Value 2 -3.321623e+10     28-feb-22
5      456                test                 456   C_73_00_a      0080        0010  UNDEFINED    Value 1 -2.096612e+10     28-feb-22
6      456                test                 456   C_73_00_a      0110        0010  UNDEFINED    Value 1 -9.384699e+09     28-feb-22
7      456                test                 456   C_73_00_a      0230        0010  UNDEFINED    Value 1  2.193606e+09     28-feb-22
8      456                test                 456   C_73_00_a      0250        0010  UNDEFINED    Value 1 -5.737692e+08     28-feb-22
9      456                test                 456   C_73_00_a      0260        0010  UNDEFINED    Value 1  3.333715e+09     28-feb-22
10     456                test                 456   C_73_00_a      0918        0010  UNDEFINED    Value 1  1.243660e+05     28-feb-22
11     456                test                 456   C_74_00_a      0160        0010  UNDEFINED    Value 5 -5.434580e+10     28-feb-22
12     456                test                 456   C_74_00_a      0260        0010  UNDEFINED    Value 5  1.503482e+05     28-feb-22
13     456                test                 456   C_73_00_a      1100        0010  UNDEFINED    Value 5 -3.763345e+10     28-feb-22
14     456                test                 456   C_73_00_a      1100        0020  UNDEFINED    Value 5 -3.764350e+09     28-feb-22
15     456                test                 456   C_73_00_a      1040        0040  UNDEFINED    Value 3  3.376435e+10     28-feb-22
16     456                test                 456   C_73_00_a      1045        0040  UNDEFINED    Value 3  3.376435e+10     28-feb-22
17     456                test                 456   C_73_00_a      1045        0030  UNDEFINED    Value 3  3.350982e+08     28-feb-22
18     456                test                 456   C_73_00_a      1040        0010  UNDEFINED    Value 3  7.449687e+06     28-feb-22
19     456                test                 456   C_73_00_a      1045        0010  UNDEFINED    Value 1  7.644969e+07     28-feb-22
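
To make the difference visible, here is a minimal sketch that reads the same text twice, once with default type inference and once with the string dtype from the question. The three-row `sample` is a hypothetical, trimmed-down stand-in for the real file, and the final `zfill(4)` repair assumes every code is exactly four characters wide, which the question's data suggests but does not guarantee.

```python
import io
import pandas as pd

# Hypothetical sample in the same semicolon-separated layout as the question's file.
sample = """line_item;column_item;amount
0070;0010;5198630.14
0190;0010;835892217
1045;0040;33764349687.15
"""

# Default inference: digit-only columns are parsed as integers, so the zeros are lost.
inferred = pd.read_csv(io.StringIO(sample), sep=";")

# Explicit string dtype: the text is kept exactly as it appears in the file.
as_text = pd.read_csv(io.StringIO(sample), sep=";",
                      dtype={"line_item": str, "column_item": str})

# If the zeros are already gone, they can be padded back (assumes a fixed width of 4).
repaired = inferred["line_item"].astype(str).str.zfill(4)

print(inferred["line_item"].tolist())  # [70, 190, 1045]
print(as_text["line_item"].tolist())   # ['0070', '0190', '1045']
print(repaired.tolist())               # ['0070', '0190', '1045']
```

If the real import still shows 10 36 230 1929, the zeros are probably being lost somewhere other than this read_csv call, for example in a column name that does not match the keys of dtype_dic or in a later conversion step.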

Answer 2

Could you try using converters:

dict_conv={'line_item': lambda x: str(x),
           'column_item': lambda x: str(x)}

df = pd.read_csv('data.csv', sep=';', converters=dict_conv)

Update

In fact, you probably don't even need a lambda function here (though I'm not certain):

dict_conv={'line_item': str,
           'column_item': str}

df = pd.read_csv('data.csv', sep=';', converters=dict_conv)
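
One way to check the two converter variants against each other is the sketch below. It reads a hypothetical two-row `sample` through io.StringIO rather than the 'data.csv' file used above, so the names `sample`, `via_lambda`, and `via_str` are illustrative only.

```python
import io
import pandas as pd

# Hypothetical sample standing in for data.csv.
sample = """line_item;column_item;amount
0070;0010;5198630.14
0918;0020;124366
"""

# Variant 1: wrap str in a lambda.
via_lambda = pd.read_csv(
    io.StringIO(sample), sep=";",
    converters={"line_item": lambda x: str(x), "column_item": lambda x: str(x)},
)

# Variant 2: pass str directly; it is already a callable taking one argument.
via_str = pd.read_csv(
    io.StringIO(sample), sep=";",
    converters={"line_item": str, "column_item": str},
)

print(via_str["line_item"].tolist())  # ['0070', '0918']
print(via_lambda.equals(via_str))     # True
```

Both variants hand each raw cell to a callable before any type inference takes place, which is why the leading zeros survive in either case.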

2 solutions

Solution 1
1 2022-03-22 09:14:12

Solution 2
1 Accepted 2022-03-22 09:20:55
