
Leading zeros removed by pandas import


Why aren't my leading zeros kept in column_item and line_item?

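import glob

import pandas as pd

# Format columns: read the code columns as text so leading zeros are kept
dtype_dic = {'line_item': str,
             'column_item': str}

csv_files = glob.glob("*.csv")  # hypothetical: the original list of files is not shown
df_list = []

# Loop over the list of CSV files
for f in csv_files:
    # Read the CSV file
    df = pd.read_csv(f, sep=";", dtype=dtype_dic)
    df_list.append(df)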

For example, line_item should be: 0010 0036 0230 1929

but after the import it becomes: 10 36 230 1929
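For context, a minimal sketch of the behaviour described above, using two rows from the question's data. Without `dtype`, `read_csv` infers `line_item` as an integer column, so `0070` is parsed as 70. With `dtype=str`, the raw text is kept. The `sample` string is an illustrative excerpt, not the full file.

```python
import io

import pandas as pd

# Two rows taken from the question's CSV (reduced to the relevant columns)
sample = "line_item;column_item\n0070;0010\n0190;0010\n"

# Without dtype: pandas infers int64 and the leading zeros are lost
inferred = pd.read_csv(io.StringIO(sample), sep=";")
print(inferred["line_item"].tolist())  # [70, 190]

# With dtype=str: values are kept as the original text
as_text = pd.read_csv(io.StringIO(sample), sep=";",
                      dtype={"line_item": str, "column_item": str})
print(as_text["line_item"].tolist())  # ['0070', '0190']
```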

This is the CSV file:

entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22

I hope you can point me in the right direction.

Your code works as expected. Here it is, using io.StringIO as a stand-in for the file:

import io

import pandas as pd

data='''entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22'''

dtype_dic = {'line_item': str,
             'column_item': str}

df = pd.read_csv(io.StringIO(data), sep=";", dtype=dtype_dic)
print(df.dtypes)
print(df)

Data types:

entity                   int64
business_line_group     object
conso_level_entity       int64
report_name             object
line_item               object
column_item             object
z_axis                  object
value_text              object
amount                 float64
approval_text           object
dtype: object

Output:

    entity business_line_group  conso_level_entity report_name line_item column_item     z_axis value_text        amount approval_text
0      456                test                 456   C_72_00_a      0070        0010  UNDEFINED    Value 1  5.198630e+06     28-feb-22
1      456                test                 456   C_72_00_a      0190        0010  UNDEFINED    Value 1  8.358922e+08     28-feb-22
2      456                test                 456   C_72_00_a      0260        0010  UNDEFINED    Value 1  4.745984e+09     28-feb-22
3      456                test                 456   C_73_00_a      0035        0010  UNDEFINED    Value 2  2.542482e+10     28-feb-22
4      456                test                 456   C_73_00_a      0070        0010  UNDEFINED    Value 2 -3.321623e+10     28-feb-22
5      456                test                 456   C_73_00_a      0080        0010  UNDEFINED    Value 1 -2.096612e+10     28-feb-22
6      456                test                 456   C_73_00_a      0110        0010  UNDEFINED    Value 1 -9.384699e+09     28-feb-22
7      456                test                 456   C_73_00_a      0230        0010  UNDEFINED    Value 1  2.193606e+09     28-feb-22
8      456                test                 456   C_73_00_a      0250        0010  UNDEFINED    Value 1 -5.737692e+08     28-feb-22
9      456                test                 456   C_73_00_a      0260        0010  UNDEFINED    Value 1  3.333715e+09     28-feb-22
10     456                test                 456   C_73_00_a      0918        0010  UNDEFINED    Value 1  1.243660e+05     28-feb-22
11     456                test                 456   C_74_00_a      0160        0010  UNDEFINED    Value 5 -5.434580e+10     28-feb-22
12     456                test                 456   C_74_00_a      0260        0010  UNDEFINED    Value 5  1.503482e+05     28-feb-22
13     456                test                 456   C_73_00_a      1100        0010  UNDEFINED    Value 5 -3.763345e+10     28-feb-22
14     456                test                 456   C_73_00_a      1100        0020  UNDEFINED    Value 5 -3.764350e+09     28-feb-22
15     456                test                 456   C_73_00_a      1040        0040  UNDEFINED    Value 3  3.376435e+10     28-feb-22
16     456                test                 456   C_73_00_a      1045        0040  UNDEFINED    Value 3  3.376435e+10     28-feb-22
17     456                test                 456   C_73_00_a      1045        0030  UNDEFINED    Value 3  3.350982e+08     28-feb-22
18     456                test                 456   C_73_00_a      1040        0010  UNDEFINED    Value 3  7.449687e+06     28-feb-22
19     456                test                 456   C_73_00_a      1045        0010  UNDEFINED    Value 1  7.644969e+07     28-feb-22
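Since the question reads several files in a loop, here is a sketch that checks the zeros also survive combining the per-file frames with `pd.concat`. The in-memory `files` list is a hypothetical stand-in for `csv_files`, and the final `concat` step is an assumption about what the asker does with `df_list`.

```python
import io

import pandas as pd

dtype_dic = {"line_item": str, "column_item": str}

# Hypothetical stand-ins for two CSV files on disk
files = [
    io.StringIO("line_item;column_item;amount\n0070;0010;05198630.14\n"),
    io.StringIO("line_item;column_item;amount\n1100;0020;-3764349687.15\n"),
]

df_list = []
for f in files:
    df_list.append(pd.read_csv(f, sep=";", dtype=dtype_dic))

# Concatenating string (object) columns keeps them as strings
combined = pd.concat(df_list, ignore_index=True)
print(combined["line_item"].tolist())    # ['0070', '1100']
print(combined["column_item"].tolist())  # ['0010', '0020']
```

If the zeros are intact here but still missing in your results, the loss probably happens after import rather than in `read_csv` itself.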

Could you try using converters?

import pandas as pd

# Each converter receives the raw cell text and returns it as a string
dict_conv = {'line_item': lambda x: str(x),
             'column_item': lambda x: str(x)}

df = pd.read_csv('data.csv', sep=';', converters=dict_conv)
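A self-contained version of the `converters` approach, with `io.StringIO` in place of `data.csv`. Each converter is called with the cell's raw text, so the zero-padded value passes through unchanged, while columns without a converter (here `amount`) are still type-inferred. The `sample` string is a hypothetical excerpt of the question's CSV.

```python
import io

import pandas as pd

sample = "line_item;column_item;amount\n0070;0010;05198630.14\n0918;0010;124366\n"

dict_conv = {"line_item": lambda x: str(x),
             "column_item": lambda x: str(x)}

df = pd.read_csv(io.StringIO(sample), sep=";", converters=dict_conv)
print(df["line_item"].tolist())    # ['0070', '0918']
print(df["column_item"].tolist())  # ['0010', '0010']
```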

Update

In fact, you don't even need a lambda function here, since the built-in str is itself a callable (but I'm not sure):

import pandas as pd

# Pass the built-in str directly as the converter
dict_conv = {'line_item': str,
             'column_item': str}

df = pd.read_csv('data.csv', sep=';', converters=dict_conv)
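A quick check of the update's claim: `converters` accepts any callable, so passing the built-in `str` directly should give the same result as wrapping it in a lambda. The two sample rows are taken from the question's CSV; `DataFrame.equals` compares both values and dtypes.

```python
import io

import pandas as pd

sample = "line_item;column_item\n0035;0010\n1045;0030\n"

with_lambda = pd.read_csv(io.StringIO(sample), sep=";",
                          converters={"line_item": lambda x: str(x),
                                      "column_item": lambda x: str(x)})
with_str = pd.read_csv(io.StringIO(sample), sep=";",
                       converters={"line_item": str, "column_item": str})

# Both approaches keep the zero-padded text
print(with_lambda.equals(with_str))    # True
print(with_str["line_item"].tolist())  # ['0035', '1045']
```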

