Leading zeros removed by pandas import
Why aren't my leading zeros kept in column_item and line_item?
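import pandas as pd

# format columns: read these two as strings
dtype_dic = {'line_item': str,
             'column_item': str}

df_list = []
# loop over the list of csv files (csv_files: list of paths built earlier in the script)
for f in csv_files:
    # read the csv file
    df = pd.read_csv(f, sep=";", dtype=dtype_dic)
    df_list.append(df)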
For example, line_item should be: 0010 0036 0230 1929
but after the import it becomes: 10 36 230 1929
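To see where the zeros go, here is a minimal sketch using a few rows trimmed from the CSV in this question, held in an in-memory string (an assumption in place of the real files). Without dtype, pandas infers line_item as an integer column and drops the leading zeros; with dtype=str the raw text is kept.

```python
import io

import pandas as pd

# A few rows trimmed from the question's CSV (semicolon-separated)
sample = """entity;line_item;column_item;amount
456;0070;0010;05198630.14
456;0230;0010;2193605666.84
456;1100;0020;-3764349687.15
"""

# Default type inference: "0070" is parsed as the integer 70
inferred = pd.read_csv(io.StringIO(sample), sep=";")
print(inferred["line_item"].tolist())  # [70, 230, 1100]

# Forcing str keeps the original text, including leading zeros
as_text = pd.read_csv(io.StringIO(sample), sep=";",
                      dtype={"line_item": str, "column_item": str})
print(as_text["line_item"].tolist())  # ['0070', '0230', '1100']
```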
This is the CSV file:
entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22
I hope you can point me in the right direction.
Your code works as expected. Here it is run with io.StringIO standing in for the file:
import io

import pandas as pd

data='''entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22'''
dtype_dic = {'line_item': str,
             'column_item': str}
df = pd.read_csv(io.StringIO(data), sep=";", dtype=dtype_dic)
print(df.dtypes)
print(df)
dtypes:
entity int64
business_line_group object
conso_level_entity int64
report_name object
line_item object
column_item object
z_axis object
value_text object
amount float64
approval_text object
dtype: object
output:
entity business_line_group conso_level_entity report_name line_item column_item z_axis value_text amount approval_text
0 456 test 456 C_72_00_a 0070 0010 UNDEFINED Value 1 5.198630e+06 28-feb-22
1 456 test 456 C_72_00_a 0190 0010 UNDEFINED Value 1 8.358922e+08 28-feb-22
2 456 test 456 C_72_00_a 0260 0010 UNDEFINED Value 1 4.745984e+09 28-feb-22
3 456 test 456 C_73_00_a 0035 0010 UNDEFINED Value 2 2.542482e+10 28-feb-22
4 456 test 456 C_73_00_a 0070 0010 UNDEFINED Value 2 -3.321623e+10 28-feb-22
5 456 test 456 C_73_00_a 0080 0010 UNDEFINED Value 1 -2.096612e+10 28-feb-22
6 456 test 456 C_73_00_a 0110 0010 UNDEFINED Value 1 -9.384699e+09 28-feb-22
7 456 test 456 C_73_00_a 0230 0010 UNDEFINED Value 1 2.193606e+09 28-feb-22
8 456 test 456 C_73_00_a 0250 0010 UNDEFINED Value 1 -5.737692e+08 28-feb-22
9 456 test 456 C_73_00_a 0260 0010 UNDEFINED Value 1 3.333715e+09 28-feb-22
10 456 test 456 C_73_00_a 0918 0010 UNDEFINED Value 1 1.243660e+05 28-feb-22
11 456 test 456 C_74_00_a 0160 0010 UNDEFINED Value 5 -5.434580e+10 28-feb-22
12 456 test 456 C_74_00_a 0260 0010 UNDEFINED Value 5 1.503482e+05 28-feb-22
13 456 test 456 C_73_00_a 1100 0010 UNDEFINED Value 5 -3.763345e+10 28-feb-22
14 456 test 456 C_73_00_a 1100 0020 UNDEFINED Value 5 -3.764350e+09 28-feb-22
15 456 test 456 C_73_00_a 1040 0040 UNDEFINED Value 3 3.376435e+10 28-feb-22
16 456 test 456 C_73_00_a 1045 0040 UNDEFINED Value 3 3.376435e+10 28-feb-22
17 456 test 456 C_73_00_a 1045 0030 UNDEFINED Value 3 3.350982e+08 28-feb-22
18 456 test 456 C_73_00_a 1040 0010 UNDEFINED Value 3 7.449687e+06 28-feb-22
19 456 test 456 C_73_00_a 1045 0010 UNDEFINED Value 1 7.644969e+07 28-feb-22
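The loop from the question can be checked the same way. This sketch assumes two small in-memory "files" as hypothetical stand-ins for csv_files (rows taken from the CSV above) and combines the results with pd.concat, a common next step that the question does not show. Because every frame already holds line_item and column_item as strings, concatenation leaves the zeros intact.

```python
import io

import pandas as pd

dtype_dic = {'line_item': str, 'column_item': str}

# Hypothetical stand-ins for the files in csv_files
csv_files = [
    io.StringIO("entity;line_item;column_item\n456;0070;0010\n456;0190;0010\n"),
    io.StringIO("entity;line_item;column_item\n456;0035;0010\n456;1045;0040\n"),
]

df_list = []
for f in csv_files:
    # same call as in the question
    df = pd.read_csv(f, sep=";", dtype=dtype_dic)
    df_list.append(df)

# Stack the per-file frames; string columns stay strings
combined = pd.concat(df_list, ignore_index=True)
print(combined["line_item"].tolist())  # ['0070', '0190', '0035', '1045']
```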
You could try using converters:
import pandas as pd

# converters receive each field as raw text; str() returns it unchanged
dict_conv = {'line_item': lambda x: str(x),
             'column_item': lambda x: str(x)}
df = pd.read_csv('data.csv', sep=';', converters=dict_conv)
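A self-contained version of the converter approach, reading a few rows trimmed from the question's CSV out of an in-memory string instead of data.csv (an assumption so it runs anywhere). Since the converter is handed the field text as it appears in the file, returning it as a string keeps the leading zeros, while columns without a converter are still inferred as usual.

```python
import io

import pandas as pd

sample = """entity;line_item;column_item
456;0260;0010
456;0918;0010
456;1040;0040
"""

dict_conv = {'line_item': lambda x: str(x),
             'column_item': lambda x: str(x)}

df = pd.read_csv(io.StringIO(sample), sep=';', converters=dict_conv)
print(df['column_item'].tolist())  # ['0010', '0010', '0040']
```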
Update
In fact, you don't even need a lambda function here (but I'm not sure):
import pandas as pd

# the built-in str is itself a callable, so it can be passed directly
dict_conv = {'line_item': str,
             'column_item': str}
df = pd.read_csv('data.csv', sep=';', converters=dict_conv)
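The guess in the update can be checked directly: converters accepts any callable, and the built-in str applied to text returns it unchanged. This sketch uses a trimmed in-memory sample in place of data.csv (an assumption) and compares the str version with the lambda version.

```python
import io

import pandas as pd

sample = "entity;line_item;column_item\n456;0035;0010\n456;1100;0020\n"

# Built-in str as the converter
with_str = pd.read_csv(io.StringIO(sample), sep=';',
                       converters={'line_item': str, 'column_item': str})

# Lambda wrapper from the first version of the answer
with_lambda = pd.read_csv(io.StringIO(sample), sep=';',
                          converters={'line_item': lambda x: str(x),
                                      'column_item': lambda x: str(x)})

print(with_str['line_item'].tolist())  # ['0035', '1100']
print(with_str.equals(with_lambda))    # True
```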