Reading column delimited text data in python quickly
I have column-delimited (fixed-width) data in a text file containing many variables. The original file was created in Fortran. The number of values in each row is fixed (i.e. 8). For example, the value "-0.213897E-05" is immediately followed by seven other values on the same line. A blank sign column means a positive ("+ve") sign. There are 8 such rows per variable, but the total number of values could be between 62 and 64. There are about 1000 such variables.
An example of one variable in the file is as follows.
-0.213897E-05 0.106493E-06-0.530198E-08 0.263970E-09-0.131423E-10 0.654316E-12-0.325765E-13 0.162189E-14
-0.427794E-05 0.212986E-06-0.106040E-07 0.527940E-09-0.262846E-10 0.130863E-11-0.651530E-13 0.324377E-14
-0.641691E-05 0.319479E-06-0.159059E-07 0.791910E-09-0.394269E-10 0.196295E-11-0.977294E-13 0.486566E-14
-0.855588E-05 0.425972E-06-0.212079E-07 0.105588E-08-0.525692E-10 0.261726E-11-0.130306E-12 0.648755E-14
-0.106949E-04 0.532465E-06-0.265099E-07 0.131985E-08-0.657114E-10 0.327158E-11-0.162882E-12 0.810944E-14
-0.128338E-04 0.638958E-06-0.318119E-07 0.158382E-08-0.788537E-10 0.392590E-11-0.195459E-12 0.973132E-14
-0.149728E-04 0.745452E-06-0.371138E-07 0.184779E-08-0.919960E-10 0.458021E-11-0.228035E-12 0.113532E-13
-0.171118E-04 0.851945E-06-0.424158E-07 0.211176E-08-0.105138E-09
I have successfully read the file using readlines() and then converting the strings into floats, but this is slow and time-consuming. I also tried FortranFormat, which was even slower. The total size of the file is about 2GB.
Please suggest a native way to read these values. I have about 1000 such variables in the file.
Pandas can help here: there's a section about reading fixed-width files in the docs. The example reads the text from a StringIO (a file would work just the same).
In [21]: colspecs = [(5 + 13 * i, 5 + 13 * (i + 1)) for i in range(8)]
In [22]: pd.read_fwf(StringIO(s), colspecs=colspecs, header=None)
Out[22]:
          0         1             2             3             4             5             6             7
0 -0.000002  0.000000 -5.301980e-09  2.639700e-10 -1.314230e-11  6.543160e-13 -3.257650e-14  1.621890e-15
1 -0.000004  0.000000 -1.060400e-08  5.279400e-10 -2.628460e-11  1.308630e-12 -6.515300e-14  3.243770e-15
2 -0.000006  0.000000 -1.590590e-08  7.919100e-10 -3.942690e-11  1.962950e-12 -9.772940e-14  4.865660e-15
3 -0.000009  0.000000 -2.120790e-08  1.055880e-09 -5.256920e-11  2.617260e-12 -1.303060e-13  6.487550e-15
4 -0.000011  0.000001 -2.650990e-08  1.319850e-09 -6.571140e-11  3.271580e-12 -1.628820e-13  8.109440e-15
5 -0.000013  0.000001 -3.181190e-08  1.583820e-09 -7.885370e-11  3.925900e-12 -1.954590e-13  9.731320e-15
6 -0.000015  0.000001 -3.711380e-08  1.847790e-09 -9.199600e-11  4.580210e-12 -2.280350e-13  1.135320e-14
7 -0.000017  0.000001 -4.241580e-08  2.111760e-09 -1.051380e-10           NaN           NaN           NaN
These have been read in as floats.
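For reference, here is a self-contained sketch of the same idea that can be run as a script. It assumes the rows look exactly like the sample in the question, with no leading blanks and 13 characters per value, so it uses `widths` instead of `colspecs`. The `colspecs` in the session above start at offset 5, which suggests the real file may have leading blanks, in which case the offsets need adjusting. The variable names are only illustrative.

```python
from io import StringIO

import numpy as np
import pandas as pd

# First and last sample rows from the question.
# Assumption: no leading indentation, each value is exactly 13 characters wide.
s = (
    "-0.213897E-05 0.106493E-06-0.530198E-08 0.263970E-09"
    "-0.131423E-10 0.654316E-12-0.325765E-13 0.162189E-14\n"
    "-0.171118E-04 0.851945E-06-0.424158E-07 0.211176E-08-0.105138E-09\n"
)

# Eight consecutive 13-character fields per row; short rows are padded with NaN.
df = pd.read_fwf(StringIO(s), widths=[13] * 8, header=None)

# Flatten to a 1-D array and drop the NaN padding from the short last row.
values = df.to_numpy().ravel()
values = values[~np.isnan(values)]

print(df.dtypes)
print(values)
```

For a real file, the `StringIO(s)` argument would be replaced by the file path. How this performs on a 2GB file has not been measured here.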
Original answer: read_csv might help you here, it's great for delimited text files:
pd.read_csv('your_file.txt', sep=' ')
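One likely reason a plain space separator is not enough for the sample in the question: negative values run straight into the previous value with no space in between (for example `0.106493E-06-0.530198E-08`), so splitting on whitespace does not yield eight fields. The sketch below illustrates this and shows fixed-width slicing in plain Python as an alternative. The 13-character field width is inferred from the sample, and `parse_fixed_width_line` is a hypothetical helper name, not part of any library.

```python
FIELD_WIDTH = 13  # assumed width of each value, inferred from the sample rows


def parse_fixed_width_line(line, width=FIELD_WIDTH):
    """Split a line into consecutive fixed-width chunks and convert each to float."""
    line = line.rstrip("\n")
    chunks = (line[i:i + width] for i in range(0, len(line), width))
    # float() accepts leading blanks; skip chunks that are entirely blank.
    return [float(chunk) for chunk in chunks if chunk.strip()]


row = (
    "-0.213897E-05 0.106493E-06-0.530198E-08 0.263970E-09"
    "-0.131423E-10 0.654316E-12-0.325765E-13 0.162189E-14"
)
short_row = "-0.171118E-04 0.851945E-06-0.424158E-07 0.211176E-08-0.105138E-09"

# Whitespace splitting merges adjacent values into one token.
print(row.split())

# Fixed-width slicing recovers every value, including on a short last row.
print(parse_fixed_width_line(row))
print(parse_fixed_width_line(short_row))
```

This is meant to show the parsing logic rather than to be the fastest option; it has not been benchmarked against `read_fwf`.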