
Reading column delimited text data in Python quickly

I have column-delimited data in a text file containing many variables. The original file was created in Fortran. The number of values in each row is fixed (i.e. 8). For example, the value "-0.213897E-05" is immediately followed by seven other values on the same line. A blank column means a positive ("+ve") sign. There are 8 such rows per variable, but the total number of values can be between 62 and 64. There are about 1000 such variables.

An example of one variable in the file is as follows.

     -0.213897E-05 0.106493E-06-0.530198E-08 0.263970E-09-0.131423E-10 0.654316E-12-0.325765E-13 0.162189E-14
     -0.427794E-05 0.212986E-06-0.106040E-07 0.527940E-09-0.262846E-10 0.130863E-11-0.651530E-13 0.324377E-14
     -0.641691E-05 0.319479E-06-0.159059E-07 0.791910E-09-0.394269E-10 0.196295E-11-0.977294E-13 0.486566E-14
     -0.855588E-05 0.425972E-06-0.212079E-07 0.105588E-08-0.525692E-10 0.261726E-11-0.130306E-12 0.648755E-14
     -0.106949E-04 0.532465E-06-0.265099E-07 0.131985E-08-0.657114E-10 0.327158E-11-0.162882E-12 0.810944E-14
     -0.128338E-04 0.638958E-06-0.318119E-07 0.158382E-08-0.788537E-10 0.392590E-11-0.195459E-12 0.973132E-14
     -0.149728E-04 0.745452E-06-0.371138E-07 0.184779E-08-0.919960E-10 0.458021E-11-0.228035E-12 0.113532E-13
     -0.171118E-04 0.851945E-06-0.424158E-07 0.211176E-08-0.105138E-09 

I have successfully read the file using readlines() and then converting the strings into floats, but this is slow and time-consuming. I also tried FortranFormat, which was even slower. The total size of the file is about 2GB.

Please suggest a native way to read these values. I have about 1000 such variables in the file.

Pandas can help here: there's a section about reading fixed-width files in the docs. The example below reads the sample text, held in the string s, through StringIO (a file would work just the same).

In [21]: colspecs = [(5 + 13 * i, 5 + 13 * (i + 1)) for i in range(8)]

In [22]: pd.read_fwf(StringIO(s), colspecs=colspecs, header=None)
Out[22]:
          0         1             2             3             4             5             6             7
0 -0.000002  0.000000 -5.301980e-09  2.639700e-10 -1.314230e-11  6.543160e-13 -3.257650e-14  1.621890e-15
1 -0.000004  0.000000 -1.060400e-08  5.279400e-10 -2.628460e-11  1.308630e-12 -6.515300e-14  3.243770e-15
2 -0.000006  0.000000 -1.590590e-08  7.919100e-10 -3.942690e-11  1.962950e-12 -9.772940e-14  4.865660e-15
3 -0.000009  0.000000 -2.120790e-08  1.055880e-09 -5.256920e-11  2.617260e-12 -1.303060e-13  6.487550e-15
4 -0.000011  0.000001 -2.650990e-08  1.319850e-09 -6.571140e-11  3.271580e-12 -1.628820e-13  8.109440e-15
5 -0.000013  0.000001 -3.181190e-08  1.583820e-09 -7.885370e-11  3.925900e-12 -1.954590e-13  9.731320e-15
6 -0.000015  0.000001 -3.711380e-08  1.847790e-09 -9.199600e-11  4.580210e-12 -2.280350e-13  1.135320e-14
7 -0.000017  0.000001 -4.241580e-08  2.111760e-09 -1.051380e-10           NaN           NaN           NaN

These have been read in as floats.
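For reference, the session above can be written as a self-contained script. This is only a sketch under a few assumptions: the names `sample` and `values` are illustrative, the sample holds just the first and last rows of the variable from the question, and how the roughly 1000 variables are laid out in the real file is not specified, so splitting the file into variables is left out.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Two rows copied from the question: a full row of 8 values and the short last row.
sample = (
    "     -0.213897E-05 0.106493E-06-0.530198E-08 0.263970E-09"
    "-0.131423E-10 0.654316E-12-0.325765E-13 0.162189E-14\n"
    "     -0.171118E-04 0.851945E-06-0.424158E-07 0.211176E-08-0.105138E-09 \n"
)

# 5 leading blanks, then 8 fields of 13 characters each.
colspecs = [(5 + 13 * i, 5 + 13 * (i + 1)) for i in range(8)]

# Missing fields on the short row come back as NaN.
df = pd.read_fwf(StringIO(sample), colspecs=colspecs, header=None)

# Flatten row by row and drop the NaN padding to get the values of one variable.
values = df.to_numpy().ravel()
values = values[~np.isnan(values)]
print(values.size)
```

Passing a file path instead of `StringIO(sample)` should work the same way. For a 2GB file, the `chunksize` argument of `read_fwf` may help keep memory use down.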


Original answer: read_csv might help you here; it's great for delimited text files:

pd.read_csv('your_file.txt', sep=' ')
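One caveat for the sample data in the question: negative values are written with no blank in front of them, so splitting on whitespace fuses neighbouring fields. The sketch below shows this on the first sample row and contrasts it with slicing by position. The helper `parse_fixed_width` and its parameters `offset`, `width` and `count` are hypothetical names chosen for illustration, with defaults taken from the layout of the sample (5 leading blanks, 8 fields of 13 characters).

```python
line = (
    "     -0.213897E-05 0.106493E-06-0.530198E-08 0.263970E-09"
    "-0.131423E-10 0.654316E-12-0.325765E-13 0.162189E-14"
)

# Whitespace splitting fuses each positive value with the negative one after it,
# so fewer than 8 tokens come back and some of them are not valid floats.
tokens = line.split()


def parse_fixed_width(line, offset=5, width=13, count=8):
    """Slice a line into fixed-width fields and convert the non-blank ones to float."""
    fields = (line[offset + width * i: offset + width * (i + 1)] for i in range(count))
    return [float(field) for field in fields if field.strip()]


values = parse_fixed_width(line)
print(len(tokens), len(values))
```

Blank fields are skipped, so the short last row of a variable simply yields fewer values.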
