Reading numbers without spaces from a text file using Python

Question

I'm a newbie with Python and struggle to read a text file like this:

  0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08
  0.85234E-03-0.18031E+09-0.18340E+09 0.28237E+08
  0.12785E-02-0.16583E+09-0.15887E+09 0.20637E+08

There are thus no comma or space delimiters between the numbers in the file. With Matlab I know how to specify the formats, but how do I do it in Python?

I have been trying np.loadtxt but don't know how to set the number of digits to read, so if anyone could give me a hint on this I would be very grateful.

Thanks in advance, Erik

Answer 1

To expand on my comment: since you can successfully parse this with MATLAB, I assume that these fields are fixed width. In that case, you can just slice each row based on the field width, and then convert the result to a numpy array if that's what you need. As an example:

import numpy

input_data = """ 0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08
 0.85234E-03-0.18031E+09-0.18340E+09 0.28237E+08
 0.12785E-02-0.16583E+09-0.15887E+09 0.20637E+08
"""
input_rows = input_data.split('\n')

width = 12
num_fields = 4

data = []
for input_row in input_rows:
    if not input_row:
        continue
    data.append([float(input_row[width * i:width * (i + 1)].strip()) for i in range(num_fields)])

data = numpy.array(data)
print(data)

This outputs:

[[  4.26170000e-04  -1.97250000e+08  -2.11390000e+08   3.70770000e+07]
 [  8.52340000e-04  -1.80310000e+08  -1.83400000e+08   2.82370000e+07]
 [  1.27850000e-03  -1.65830000e+08  -1.58870000e+08   2.06370000e+07]]

Of course, this example uses a fixed string to represent the input data, but you can imagine doing a similar thing with your input stream.
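As a rough sketch of what "a similar thing with your input stream" might look like, the slicing can be wrapped in a generator that accepts any iterable of lines, such as an open file. The function name `read_fixed_width` and the file name in the comment are illustrative, and the width of 12 and the four fields are the same assumptions as in the example above.

```python
import io


def read_fixed_width(stream, width=12, num_fields=4):
    """Yield one list of floats per non-blank line of a fixed-width stream."""
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # float() ignores the leading space that pads positive values
        yield [float(line[width * i:width * (i + 1)]) for i in range(num_fields)]


# io.StringIO stands in for an open file; open("yourfile.txt") would be used the same way.
sample = io.StringIO(
    " 0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08\n"
    " 0.85234E-03-0.18031E+09-0.18340E+09 0.28237E+08\n"
)
rows = list(read_fixed_width(sample))
print(rows)
```

The resulting list of rows can be passed to numpy.array exactly as in the example above.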

Answer 2

The other answers use methods that rely on the numbers all having the same width or all being in scientific notation. Here I present a method that accepts any representation of floats, fixed width or not.

If you were dealing with the given input in C, you would probably have used scanf or sscanf. Python has functionality similar to printf (such as the format method for strings), but it does not have anything like scanf or sscanf.

Fortunately, you can use ctypes from the Python standard library to call the sscanf function directly. The following example is for Python on Linux systems:

import ctypes
libc = ctypes.CDLL("libc.so.6")
sscanf = libc.sscanf
with open("test") as fp:
    for l in fp:
        float_1 = ctypes.c_float()
        float_2 = ctypes.c_float()
        float_3 = ctypes.c_float()
        float_4 = ctypes.c_float()
        sscanf(ctypes.create_string_buffer(bytes(l, "utf8")), b"%f%f%f%f",
               ctypes.byref(float_1), ctypes.byref(float_2),
               ctypes.byref(float_3), ctypes.byref(float_4))
        # You can check the return value of sscanf for errors: it returns the number
        # of fields successfully converted, so it should be 4 when every field is read.
        print(float_1.value, float_2.value, float_3.value, float_4.value)

The output is:

0.00042617000872269273 -197250000.0 -211390000.0 37077000.0
0.0008523400174453855 -180310000.0 -183400000.0 28237000.0
0.0012784999562427402 -165830000.0 -158870000.0 20637000.0

In the (unlikely) case that your system does not use glibc or uses an older version, change the path of the library accordingly. (It is very unlikely that your system has no C library, or that the library does not implement scanf.) If you are using Windows, change libc = ctypes.CDLL("libc.so.6") to

libc = ctypes.cdll.msvcrt # Loads MS standard C Library

ctypes simply calls the functions in the dynamic library using the standard calling conventions. You can use it to interface Python code with almost any C library.

If you do not wish to use ctypes, you could use a community library such as scanf or parse, both of which provide scanf-like functionality.
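For readers who want neither ctypes nor a third-party package, a regular expression can stand in for a repeated %f. The sketch below is only an approximation of sscanf, not an equivalent: `FLOAT_RE` and `scan_floats` are illustrative names, and unlike sscanf, findall skips over any text that does not look like a number instead of stopping at it. It does accept fixed-width and free-form floats alike, and since Python floats are double precision, the values do not pick up the single-precision rounding visible in the c_float output above.

```python
import re

# An optionally signed decimal with an optional exponent,
# e.g. -0.19725E+09, 3.5, .5, 42
FLOAT_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def scan_floats(line, count):
    """Return the first `count` floats in `line`; raise ValueError if fewer are found."""
    tokens = FLOAT_RE.findall(line)
    if len(tokens) < count:
        raise ValueError(f"expected {count} numbers, found {len(tokens)}: {line!r}")
    return [float(tok) for tok in tokens[:count]]


print(scan_floats(" 0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08", 4))
```

Raising an error when too few numbers are found plays the same role as checking the return value of sscanf.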

Answer 3

You could abuse the fact that the numbers all appear to be in scientific notation and use regular expressions to pull each one out.

import re

e_numbers = re.compile(r"-?[\d.]*?E[+-]\d{2}")

with open('yourfile.txt') as f:
    numbers = [float(num) for line in f for num in e_numbers.findall(line)]

To break that regex down:

e_numbers = re.compile(r'''
    -?                  # an optional leading minus sign, so negative
                        #   values keep their sign
    [\d.]*?             # zero or more of the following:
                        #   0123456789.
                        # matching the fewest possible
    E                   # the literal letter 'E'
    [+-]                # either a literal '+' or a literal '-'
    \d{2}               # followed by two digits 0-9''', re.X)
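
One detail worth checking with this kind of pattern is the sign: the character class [\d.] does not include '-', so unless the pattern starts with an optional -?, negative values come back without their sign. A small check on one sample line from the question (the names `unsigned` and `signed` are illustrative only):

```python
import re

line = " 0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08"

unsigned = re.compile(r"[\d.]*?E[+-]\d{2}")    # no provision for a leading minus
signed = re.compile(r"-?[\d.]*?E[+-]\d{2}")    # optional leading minus

print(unsigned.findall(line))
# ['0.42617E-03', '0.19725E+09', '0.21139E+09', '0.37077E+08']
print(signed.findall(line))
# ['0.42617E-03', '-0.19725E+09', '-0.21139E+09', '0.37077E+08']

row = [float(num) for num in signed.findall(line)]
```

Applying findall per line, as in `row`, also keeps the four columns of each line together rather than flattening the whole file into one list.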

3 answers

Answer 1
2 2015-12-02 17:08:57

Answer 2
1 2019-07-05 16:25:39

Answer 3
0 2015-12-02 17:16:31
