
Read numbers without spaces in a text file with Python

I'm new to Python and am struggling to read a text file like this:

  0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08
  0.85234E-03-0.18031E+09-0.18340E+09 0.28237E+08
  0.12785E-02-0.16583E+09-0.15887E+09 0.20637E+08

There are thus no comma or space delimiters between the numbers in the file. In MATLAB I know how to specify the formats, but how do I do it in Python?

I have been trying np.loadtxt but don't know how to set the number of digits to read, so if anyone could give me a hint on this I would be very grateful.

Thanks in advance, Erik

To expand on my comment: since you can parse this successfully with MATLAB, I assume the fields are fixed width. In that case, you can slice each row by field width and then convert the result to a NumPy array if that's what you need. For example:

import numpy

input_data = """ 0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08
 0.85234E-03-0.18031E+09-0.18340E+09 0.28237E+08
 0.12785E-02-0.16583E+09-0.15887E+09 0.20637E+08
"""
input_rows = input_data.split('\n')

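width = 12      # characters per field, including the sign (or leading space)
num_fields = 4  # fields per row

data = []
for input_row in input_rows:
    if not input_row:
        continue  # skip empty lines, e.g. the one after the trailing newline
    # Cut the row into consecutive 12-character fields and convert each to float
    data.append([float(input_row[width * i:width * (i + 1)].strip()) for i in range(num_fields)])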

data = numpy.array(data)
print(data)

This outputs:

[[  4.26170000e-04  -1.97250000e+08  -2.11390000e+08   3.70770000e+07]
 [  8.52340000e-04  -1.80310000e+08  -1.83400000e+08   2.82370000e+07]
 [  1.27850000e-03  -1.65830000e+08  -1.58870000e+08   2.06370000e+07]]

Of course, this example uses a hard-coded string for the input data, but you can apply the same slicing to each line read from your file.
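Since the question mentions np.loadtxt: loadtxt splits on a delimiter string, but numpy.genfromtxt also accepts an integer (or a sequence of integers) as delimiter and treats it as fixed field widths. A minimal sketch, assuming every field is exactly 12 characters wide as in the sample above; io.StringIO only stands in for a real file, so in practice you would pass your filename instead.

```python
import io

import numpy as np

text = (
    " 0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08\n"
    " 0.85234E-03-0.18031E+09-0.18340E+09 0.28237E+08\n"
    " 0.12785E-02-0.16583E+09-0.15887E+09 0.20637E+08\n"
)

# An integer delimiter makes genfromtxt cut each line into fixed-width fields
# (12 characters each here); a list such as [12, 12, 12, 12] would also work
# and allows fields of different widths.
data = np.genfromtxt(io.StringIO(text), delimiter=12)
print(data.shape)  # (3, 4)
```

If the widths differ from 12 in your real file, only the delimiter argument needs to change.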

The other answers use methods that rely on the numbers having the same width or being written in scientific notation. Here I present a method that works whether or not the fields are fixed width, as long as sscanf can tell where one number ends and the next begins (here, each number starts with a sign or a space).

If you were handling this input in C, you would probably use scanf or sscanf. Python has functionality similar to printf (such as the format method of strings), but it has nothing like scanf or sscanf.

Fortunately, you can use ctypes from the Python standard library to call the C sscanf function directly. The following example is for Python on Linux:

import ctypes
libc = ctypes.CDLL("libc.so.6")
sscanf = libc.sscanf
with open("test") as fp:
    for l in fp:
        float_1 = ctypes.c_float()
        float_2 = ctypes.c_float()
        float_3 = ctypes.c_float()
        float_4 = ctypes.c_float()
        sscanf(ctypes.create_string_buffer(bytes(l,"utf8")), b"%f%f%f%f", ctypes.byref(float_1), ctypes.byref(float_2),ctypes.byref(float_3),ctypes.byref(float_4))
        # sscanf returns the number of fields it assigned; check that it is 4 (one per %f) to catch malformed lines.
        print(float_1.value, float_2.value, float_3.value, float_4.value)

The output is:

0.00042617000872269273 -197250000.0 -211390000.0 37077000.0
0.0008523400174453855 -180310000.0 -183400000.0 28237000.0
0.0012784999562427402 -165830000.0 -158870000.0 20637000.0

In the (unlikely) case that your system does not use glibc, or uses an older version, change the library path accordingly. (It is very unlikely that your system has no C library, or that its C library does not implement scanf.) If you are using Windows, change libc = ctypes.CDLL("libc.so.6") to

libc = ctypes.cdll.msvcrt # Loads MS standard C Library

ctypes simply calls functions in a dynamic library using the platform's standard C calling convention. You can use it to interface Python code with almost any C library.
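The example above stores into ctypes.c_float with %f, which is why the printed values carry single-precision noise such as 0.00042617000872269273. A minimal sketch of a variant, assuming Linux (or another system with a C library reachable through ctypes.util.find_library, or Windows via msvcrt): it uses c_double with %lf to keep full double precision and checks the return value of sscanf. parse_line is a hypothetical helper name, and find_library replaces the hard-coded "libc.so.6" path.

```python
import ctypes
import ctypes.util
import sys

# Assumption: Windows uses msvcrt; elsewhere find_library("c") locates the C library
# (e.g. "libc.so.6" on glibc Linux). If it returns None, CDLL(None) on POSIX falls
# back to the symbols already loaded into the Python process, which include libc.
if sys.platform == "win32":
    libc = ctypes.cdll.msvcrt
else:
    libc = ctypes.CDLL(ctypes.util.find_library("c"))

sscanf = libc.sscanf
sscanf.restype = ctypes.c_int  # number of fields assigned, or EOF (-1)


def parse_line(line, num_fields=4):
    """Parse num_fields concatenated floats from one line (hypothetical helper)."""
    values = [ctypes.c_double() for _ in range(num_fields)]
    fmt = b"%lf" * num_fields  # %lf writes into a C double, not a float
    assigned = sscanf(line.encode("ascii"), fmt, *[ctypes.byref(v) for v in values])
    if assigned != num_fields:
        raise ValueError(f"expected {num_fields} fields, got {assigned}: {line!r}")
    return [v.value for v in values]


print(parse_line(" 0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08"))
```

To read a whole file, call parse_line on each non-blank line inside the same with open(...) loop as the original example.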

If you do not wish to use ctypes, you could use a community library such as scanf or parse, both of which provide scanf-like parsing (parse uses format()-style patterns).

You could exploit the fact that the numbers all appear to be in scientific notation and use a regular expression to pull each one out.

import re

# The optional leading sign keeps negative values negative
e_numbers = re.compile(r"[-+]?[\d.]+E[+-]\d{2}")

with open('yourfile.txt') as f:
    numbers = [float(num) for line in f for num in e_numbers.findall(line)]

To break that regex down:

e_numbers = re.compile(r'''
    [-+]?               # an optional '-' or '+' sign
    [\d.]+              # one or more of the following:
                        #   0123456789.
    E                   # the literal letter 'E'
    [+-]                # either a literal '+' or a literal '-'
    \d{2}               # followed by two digits 0-9
    ''', re.X)
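If you want to keep the row structure (one list per line, like a MATLAB matrix) instead of a single flat list, you can call findall once per line. A minimal sketch, assuming every value is in E notation with a two-digit exponent as in the sample; the pattern includes an optional leading sign, [-+]?, so negative values keep their minus. read_rows is a hypothetical helper name, and io.StringIO stands in for the opened file.

```python
import io
import re

# Optional sign, digits/dots, 'E', exponent sign, two exponent digits
e_numbers = re.compile(r"[-+]?[\d.]+E[+-]\d{2}")


def read_rows(f):
    """Return one list of floats per non-blank line of the file-like object f."""
    return [[float(num) for num in e_numbers.findall(line)] for line in f if line.strip()]


sample = io.StringIO(
    " 0.42617E-03-0.19725E+09-0.21139E+09 0.37077E+08\n"
    " 0.85234E-03-0.18031E+09-0.18340E+09 0.28237E+08\n"
)
rows = read_rows(sample)
print(rows[0])  # [0.00042617, -197250000.0, -211390000.0, 37077000.0]
```

Passing rows to numpy.array gives the same 2-D layout as the fixed-width approach.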
