
List to numpy array in scientific notation

I have data files that consist of a few header lines followed by an Nx4 matrix. I want to read each file starting at the matrix and store it in a variable as a NumPy array. The files are ~300 MB each, but a small example looks like this:

# Some header line
    Not all header lines start with a special character
# -- a keyword --
 7.3533498487067E-03 0.0000000000000E+00 1.5509636485369E-25-2.0531419826552E-27
 1.7232929428188E-25 1.3463226115772E-28 1.7232929428188E-25 1.3463226115772E-28
 4.4805616513289E-25 7.5394066248323E-26 6.7208424769933E-25 1.1093698319396E-25
-6.4623485355705E-25-1.1924016124944E-25-5.6007020641611E-25-5.6915788404426E-26

A positive value is preceded by a single space; a negative value is not, because the minus sign takes that position. So far I have tried:

import numpy as np

matrix = []
with open('test.txt') as data:
    for line in data.readlines()[3:]: # I always know how many header lines should be skipped.
        matrix.append(line) # Saves all matrix elements into a list.
    matrix = ' '.join([i for item in matrix for i in item.split()]) # Combines all matrix elements into a single string with correct single space separation.
    matrix = np.fromstring(matrix, sep=' ') # This was supposed to convert the string into a 2D numpy array.

This code produces the following warning:

'DeprecationWarning: string or file could not be read to its end due to unmatched data; this will raise a ValueError in the future.'

I think it fails to read the scientific notation (this is probably wrong), but I don't know how to fix it. I also suspect I'm making this longer than it needs to be by converting from list to str to NumPy array. How can I make this work with NumPy? Pandas solutions are also appreciated.

Extra: I'd appreciate any solution that can get rid of header lines without creating/copying to any new files. But this is not essential.
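
The warning is most likely not about scientific notation itself. A negative number has no space in front of it, so splitting on whitespace leaves two numbers glued into one token. The following is a minimal sketch that uses the first data row of the sample above; the variable names are only illustrative. It also shows one possible repair, which assumes that a minus sign directly after a digit always starts a new number (the minus in an exponent follows `E`).

```python
import re

# First data row from the sample file, copied verbatim.
row = " 7.3533498487067E-03 0.0000000000000E+00 1.5509636485369E-25-2.0531419826552E-27"

# Whitespace splitting gives three tokens, not four: the last two numbers stay glued.
tokens = row.split()

# Assumption: a minus sign that directly follows a digit begins a new number,
# so a space is inserted in front of it. Exponent signs follow "E" and are untouched.
repaired = re.sub(r"(?<=\d)-", " -", row)
values = [float(tok) for tok in repaired.split()]
```

Calling `float(tokens[2])` on the glued token raises a `ValueError`, which matches the "unmatched data" complaint from `np.fromstring`.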

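The format appears to be fixed-width: every number occupies exactly 20 characters, with the first character holding either a space (positive) or a minus sign (negative). You can exploit that by slicing each line at fixed offsets instead of splitting on whitespace: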

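import numpy as np

matrix = []
with open('test.txt') as data:
    for line in data.readlines()[3:]:  # skip the 3 header lines
        # Each of the 4 fields is 20 characters wide, sign column included.
        matrix.append([float(line[i : i + 20]) for i in (0, 20, 40, 60)])

matrix = np.array(matrix)
print(matrix)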
[[ 7.35334985e-03  0.00000000e+00  1.55096365e-25 -2.05314198e-27]
 [ 1.72329294e-25  1.34632261e-28  1.72329294e-25  1.34632261e-28]
 [ 4.48056165e-25  7.53940662e-26  6.72084248e-25  1.10936983e-25]
 [-6.46234854e-25 -1.19240161e-25 -5.60070206e-25 -5.69157884e-26]]
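
As an alternative to the manual loop, `np.genfromtxt` accepts an integer `delimiter`, which it treats as a fixed field width, and a `skip_header` count, so the header can be skipped without writing a new file. The sketch below wraps this in a helper; the function name `read_fixed_width_matrix` and its default arguments are assumptions based on the sample file (3 header lines, 20-character fields), not something given in the question.

```python
import numpy as np

def read_fixed_width_matrix(source, n_header=3, width=20):
    """Read a fixed-width numeric block, skipping the first n_header lines.

    `source` may be a file path, an open file, or a list of lines.
    """
    # An integer delimiter tells genfromtxt that every field is `width` characters wide.
    return np.genfromtxt(source, skip_header=n_header, delimiter=width)
```

For the file in the question this would be called as `matrix = read_fixed_width_matrix('test.txt')`. For a pandas-based variant, `pandas.read_fwf` offers similar fixed-width parsing through its `widths` and `skiprows` parameters.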
