
NumPy: read a variable number of columns from a text file into an array

My file is formatted like this:

  2106   2002   27   26   1
 1   0.000000  0.000000 
 2   0.389610  0.000000 
 3   0.779221  0.000000 
 4   1.168831  0.000000 
 5   1.558442  0.000000 
 6   1.948052  0.000000 
 7   2.337662  0.000000 
 8   2.727273  0.000000 
 9   3.116883  0.000000 
 10   3.506494  0.000000 

I want to read these rows into an array. The full file has more rows than shown here, and some rows have only two columns. In MATLAB I use readmatrix() and it works well; does Python have anything comparable? NumPy's genfromtxt() and loadtxt() do not work with a variable number of columns.

Should I just stick with MATLAB, since Python seems to be missing key functionality like this?

Edit: here is the output I get in MATLAB, which is what I would like to get in NumPy:

2106    2002    27  26  1   0
1   0   0   0   0   0
2   0.389610000000000   0   0   0   0
3   0.779221000000000   0   0   0   0
4   1.16883100000000    0   0   0   0
5   1.55844200000000    0   0   0   0
6   1.94805200000000    0   0   0   0
7   2.33766200000000    0   0   0   0
8   2.72727300000000    0   0   0   0
9   3.11688300000000    0   0   0   0
10  3.50649400000000    0   0   0   0
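
One approach is to read the file line by line, convert each field to a number, and pad the shorter rows with zeros:

import numpy as np

rows = []
with open("test.txt", "r") as file:
    for line in file:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(x) for x in fields])

# Pad every row on the right with zeros up to the widest row (header row
# included), which mirrors how readmatrix fills in the missing values.
width = max(len(row) for row in rows)
for row in rows:
    row.extend([0.0] * (width - len(row)))
rows = np.array(rows)  # 2-D float array, one row per line of the file

Let me know if any modifications are needed.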
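A more compact NumPy-only variant of the same padding idea lets itertools.zip_longest fill the gaps. This is a sketch, not the answer's original code: the helper name read_ragged and the trimmed SAMPLE string are hypothetical, and io.StringIO stands in for the real file. Because the helper accepts any iterable of lines, passing open("test.txt") should work the same way.

```python
import io
from itertools import zip_longest

import numpy as np

# Hypothetical sample: the first few lines of the question's file.
SAMPLE = """  2106   2002   27   26   1
 1   0.000000  0.000000
 2   0.389610  0.000000
 3   0.779221  0.000000
"""


def read_ragged(lines, fill=0.0):
    """Hypothetical helper: parse whitespace-separated rows of varying length
    into a 2-D float array, padding short rows on the right with `fill`."""
    rows = [[float(x) for x in line.split()] for line in lines if line.strip()]
    # zip_longest(*rows) walks the rows column by column and fills the gaps
    # with `fill`; transposing turns those columns back into rows.
    return np.array(list(zip_longest(*rows, fillvalue=fill))).T


data = read_ragged(io.StringIO(SAMPLE))
print(data.shape)  # (4, 5)
```

The final transpose is what turns zip_longest's column-wise output back into one array row per line; the explicit loop above does the same padding and may be easier to follow.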

MATLAB interprets the missing values in your columns as 0. You can load the same structure with pandas and get the right number of columns: pandas reads missing values as NaN, which you can then replace with 0 if you prefer. The one catch is that the first row must have the right number of columns, because pandas takes the column count from it (it also uses that row as the column names, as the output below shows). If the widest row has one more field than the first row, append a 0 to the first row instead of leaving that position blank:

import pandas as pd

df = pd.read_csv('file.csv', sep=r'\s+').fillna(0)

output:

   2106      2002   27   26    1    0
0     1  0.000000  0.0  0.0  0.0  0.0
1     2  0.389610  0.0  0.0  0.0  0.0
2     3  0.779221  0.0  0.0  0.0  0.0
3     4  1.168831  0.0  0.0  0.0  0.0
4     5  1.558442  0.0  0.0  0.0  0.0
5     6  1.948052  0.0  0.0  0.0  0.0
6     7  2.337662  0.0  0.0  0.0  0.0
7     8  2.727273  0.0  0.0  0.0  0.0
8     9  3.116883  0.0  0.0  0.0  0.0
9    10  3.506494  0.0  0.0  0.0  0.0
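As the output shows, read_csv turns the first row into column names, so those values are not part of the resulting array. A minimal sketch that keeps that row as data, and avoids editing the file to add a trailing 0, passes header=None together with an explicit names list. It assumes a quick first pass over the text is acceptable to find the widest row; read_ragged_pandas and the trimmed SAMPLE string are hypothetical names for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: the first few lines of the question's file.
SAMPLE = """  2106   2002   27   26   1
 1   0.000000  0.000000
 2   0.389610  0.000000
 3   0.779221  0.000000
"""


def read_ragged_pandas(text, fill=0.0):
    """Hypothetical helper: parse whitespace-separated rows of varying length
    with pandas and return a float NumPy array, keeping row 0 as data."""
    # First pass: the widest row decides how many columns to allocate.
    width = max(len(line.split()) for line in text.splitlines())
    df = pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",                # any run of whitespace separates fields
        header=None,               # treat the first row as data, like readmatrix
        names=list(range(width)),  # pre-declare every column so short rows get NaN
    )
    # Replace the NaN padding with `fill` and hand back a plain NumPy array.
    return df.fillna(fill).to_numpy(dtype=float)


data = read_ragged_pandas(SAMPLE)
print(data.shape)  # (4, 5)
```

For a real file, open("file.csv").read() can be passed as `text`.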

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM