I have text files, each containing 90 columns of time-series data; the files vary in length. The first 6 rows of each file are junk string data that I want to delete. From row 7 onward, all the data is of type float.
I have tried the following but it made no change to my files:
import os
folder = '/Users/LR/Desktop/S2'
files = os.listdir(folder)
for filename in files:
    lines = open(filename).readlines()
    open(filename, 'w').writelines(lines[6:])
I also tried loading the files and skipping the first 6 rows, but numpy.loadtxt doesn't work unless I set dtype='str'. That successfully cuts out the first 6 rows, but the result is a string ndarray and I can't figure out how to convert it to a float array.
data = np.loadtxt('STS2.txt', delimiter = '\t', skiprows=6, dtype='str')
data = data.astype(float) # this gives the error: ValueError: could not convert string to float:
When I set the dtype = float, I get the same ValueError:
data_float = np.loadtxt('STS2.txt', delimiter='\t', dtype=float, skiprows=7) # this gives the error: ValueError: could not convert string to float:
Anyone know a way to solve this problem?
You could use pandas to help you. Using the code below:
import pandas as pd
import numpy as np
df = pd.read_csv('STS1.txt', delimiter='\t', skiprows=[0,1,2], index_col=0)
df = df.T.set_index(np.nan, append=True).T
I was able to load the following table:
Note that your columns are now hierarchical. You can check your types:
df.dtypes
Output:
1 float64
2 float64
3 float64
4 float64
...
You can also convert the data easily, e.g. to int:
df = df.fillna(0).astype(int)
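If the junk rows carry nothing you need, a simpler pandas variant is to skip them all and select only the numeric columns. The sketch below is an illustration, not the asker's data: the sample text (6 junk rows, an index column, three value columns, and a trailing tab) is an assumption made up to mimic the layout described in the question, and for the real files the range would be range(1, 90) or range(90).

```python
import io
import pandas as pd

# Hypothetical stand-in for one file: 6 junk rows, then tab-separated rows
# consisting of an index, three values, and a trailing tab.
junk = "".join("junk header %d\n" % i for i in range(6))
rows = "0\t1.0\t2.0\t3.0\t\n1\t4.0\t5.0\t6.0\t\n"

# Skip the junk rows, treat the file as header-less, and keep only the
# value columns (positions 1..3 here), which drops the empty last field.
df = pd.read_csv(io.StringIO(junk + rows), sep="\t", skiprows=6,
                 header=None, usecols=range(1, 4))

# Plain float ndarray, if that is what the rest of the code expects.
values = df.to_numpy(dtype=float)
print(values.shape, values.dtype)
```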
The last field of each row is an empty string, so numpy is unable to parse it as a float. You're only interested in the first 90 columns anyway, so add usecols=range(90):
np.loadtxt('STS2.txt', skiprows=6, usecols=range(90))
(Of course, if you've already chopped off those first six rows, you can now drop the skiprows=6.)
EDIT
Since the first column just seems to be an index, you could use usecols=range(1, 90) to ignore it.
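To see the effect without touching the real files, here is a small self-contained sketch. The sample text is an assumption invented for the demonstration (6 junk rows, then an index, three values, and a trailing tab per row), so range(1, 4) stands in for range(1, 90):

```python
import io
import numpy as np

# Hypothetical stand-in for one file: with delimiter='\t', the trailing tab
# makes the last field of every data row an empty string.
junk = "".join("junk header %d\n" % i for i in range(6))
rows = "0\t1.0\t2.0\t3.0\t\n1\t4.0\t5.0\t6.0\t\n"
sample = io.StringIO(junk + rows)

# usecols restricts parsing to the value columns, so neither the index
# column nor the empty last field is ever converted to float.
data = np.loadtxt(sample, delimiter="\t", skiprows=6, usecols=range(1, 4))
print(data.shape, data.dtype)
```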