
python: loop through txt files and delete first few rows of strings

I have text files where each file has 90 columns of timeseries data that vary in length. Before these 90 columns, there are 6 rows of junk string data I want to delete. From row 7 onward, the data is all of the type float.

I have tried the following but it made no change to my files:

folder = '/Users/LR/Desktop/S2'
files = os.listdir(folder)
for filename in files:
    lines = open(filename).readlines()
    open(filename, 'w').writelines(lines[6:])

I also tried loading the files and skipping the first 6 rows, but numpy.loadtxt only works if I set dtype='str'. That successfully cuts out the first 6 rows, but the result is a string ndarray and I can't figure out how to convert it to a float array.

data = np.loadtxt('STS2.txt', delimiter = '\t', skiprows=6, dtype='str')
data = data.astype(float) # this gives the error: ValueError: could not convert string to float: 

When I set the dtype = float, I get the same ValueError:

data_float = np.loadtxt('STS2.txt', delimiter='\t', dtype=float, skiprows=7) # this gives the error: ValueError: could not convert string to float: 

Does anyone know a way to solve this problem?
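One possible reason the loop in the question does not touch the intended files: os.listdir returns bare file names, not full paths, so open(filename) is resolved against the current working directory rather than against folder. The following is a minimal sketch of the same idea with the path joined explicitly. The function name strip_leading_lines and the n_skip and suffix parameters are made up for illustration and do not come from the original post.

```python
import os


def strip_leading_lines(folder, n_skip=6, suffix=".txt"):
    """Rewrite each matching file in `folder` without its first `n_skip` lines.

    Hypothetical helper for illustration; it overwrites files in place.
    """
    for name in sorted(os.listdir(folder)):
        if not name.endswith(suffix):
            continue  # ignore unrelated entries such as hidden files
        # os.listdir gives bare names, so build the full path before opening
        path = os.path.join(folder, name)
        with open(path) as f:
            lines = f.readlines()
        with open(path, "w") as f:
            f.writelines(lines[n_skip:])
```

It would be called as strip_leading_lines('/Users/LR/Desktop/S2'). Because the files are overwritten, running it a second time removes six more lines, so it is safer to try it on a copy of the folder first.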

You could use pandas for this. Using the code below:

import pandas as pd
import numpy as np

df = pd.read_csv('STS1.txt', delimiter='\t', skiprows=[0,1,2], index_col=0)
df = df.T.set_index(np.nan, append=True).T

I was able to load the following table:

[Image: screenshot of the loaded table, not reproduced here]

Note that your columns are now hierarchical. You can check your types:

df.dtypes

Output:

1      float64
2      float64
3      float64
4      float64
...

You can also convert the data easily, e.g. to int:

df = df.fillna(0).astype(int)
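If the goal is simply a float array without the six junk rows, a plainer pandas route may also work. The sketch below assumes the layout described in the question (junk rows first, then tab-separated floats) and assumes that each data row ends with a trailing tab, which produces an all-empty last column. The inline text is a made-up stand-in for the real file, with two junk rows instead of six.

```python
import io

import pandas as pd

# Made-up stand-in for the real file: 2 junk rows, then tab-separated
# numbers where every data row ends with a trailing tab.
text = (
    "junk header\n"
    "more junk\n"
    "0\t1.5\t2.5\t\n"
    "1\t3.5\t4.5\t\n"
)

# skiprows drops the junk rows; header=None keeps the first data row as data
df = pd.read_csv(io.StringIO(text), sep="\t", skiprows=2, header=None)

# the trailing tab yields a column that is entirely NaN, so drop it
df = df.dropna(axis=1, how="all")

arr = df.to_numpy(dtype=float)
```

For the real files, the file path would replace io.StringIO(text) and skiprows would be 6.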

The last field of each row is an empty string, so numpy is unable to parse it as a float. You're only interested in the first 90 columns anyway, so add usecols=range(90):

np.loadtxt('STS2.txt', skiprows=6, usecols=range(90))

(Of course, if you've already chopped off those first six rows, you can now drop the skiprows=6.)

EDIT

Since the first column just seems to be an index, you could use usecols=range(1, 90) to ignore it.
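To make the effect of usecols concrete, here is a small self-contained sketch. The inline text is a made-up stand-in for STS2.txt with two junk rows and three columns instead of six and ninety, and it assumes, as described above, that each data row ends in an empty field.

```python
import io

import numpy as np

# Made-up stand-in for STS2.txt: 2 junk rows, then an index column and
# two value columns, each row ending with a trailing tab (empty last field).
text = (
    "junk header\n"
    "more junk\n"
    "0\t1.5\t2.5\t\n"
    "1\t3.5\t4.5\t\n"
)

# Read only the first three fields, so the empty trailing field is never parsed
data = np.loadtxt(io.StringIO(text), delimiter="\t", skiprows=2, usecols=range(3))

# Same, but also skip the leading index column
values = np.loadtxt(io.StringIO(text), delimiter="\t", skiprows=2, usecols=range(1, 3))
```

Here data has shape (2, 3) and values has shape (2, 2), both with a float dtype. For the real files the ranges would be range(90) and range(1, 90), with skiprows=6.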


 