In my data.txt file, there are 2 types of lines.
Normal data: 16 numbers separated by spaces with a '\n' appended at the end.
Incomplete data: While the data is being written to data.txt, the write of the last line is always interrupted by the STOP command. Thus, the last line is always incomplete, e.g. it can have 10 numbers and no '\n'.
Two questions:
a. How can I import the whole file EXCEPT the last incomplete line into Python?
I notice that
# Load the .txt file in
myData = np.loadtxt('twenty_z_up.txt')
is quite "strict" in the sense that when the last incomplete line exists there, the file cannot be imported. The imported .txt file has to be a nice matrix.
b. Occasionally, I put a timestamp in the first entry of a line for experimental purposes. Say I have my 1st timestamp at the start of line 2, and my 2nd timestamp at the start of line 5. How can I import only line 2 through line 5 into Python?
=============================== Updates: Qa is solved ================================
myData = np.genfromtxt('fast_walking_pocket.txt', skip_footer=1)
will help discard the final incomplete row
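If the file may be truncated in other ways too, a plain-Python filter is one alternative. The following is only a minimal sketch: the helper name read_complete_rows and the sample contents are made up for illustration, and it assumes that a complete row has exactly 16 whitespace-separated numbers and ends with a newline.

```python
import io

def read_complete_rows(stream, n_cols=16):
    """Return rows of floats, skipping any line that looks truncated.

    Hypothetical helper: assumes a complete row has n_cols fields
    and is terminated by a newline.
    """
    rows = []
    for line in stream:
        # A line cut off mid-write has no trailing newline.
        if not line.endswith("\n"):
            continue
        fields = line.split()
        # Also require the expected number of columns.
        if len(fields) != n_cols:
            continue
        rows.append([float(x) for x in fields])
    return rows

# Simulated file contents: two complete rows, then an interrupted third row.
sample = io.StringIO(
    " ".join(str(i) for i in range(16)) + "\n"
    + " ".join(str(i * 2) for i in range(16)) + "\n"
    + " ".join(str(i) for i in range(10))  # 10 numbers, no newline
)
rows = read_complete_rows(sample)
print(len(rows))  # 2
```

With a real file, the same function can be called on an open file object, and the resulting list of lists can be passed to np.array if a matrix is needed.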
You can try pandas, which provides a useful function, read_csv, to load the data more easily.
Example data:
a b c d e f g h i j k l m n o p
a b c d e f g h i j k l m n o p
a b c d e f g h i j k l m n o p
a b c d e f g h i j k l m n o p
a b c d e f g h i j k l m n o p
a b c d e f g h i j
For your Q1, you can load the data by:
In [27]: import pandas as pd
In [28]: df = pd.read_csv('test.txt', sep=' ', header=None, skipfooter=1, engine='python')
A DataFrame is a useful structure that makes the data easier to process. Note that skipfooter is only supported by the Python parser engine, hence engine='python'. To get a NumPy array, simply use the values attribute of the DataFrame.
In [33]: df.values
Out[33]:
array([['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p'],
       ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p'],
       ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p'],
       ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p'],
       ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p']], dtype=object)
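For your Q2, you can get the second and the fifth lines with iloc (the older ix indexer has been removed from recent pandas versions). To get every line from the second through the fifth, use df.iloc[1:5] instead.
In [36]: df.iloc[[1, 4]]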
Out[36]:
   0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
1  a  b  c  d  e  f  g  h  i  j   k   l   m   n   o   p
4  a  b  c  d  e  f  g  h  i  j   k   l   m   n   o   p
To answer your 'b' question.
Assume you have this file (called '/tmp/lines.txt'):
line 1
2013:10:15
line 3
line 4
2010:8:15
line 6
You can use the linecache module:
>>> import linecache
>>> linecache.getline('/tmp/lines.txt', 2)
'2013:10:15\n'
So you can parse this time directly:
>>> import datetime as dt
>>> dt.datetime.strptime(linecache.getline('/tmp/lines.txt', 2).strip(), '%Y:%m:%d')
datetime.datetime(2013, 10, 15, 0, 0)
Edit
Multiple lines:
>>> li=[]
>>> for i in (2,5):
...     li.append(linecache.getline('/tmp/lines.txt', i).strip())
...
>>> li
['2013:10:15', '2010:8:15']
Or:
>>> lines={}
>>> for i in (2,5):
...     lines[i]=linecache.getline('/tmp/lines.txt', i).strip()
...
>>> lines
{2: '2013:10:15', 5: '2010:8:15'}
Or a range:
>>> lines={}
>>> for i in range(2,6):
...     lines[i]=linecache.getline('/tmp/lines.txt', i).strip()
...
>>> lines
{2: '2013:10:15', 3: 'line 3', 4: 'line 4', 5: '2010:8:15'}
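For a contiguous range, itertools.islice is another option that reads the file once instead of calling linecache per line. This is a rough sketch only: the helper name read_line_range is invented for illustration, and the sample text mirrors the '/tmp/lines.txt' contents above through an in-memory stream.

```python
import io
from itertools import islice

def read_line_range(stream, first, last):
    """Return lines first..last (1-based, inclusive) without trailing newlines.

    Hypothetical helper, not part of any library.
    """
    # islice uses 0-based, end-exclusive indices, hence first - 1 and last.
    return [line.rstrip("\n") for line in islice(stream, first - 1, last)]

# In-memory stand-in for the example file shown above.
sample = io.StringIO("line 1\n2013:10:15\nline 3\nline 4\n2010:8:15\nline 6\n")
selected = read_line_range(sample, 2, 5)
print(selected)  # ['2013:10:15', 'line 3', 'line 4', '2010:8:15']
```

With a real file, pass an open file object instead, e.g. inside `with open('/tmp/lines.txt') as f:`.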
Question a:
np.genfromtxt('twenty_z_up.txt',skip_footer=1)
Question b:
np.genfromtxt('twenty_z_up.txt',skip_footer=1)[1:5]
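When slicing loaded rows by line number, keep in mind that file lines are counted from 1 while slice indices start at 0 and exclude the end index. The sketch below illustrates the conversion with a plain list standing in for the loaded data; the helper name rows_between is hypothetical, and the same slice expression works on a NumPy array.

```python
def rows_between(rows, first_line, last_line):
    """Return rows for file lines first_line..last_line (1-based, inclusive)."""
    # 1-based inclusive line numbers map to a 0-based half-open slice.
    return rows[first_line - 1:last_line]

# Stand-in for six loaded rows; row n holds the value n in every column.
rows = [[n] * 3 for n in range(1, 7)]
subset = rows_between(rows, 2, 5)
print([r[0] for r in subset])  # [2, 3, 4, 5]
```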