numpy - Python - Selectively importing part of a .txt file

Question

In my data.txt file, there are 2 types of lines.

Normal data: 16 numbers separated by spaces, with a '\n' appended at the end.
Incomplete data: While the data is being written to data.txt, the writing of the last line is always interrupted by the STOP command. Thus the last line is always incomplete, e.g. it can have 10 numbers and no '\n'.

Two questions:

a. How can I import the whole file EXCEPT the last incomplete line into Python?

I notice that

# Load the .txt file in
myData = np.loadtxt('twenty_z_up.txt')

is quite "strict" in the sense that when the incomplete last line is present, the file cannot be imported. The imported .txt file has to be a clean rectangular matrix.

b. Occasionally, I put a timestamp in the first entry of a line for experimental purposes. Say I have my 1st timestamp at the start of line 2, and my 2nd timestamp at the start of line 5. How can I import only line 2 through line 5 into Python?

=============================== Update: question (a) is solved ================================

myData = np.genfromtxt('fast_walking_pocket.txt', skip_footer=1)

discards the final incomplete row.
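One caveat worth noting: skip_footer=1 drops the last line whether or not it is complete. If the file might sometimes end with a complete row, a filter on the lines themselves is an alternative. The following is only a sketch under the assumptions stated in the question (16 numbers per complete row, each complete row ending in '\n'); the helper name load_complete_rows and the sample data are made up for illustration.

```python
import io
import numpy as np

def load_complete_rows(source, n_cols=16):
    """Return a 2-D float array built only from complete rows.

    `source` is any iterable of lines, such as an open file object.
    """
    rows = []
    for line in source:
        fields = line.split()
        # A complete row ends with a newline and has exactly n_cols fields.
        if line.endswith("\n") and len(fields) == n_cols:
            rows.append([float(x) for x in fields])
    return np.array(rows, dtype=float).reshape(-1, n_cols)

# Hypothetical sample: two complete rows, then an interrupted row
# (10 numbers, no trailing newline).
sample = io.StringIO(
    " ".join(str(i) for i in range(16)) + "\n"
    + " ".join(str(i) for i in range(16, 32)) + "\n"
    + " ".join(str(i) for i in range(10))
)
data = load_complete_rows(sample)
print(data.shape)
```

With a real file, the same helper could be called as load_complete_rows(open('data.txt')), ideally inside a with block so the file is closed afterwards.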

Answer 1

You can try pandas, which provides a useful function, read_csv, that loads the data more easily.

Example data:

a b c d e f g h i j k l m n o p
a b c d e f g h i j k l m n o p
a b c d e f g h i j k l m n o p
a b c d e f g h i j k l m n o p
a b c d e f g h i j k l m n o p
a b c d e f g h i j

For your question (a), you can load the data with:

In [27]: import pandas as pd

In [28]: df = pd.read_csv('test.txt', sep=' ', header=None, skipfooter=1, engine='python')

DataFrame is a useful structure that makes the data easier to process. To get a numpy array, simply read the values attribute of the DataFrame.

In [33]: df.values
Out[33]: 
array([['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p'],
       ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p'],
       ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p'],
       ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p'],
       ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p']], dtype=object)

For your question (b), you can get the second and the fifth lines with:

In [36]: df.iloc[[1, 4]]
Out[36]:
  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
1  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p
4  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p
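The selection above returns only the two timestamped rows. If the goal is every row from line 2 through line 5, a positional slice is one option. This is a minimal sketch, assuming a pandas version that provides DataFrame.iloc and DataFrame.to_numpy; the numeric sample data is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample: 6 rows of 16 integers, row r holding r*16 .. r*16+15.
text = "\n".join(
    " ".join(str(r * 16 + c) for c in range(16)) for r in range(6)
) + "\n"

df = pd.read_csv(io.StringIO(text), sep=" ", header=None)

# File lines 2..5 (1-based, inclusive) are positions 1..4 (0-based),
# so the slice is [1:5] because the stop index is exclusive.
block = df.iloc[1:5]
arr = block.to_numpy()
print(arr.shape)
```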

Answer 2

To answer your 'b' question.

Assume you have this file (called '/tmp/lines.txt'):

line 1
2013:10:15
line 3
line 4
2010:8:15
line 6

You can use the linecache module:

>>> import linecache
>>> linecache.getline('/tmp/lines.txt', 2)
'2013:10:15\n'

So you can parse this timestamp directly:

>>> import datetime as dt
>>> dt.datetime.strptime(linecache.getline('/tmp/lines.txt', 2).strip(), '%Y:%m:%d')
datetime.datetime(2013, 10, 15, 0, 0)

Edit

Multiple lines:

>>> li=[]
>>> for i in (2,5):
...    li.append(linecache.getline('/tmp/lines.txt', i).strip())
... 
>>> li
['2013:10:15', '2010:8:15']

Or:

>>> lines={}
>>> for i in (2,5):
...    lines[i]=linecache.getline('/tmp/lines.txt', i).strip()
... 
>>> lines
{2: '2013:10:15', 5: '2010:8:15'}

Or a range:

>>> lines={}
>>> for i in range(2,6):
...    lines[i]=linecache.getline('/tmp/lines.txt', i).strip()
... 
>>> lines
{2: '2013:10:15', 3: 'line 3', 4: 'line 4', 5: '2010:8:15'}
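For a contiguous range, the file can also be read in a single pass with itertools.islice from the standard library instead of one getline call per line. This is just a sketch: the helper name read_line_range is hypothetical, and the sample file from this answer is recreated in a temporary location so the snippet runs on its own.

```python
import itertools
import os
import tempfile

def read_line_range(path, first, last):
    """Return lines first..last (1-based, inclusive) without trailing newlines."""
    with open(path) as fh:
        # islice uses 0-based start and an exclusive stop.
        return [line.rstrip("\n") for line in itertools.islice(fh, first - 1, last)]

# Recreate the sample file used in this answer.
content = "line 1\n2013:10:15\nline 3\nline 4\n2010:8:15\nline 6\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(content)
    tmp_path = tmp.name

selected = read_line_range(tmp_path, 2, 5)
os.remove(tmp_path)
print(selected)  # ['2013:10:15', 'line 3', 'line 4', '2010:8:15']
```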

Answer 3

Question a:

np.genfromtxt('twenty_z_up.txt',skip_footer=1)

Question b (lines 2 through 5 are rows 1 to 4 with 0-based indexing, and the stop index of a slice is exclusive):

np.genfromtxt('twenty_z_up.txt',skip_footer=1)[1:5]
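Slicing after loading still parses the whole file. As an alternative sketch, np.genfromtxt also accepts skip_header and max_rows, which together read only the wanted lines, so the incomplete last line is never reached. The sample data below is made up for illustration, and this assumes a NumPy version recent enough to support max_rows in genfromtxt and text input via io.StringIO.

```python
import io
import numpy as np

# Hypothetical sample: 6 complete rows of 16 numbers plus an interrupted row.
rows = [" ".join(str(r * 16 + c) for c in range(16)) for r in range(6)]
text = "\n".join(rows) + "\n" + "1 2 3"

first_line, last_line = 2, 5  # 1-based, inclusive
data = np.genfromtxt(
    io.StringIO(text),
    skip_header=first_line - 1,           # skip the lines before line 2
    max_rows=last_line - first_line + 1,  # then read exactly 4 rows
)
print(data.shape)
```

With a real file, the first argument would be the file name instead of the StringIO object.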
