Looping through lines in a text file with Python

I receive text files that are structured like this:

random
----new data-----
06/19/2018 13:57:39.99 random information here
06/19/2018 13:58:24.99 some more random info
06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45
06/19/2018 13:58:39.99  00:00:00.00 A 00000 0 756 228270257 A0    4     5
06/19/2018 13:58:40.61  00:00:00.00 A 00000 0 828 228270257 A0    1     7
06/19/2018 13:57:39.99 random information here
06/19/2018 13:58:24.99 some more random info
---end data---
random stuff

There are several lines of random information surrounding the data I actually care about. I only want to keep the rows that have A in the fourth column, and then I want to turn that data into a CSV file.

Assuming the data above is in play.txt, I have tried several variants of the following, none of which work:

import csv
import pandas as pd
from io import StringIO

id = []
with open('play.txt', 'r') as fi:
    for ln in fi:
        if ln.startswith("A",4):
            id.append(ln[0:])


id2 = ' '.join(id)
df = pd.read_table(StringIO(id2), delimiter=r'\s+', header=None)

print(df)

df.to_csv('out.csv')

How can this be done in Python?

Use the following:

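import csv

rows = []
with open('play.txt', 'r') as fi:
    for line in fi:
        # split() with no argument splits on any run of whitespace,
        # so the double spaces in the data don't create empty fields
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "A":
            rows.append(fields)

with open('out.csv', 'w', newline='') as fo:
    csv.writer(fo).writerows(rows)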

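This splits each line into fields, and then you can use list indexing to check the fourth one. Prefer line.split() with no argument over line.split(" "): the data contains runs of two or more spaces, so splitting on a single space produces empty strings and shifts the column indices.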
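A quick check on one of the sample lines makes the difference between the two kinds of split visible. This is just an illustration using a line copied from the data above:

```python
line = "06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45"

# split(" ") breaks on every single space, so the double space after the
# time yields an empty string and shifts later columns right by one
by_space = line.split(" ")
print(by_space[:5])   # ['06/19/2018', '13:58:35.08', '', '00:00:04.38', 'A']
print(by_space[3])    # 00:00:04.38

# split() with no argument treats any run of whitespace as one separator
# and drops leading/trailing whitespace (such as a line's trailing newline)
fields = line.split()
print(fields[3])      # A
```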

Why ln.startswith("A",4) doesn't work

That code doesn't work for two main reasons.

  1. Python indexing starts at 0, so if you were looking for the 4th column, you would at least need ln.startswith("A", 3).
  2. ln.startswith("A", 3) checks the literal 4th character of the string, not the 4th column. Python reads each line as a plain string of characters with no notion of columns. In every dated line, the 4th character (index 3) is "1" from "06/19/2018", and the 5th character (index 4, used in the original code) is "9", so the test never matches.
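To see what startswith is actually comparing, here is a small illustration on one sample line. The index 36 below is simply where "A" happens to sit in this particular line; relying on it assumes every data line has exactly the same fixed-width layout, which is why splitting into fields is the more robust test.

```python
line = "06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45"

# startswith(prefix, start) compares the prefix against the characters
# beginning at index `start`; it knows nothing about columns
print(repr(line[3]), line.startswith("A", 3))  # '1' False
print(repr(line[4]), line.startswith("A", 4))  # '9' False

# In this line the 'A' column starts at character index 36, but that only
# holds while every earlier field keeps exactly the same width
print(line.startswith("A", 36))                # True
```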
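import pandas as pd

# read the file
with open('play.txt') as f:
    lines = f.read().splitlines()

rows = []

# loop through the lines; if the fourth whitespace-separated field is 'A',
# keep that line's fields. The length check skips short lines such as
# "random" or "---end data---", which would otherwise raise IndexError.
for line in lines:
    fields = line.split()
    if len(fields) >= 4 and fields[3] == 'A':
        rows.append(fields)

# save to a dataframe
df = pd.DataFrame(rows)
print(df)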
    0           1           2           3   4       5   6   7           8   9   10
0   06/19/2018  13:58:35.08 00:00:04.38 A   00000   0   765 228270257   A0  44  45
1   06/19/2018  13:58:39.99 00:00:00.00 A   00000   0   756 228270257   A0  4   5
2   06/19/2018  13:58:40.61 00:00:00.00 A   00000   0   828 228270257   A0  1   7

# if you want to specify column names too:
# df = pd.DataFrame(rows, columns=['col_name_1', 'col_name_2'... ])

# save to csv
df.to_csv('out.csv')
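If you don't need pandas, the same filter can be written with only the standard library csv module. The sketch below wraps it in a function (the name filter_a_rows is hypothetical) that takes any iterable of lines and any writable text file, so it can be tried on an in-memory io.StringIO copy of part of the sample data instead of real files:

```python
import csv
import io

def filter_a_rows(src, dst):
    """Write rows whose 4th whitespace-separated field is 'A' from src to dst as CSV.

    src: any iterable of text lines (an open file, io.StringIO, ...).
    dst: any writable text file object.
    Returns the number of rows written.
    """
    writer = csv.writer(dst)
    count = 0
    for line in src:
        fields = line.split()
        # Skip short lines ("random", "---end data---") and non-'A' rows
        if len(fields) >= 4 and fields[3] == "A":
            writer.writerow(fields)
            count += 1
    return count

sample = io.StringIO(
    "random\n"
    "----new data-----\n"
    "06/19/2018 13:57:39.99 random information here\n"
    "06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45\n"
    "06/19/2018 13:58:39.99  00:00:00.00 A 00000 0 756 228270257 A0    4     5\n"
    "---end data---\n"
)
out = io.StringIO()
n = filter_a_rows(sample, out)
print(n)                               # 2
print(out.getvalue().splitlines()[0])  # 06/19/2018,13:58:35.08,00:00:04.38,A,00000,0,765,228270257,A0,44,45
```

With real files, open the output with newline='' as the csv module documentation recommends, e.g. with open('play.txt') as fi, open('out.csv', 'w', newline='') as fo: filter_a_rows(fi, fo). Because every field stays a string, values such as 00000 keep their leading zeros in the CSV.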
