
Looping through lines in a text file with Python

I receive text files that are structured like this:

random
----new data-----
06/19/2018 13:57:39.99 random information here
06/19/2018 13:58:24.99 some more random info
06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45
06/19/2018 13:58:39.99  00:00:00.00 A 00000 0 756 228270257 A0    4     5
06/19/2018 13:58:40.61  00:00:00.00 A 00000 0 828 228270257 A0    1     7
06/19/2018 13:57:39.99 random information here
06/19/2018 13:58:24.99 some more random info
---end data---
random stuff

There are several lines of unrelated information surrounding the data I actually care about. I only want to keep the rows that have A in the fourth column, and then I want to write those rows to a CSV file.

Assuming the data above is in play.txt, I have tried several variants of the following, none of which work:

import csv
import pandas as pd
from io import StringIO

id = []
with open('play.txt', 'r') as fi:
    for ln in fi:
        if ln.startswith("A",4):
            id.append(ln[0:])


id2 = ' '.join(id)
df = pd.read_table(StringIO(id2), delimiter=r'\s+', header=None)


print(df)
                   
df.to_csv('out.csv')

How can this be done in Python?

Use the following:

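with open('play.txt', 'r') as fi:
    for line in fi:
        # split() with no argument splits on any run of whitespace,
        # so the double spaces in the data do not produce empty fields
        # (line.split(" ") would, which shifts the column indices).
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "A":
            ...

This splits each line on whitespace, after which you can pick out the fourth field by list index.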
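
As a fuller illustration of the split-and-index approach, here is a minimal sketch that uses only the standard library. The names extract_rows and to_csv_text are hypothetical helpers introduced for this example, and the sample lines are copied from the question instead of being read from play.txt.

```python
import csv
import io

def extract_rows(lines, flag="A", column=3):
    """Return the whitespace-split fields of each line whose field at
    index `column` equals `flag` (hypothetical helper for this sketch)."""
    rows = []
    for line in lines:
        fields = line.split()  # no argument: split on any run of whitespace
        # Short lines such as "random" have no fourth field, so check length first.
        if len(fields) > column and fields[column] == flag:
            rows.append(fields)
    return rows

def to_csv_text(rows):
    """Serialize rows to CSV text in memory (hypothetical helper)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()

# Sample lines copied from the question's play.txt.
sample = [
    "random",
    "----new data-----",
    "06/19/2018 13:57:39.99 random information here",
    "06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45",
    "06/19/2018 13:58:39.99  00:00:00.00 A 00000 0 756 228270257 A0    4     5",
    "---end data---",
]

rows = extract_rows(sample)
csv_text = to_csv_text(rows)
print(csv_text)
```

To process the real file, pass the open file object in place of sample, and write to a file opened with open('out.csv', 'w', newline='') instead of the in-memory buffer. Note that line.split(" ") would not work on this data: the data lines contain consecutive spaces, which produce empty strings and shift the indices.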

Why ln.startswith("A",4) doesn't work

That code fails for two main reasons.

  1. Python indexes from 0, so the fourth position is index 3, not 4. To test the fourth position you would write ln.startswith("A", 3).
  2. Even then, ln.startswith("A", 3) tests the fourth character of the string, not the fourth column. Python reads each line in as a plain string of characters, and in every data line the fourth character is "1" (from the date 06/19/2018), so the test never matches.
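
The difference between a character offset and a column can be checked directly on one of the data lines from the question. This is only a small sketch; the variable names are made up for the illustration.

```python
# One data line copied from the question.
line = "06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45"

# The second argument of startswith is a character offset, not a column number.
char_at_3 = line[3]                 # fourth character of the date "06/19/2018"
char_at_4 = line[4]                 # fifth character, the offset used in the question
by_offset = line.startswith("A", 4)

# Splitting on whitespace gives the columns, so index 3 is the fourth column.
fields = line.split()
by_field = fields[3] == "A"

print(char_at_3, char_at_4, by_offset, by_field)
```

The offset-based test looks inside the date, while the field-based test looks at the fourth whitespace-separated column, which is where the A flag lives.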
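
import pandas as pd

# read the file (the with block closes it afterwards)
with open('play.txt') as f:
    text = f.read()

id = []

# loop through the lines and, if the fourth word is 'A', append that line's fields to 'id'
for line in text.splitlines():
    fields = line.split()
    # the length check skips short lines such as "random",
    # where fields[3] would otherwise raise an IndexError
    if len(fields) >= 4 and fields[3] == 'A':
        id.append(fields)

# save to a dataframe
df = pd.DataFrame(id)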
df
    0           1           2           3   4       5   6   7           8   9   10
0   06/19/2018  13:58:35.08 00:00:04.38 A   00000   0   765 228270257   A0  44  45
1   06/19/2018  13:58:39.99 00:00:00.00 A   00000   0   756 228270257   A0  4   5
2   06/19/2018  13:58:40.61 00:00:00.00 A   00000   0   828 228270257   A0  1   7

# if you want specify column names too 
# df = pd.DataFrame(id, columns=['col_name_1', 'col_name_2'... ])

# save to csv
df.to_csv('out.csv')
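
By default, to_csv also writes the row index and a header row (here the column numbers 0 to 10). If only the data rows are wanted in the output, a sketch along the following lines may help. It assumes pandas is installed, uses two rows copied from the question, and writes to an in-memory buffer instead of out.csv so the result can be inspected.

```python
import io
import pandas as pd

# Two data lines copied from the question, already split into fields.
rows = [
    "06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45".split(),
    "06/19/2018 13:58:39.99  00:00:00.00 A 00000 0 756 228270257 A0    4     5".split(),
]

df = pd.DataFrame(rows)

# index=False drops the row numbers, header=False drops the 0..10 column labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the fields are kept as strings, values such as 00000 keep their leading zeros in the output. Passing a filename such as 'out.csv' in place of the buffer writes the same text to disk.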
