Looping through the lines of a text file with Python

Question

I receive a text file structured like this:

random
----new data-----
06/19/2018 13:57:39.99 random information here
06/19/2018 13:58:24.99 some more random info
06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45
06/19/2018 13:58:39.99  00:00:00.00 A 00000 0 756 228270257 A0    4     5
06/19/2018 13:58:40.61  00:00:00.00 A 00000 0 828 228270257 A0    1     7
06/19/2018 13:57:39.99 random information here
06/19/2018 13:58:24.99 some more random info
---end data---
random stuff

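The actual data I care about is surrounded by several lines of random information. I only want to keep the lines that have A in the fourth column, and then I want to turn that data into a CSV file.

Assuming the data above is in play.txt, I have tried several variations. This does not work: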

import csv
import pandas as pd
from io import StringIO

id = []
with open('play.txt', 'r') as fi:
    for ln in fi:
        if ln.startswith("A",4):
            id.append(ln[0:])


id2 = ' '.join(id)
df = pd.read_table(StringIO(id2), delimiter=r'\s+', header=None)


print(df)
                   
df.to_csv('out.csv')

How can I do this in Python?

Answer 1

Use the following:

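with open('play.txt', 'r') as fi:
    for line in fi:
        # split() with no argument splits on any run of whitespace.
        # split(" ") would yield empty strings for the double spaces
        # in the data, shifting "A" away from index 3.
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "A":
            ...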

This splits each line on whitespace, after which you can use list indexing.
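
As a minimal sketch of how the loop above could be completed, the matching rows can be collected and written out with the standard csv module. The helper names filter_rows and write_filtered_csv, and the default column and value arguments, are assumptions made for illustration; they are not part of the original answer.

```python
import csv


def filter_rows(lines, column=3, value="A"):
    """Return the whitespace-split fields of every line whose
    field at index `column` equals `value`."""
    rows = []
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        # The length check skips short lines such as "random".
        if len(fields) > column and fields[column] == value:
            rows.append(fields)
    return rows


def write_filtered_csv(src_path, dst_path):
    """Filter the lines of src_path and write the kept rows to dst_path."""
    with open(src_path, "r") as fi:
        rows = filter_rows(fi)
    # newline="" lets the csv module control line endings itself.
    with open(dst_path, "w", newline="") as fo:
        csv.writer(fo).writerows(rows)
    return rows
```

Calling write_filtered_csv('play.txt', 'out.csv') would then produce one CSV row per kept line, with no header row.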

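Why ln.startswith("A",4) does not work

There are two main reasons the code does not work.

Python indexes from 0, so to test the 4th position you would write ln.startswith("A", 3), not ln.startswith("A", 4).
More importantly, ln.startswith("A", 3) looks at the 4th character of the string, not the 4th column. Python reads each line as a single string containing the text of that line, so ln.startswith("A", 3) tests the 4th character, which in every dated line is the character "1".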
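
The difference between a character offset and a field index can be checked directly on one of the data lines from the question. This is only an illustration; the variable names are made up for the example.

```python
line = "06/19/2018 13:58:35.08  00:00:04.38 A 00000 0 765 228270257 A0   44    45"

# startswith(prefix, start) compares against the characters
# beginning at character index `start`, not at a column.
char_at_3 = line[3]                          # 4th character of the date
char_at_4 = line[4]                          # 5th character of the date
checks_character = line.startswith("A", 4)   # what the question's code tests

# Splitting on whitespace gives fields, so index 3 is the 4th column.
fields = line.split()
fourth_field = fields[3]
checks_field = fourth_field == "A"

print(repr(char_at_3), repr(char_at_4), checks_character)  # '1' '9' False
print(repr(fourth_field), checks_field)                    # 'A' True
```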

Answer 2

import pandas as pd

# read the file, closing it automatically when done
with open('play.txt') as f:
    text = f.read()

rows = []

# loop through the lines; if the fourth whitespace-separated field is 'A',
# append the split line to 'rows'
for line in text.splitlines():
    fields = line.split()
    # the length check skips short lines such as 'random', which would
    # otherwise raise IndexError on fields[3]
    if len(fields) >= 4 and fields[3] == 'A':
        rows.append(fields)

# save to a dataframe
df = pd.DataFrame(rows)
df
            0            1            2  3      4  5    6          7   8   9  10
0  06/19/2018  13:58:35.08  00:00:04.38  A  00000  0  765  228270257  A0  44  45
1  06/19/2018  13:58:39.99  00:00:00.00  A  00000  0  756  228270257  A0   4   5
2  06/19/2018  13:58:40.61  00:00:00.00  A  00000  0  828  228270257  A0   1   7

# if you want to specify column names too
# df = pd.DataFrame(rows, columns=['col_name_1', 'col_name_2', ...])

# save to csv
df.to_csv('out.csv')
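
The question's original idea of letting pandas parse the text can also be made to work once lines are filtered by field rather than by character position. The following is a minimal sketch under that assumption; the helper name rows_to_frame is hypothetical, and dtype=str is chosen here so that values such as 00000 keep their leading zeros.

```python
from io import StringIO

import pandas as pd


def rows_to_frame(lines, column=3, value="A"):
    """Keep lines whose field at index `column` equals `value`
    and parse them into a DataFrame of strings."""
    kept = []
    for ln in lines:
        fields = ln.split()
        if len(fields) > column and fields[column] == value:
            # Normalise line endings so each kept line ends with one newline.
            kept.append(ln.rstrip("\n") + "\n")
    # Note: pandas raises EmptyDataError if no line was kept.
    text = "".join(kept)
    # sep=r"\s+" treats any run of whitespace as one delimiter;
    # dtype=str stops pandas from converting "00000" to the integer 0.
    return pd.read_csv(StringIO(text), sep=r"\s+", header=None, dtype=str)
```

A call such as rows_to_frame(open('play.txt')).to_csv('out.csv', index=False, header=False) would write only the data values, without the index column and numeric header that the default to_csv adds.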

Answer 1: 0, 2021-04-26 16:00:25

Answer 2: 0, accepted, 2021-04-26 16:17:28