How to parse specific parts of a .txt file using Python?

Question

I have a .txt file with data that I need to parse into objects, which should then be put in a list. The .txt file is huge, but here's a sample:

5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;
C5CA;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;

It's all in one chunk, separated by semicolons. The sample consists of two objects, each with an id, a time_start and a time_end.

I have created a class that looks like this:

class Employee:

    def __init__(self, id, time_start, time_end):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end

The main part looks like this, opening the file and trying to parse it:

my_array_with_objects = []

my_file = open("test.txt", "r")

for item in my_file:
    # The three *_semicolon_part names are placeholders: getting each
    # semicolon-separated part out of the line is what I don't know how to do.
    temp_employee = Employee(item_first_semicolon_part,
                             item_second_semicolon_part,
                             item_third_semicolon_part)

    my_array_with_objects.append(temp_employee)

my_file.close()

So the problem is that I don't know how to access the specific parts of the .txt file that are separated by the semicolons. Obviously "item_first_semicolon_part" won't work. But how do I access the first part so that I get the id number and nothing else (and then the start and end times)? Is there an elegant way of doing this, or is it simply a matter of using something like "if ; do this"?

Thanks in advance. I have looked through similar questions, but I don't think there was anything that could help me with this.

UPDATE: I got a great answer from @Alderven that worked, but I just noticed that the parsed objects cut away part of the id. I simplified the id for the sake of this question. The full id (with the rest of the data) looks like this:

57646786307395936680161735716561753784;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;C5CAACCED1B9F361761853A7F995A1D4F16C8BCD0A5001A2DF3EC0D7CD539A09AA7DDA1A5278FA07554B0260880882CCBB30B3399C3C0974C587A8233E5788A81DEAD2921123CB12D13CC11318C38B9679D868145315F1BE24333202D12B3787E51D1BBF97BB25482B0EF7E97DE637BAACEDD74E89E2AC52139EE9369F1D64A6 259939411636051033617118653993975778241;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;

The part: C5CAACCED1B9F361761853A7F995A1D4F16C8BCD0A5001A2DF3EC0D7CD539A09AA7DDA1A5278FA07554B0260880882CCBB30B3399C3C0974C587A8233E5788A81DEAD2921123CB12D13CC11318C38B9679D868145315F1BE24333202D12B3787E51D1BBF97BB25482B0EF7E97DE637BAACEDD74E89E2AC52139EE9369F1D64A6

seems to be missing, perhaps because it is on the same row as the first object. The last part of the id is still there:

259939411636051033617118653993975778241

How do I get the full id?

Answer 1

It is actually CSV format with ; as the delimiter. Basically:

import csv

# newline='' lets the csv module handle line endings itself
with open('test.txt', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)
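To see what each row looks like without creating test.txt, the sketch below feeds the two sample lines from the question through csv.reader, using io.StringIO as an in-memory stand-in for the file (the StringIO part is only for illustration). Note the fourth, empty field produced by the trailing ; on each line, which is why only row[0] to row[2] are needed when building Employee objects.

```python
import csv
import io

# Stand-in for test.txt: the simplified two-record sample from the question.
sample = (
    "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;\n"
    "C5CA;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;\n"
)

# csv.reader accepts any iterable of lines, so StringIO behaves like the file.
rows = list(csv.reader(io.StringIO(sample, newline=""), delimiter=";"))
for row in rows:
    print(row)  # each row ends with '' because of the trailing ';'
```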

If you want to parse the data into your Employee class, then:

import csv

class Employee:
    def __init__(self, id, timeStart, timeEnd):
        self.id = id
        self.timeStart = timeStart
        self.timeEnd = timeEnd

myArrayWithObjects = []
with open('test.txt', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        # row[3] is the empty field after the trailing ';', so it is ignored
        myArrayWithObjects.append(Employee(row[0], row[1], row[2]))

Answer 2

You need to split the line on ; using str.split:

>>> line = "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;"
>>> parts = line.split(";")
>>> parts
['5764', '3/13/2015 8:00:00 AM', '3/13/2015 1:00:00 PM', '']
>>> ID = parts[0]
>>> start = parts[1]
>>> end = parts[2]
>>> ID
'5764'
>>> start
'3/13/2015 8:00:00 AM'
>>> end
'3/13/2015 1:00:00 PM'

You can shorten this by removing the trailing ; from the line and assigning multiple values at once:

>>> line = "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;"
>>> line = line.strip(";")
>>> ID, start, end = line.split(";")
>>> ID
'5764'
>>> start
'3/13/2015 8:00:00 AM'
>>> end
'3/13/2015 1:00:00 PM'
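The REPL examples use a string literal. Lines read from a file also end with '\n', so line.strip(";") alone leaves the trailing ; in place (strip stops at the newline) and the three-way unpacking then raises ValueError. A minimal sketch applying the same split to file lines, assuming one record per line as in the simplified sample; employees_from_lines is a hypothetical helper name, and Employee is the class from the question:

```python
class Employee:
    def __init__(self, id, time_start, time_end):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end


def employees_from_lines(lines):
    """Build Employee objects from 'id;start;end;' lines (hypothetical helper)."""
    employees = []
    for line in lines:
        # Trim the newline (and other surrounding whitespace) first,
        # then the trailing ';', so split() yields exactly three parts.
        line = line.strip().strip(";")
        if not line:
            continue  # skip blank lines
        ID, start, end = line.split(";")
        employees.append(Employee(ID, start, end))
    return employees
```

Usage with the real file: with open("test.txt") as f: my_array_with_objects = employees_from_lines(f). Iterating over the file object reads it line by line, so the whole file is not loaded at once.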

Answer 3

You can split a row using item.split(';') to turn it into a list.

You could also parse it as CSV into an array using csv.reader or pandas, but that is a separate approach.

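If the fields are in the right order, you can unpack them directly into an Employee object with tmpemployee = Employee(*item.split(';')[:3]). The [:3] slice drops whatever follows the trailing ; (an empty string, or '\n' for a line read from a file), which would otherwise be passed as a fourth argument.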
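A quick check of the unpacking idea on one line of the sample, as it would arrive from the file (with its newline). Without a slice, split(';') yields four parts, one more than Employee.__init__ accepts; slicing to three makes the unpacking work. The class is the one from the question.

```python
class Employee:
    def __init__(self, id, time_start, time_end):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end


item = "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;\n"
parts = item.split(";")  # last element is "\n", the text after the trailing ';'

try:
    Employee(*parts)  # four positional arguments: one too many
    unpack_ok = True
except TypeError:
    unpack_ok = False

# Keep only id, start and end before unpacking.
tmpemployee = Employee(*parts[:3])
```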

Answer 4

You can use csv.reader with ; as the delimiter, but slice only the first 3 items of each row, since there is a redundant trailing ; on each line of the input:

import csv

# Assumes the Employee class from the question is already defined.
with open("test.txt", "r", newline="") as f:
    myArrayWithObjects = [Employee(*row[:3]) for row in csv.reader(f, delimiter=';')]
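Slicing each row to three fields (row[:3] here, row[0] to row[2] in the first answer) is one likely reason the start of the long id in the UPDATE disappears, if the file really is one continuous chunk as the question says: a line-based reader puts C5CA... at the end of the first row as a fourth field, and the slice throws it away. The sketch below is an alternative built on assumptions, not taken from any of the answers: it ignores line breaks, splits the whole text on ;, and groups the fields in threes. It also assumes that whitespace inside an id (the space or line break between C5CA...A6 and 259939...241) is wrapping rather than part of the id. parse_employees is a hypothetical helper name.

```python
class Employee:
    def __init__(self, id, time_start, time_end):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end


def parse_employees(text):
    """Parse 'id;start;end;' records from one chunk, ignoring line breaks (hypothetical helper)."""
    # Split the whole chunk on ';' instead of going line by line.
    fields = text.split(";")
    # The trailing ';' after the last record leaves a final empty
    # (or whitespace-only) field; drop it.
    if fields and not fields[-1].strip():
        fields.pop()
    if len(fields) % 3 != 0:
        raise ValueError("expected a multiple of 3 fields, got %d" % len(fields))
    employees = []
    for i in range(0, len(fields), 3):
        raw_id, start, end = fields[i:i + 3]
        # Assumption: any whitespace inside an id is line wrapping, so it is
        # removed entirely; the time fields are only trimmed at the ends.
        employee_id = "".join(raw_id.split())
        employees.append(Employee(employee_id, start.strip(), end.strip()))
    return employees
```

Usage: with open("test.txt") as f: myArrayWithObjects = parse_employees(f.read()). The same function also handles the simplified two-line sample, because the newline before C5CA is removed as whitespace. Note that it reads the whole file into memory, which may matter for a very large file.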

4 answers
Answer 1: score 1, accepted, 2019-02-26 14:39:44
Answer 2: score 0, 2019-02-26 14:37:16
Answer 3: score 0, 2019-02-26 14:37:29
Answer 4: score 0, 2019-02-26 14:43:28
