How do I parse specific parts of a .txt file with Python?
I have a .txt file with data I need to parse into an object that should then be put in a list. The .txt file is huge, but here's a sample:
5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;
C5CA;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;
It's all in one chunk, separated with semicolons. The sample consists of two objects with id, time_start and time_end.
I have created a class that looks like this:
class Employee:
    def __init__(self, id, time_start, time_end):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end
The main part looks like this, opening the file and trying to parse it:
my_array_with_objects = []
my_file = open("test.txt", "r")
for item in my_file:
    temp_employee = Employee()
    temp_employee.id = item_first_semicolon_part
    temp_employee.time_start = item_second_semicolon_part
    temp_employee.time_end = item_third_semicolon_part
    my_array_with_objects.append(temp_employee)
myFile.close()
So the problem is that I don't know how to access the specific parts of the .txt file that are separated by the semicolons. Obviously "item_first_semicolon_part" won't work. But how do I access the first part of the text file so that I get the id number and nothing else (and then the start and end times)? Is there an elegant way of doing this, or is it simply a matter of using if ; do this?
Thanks in advance. I have looked through similar questions, but I don't think any of them could help me with this.
UPDATE: I got a great answer from @Alderven that worked, but I just noticed that the parsed object cuts away a part of the id. I simplified the id for the sake of this question. The full id (with the rest of the data) looks like this:
57646786307395936680161735716561753784;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;C5CAACCED1B9F361761853A7F995A1D4F16C8BCD0A5001A2DF3EC0D7CD539A09AA7DDA1A5278FA07554B0260880882CCBB30B3399C3C0974C587A8233E5788A81DEAD2921123CB12D13CC11318C38B9679D868145315F1BE24333202D12B3787E51D1BBF97BB25482B0EF7E97DE637BAACEDD74E89E2AC52139EE9369F1D64A6 259939411636051033617118653993975778241;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;
The part:
C5CAACCED1B9F361761853A7F995A1D4F16C8BCD0A5001A2DF3EC0D7CD539A09AA7DDA1A5278FA07554B0260880882CCBB30B3399C3C0974C587A8233E5788A81DEAD2921123CB12D13CC11318C38B9679D868145315F1BE24333202D12B3787E51D1BBF97BB25482B0EF7E97DE637BAACEDD74E89E2AC52139EE9369F1D64A6
seems to be missing, perhaps because it is on the same row as the first object. The last part of the id is still there:
259939411636051033617118653993975778241
How do I get the full id?
Actually, this is CSV format with ; as the delimiter. Basically:
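import csv
# The csv docs recommend opening the file with newline=''
with open('test.txt', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)  # each row is a list of the ;-separated fields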
If you want to parse the data into your Employee class, then:
import csv

class Employee:
    def __init__(self, id, timeStart, timeEnd):
        self.id = id
        self.timeStart = timeStart
        self.timeEnd = timeEnd

myArrayWithObjects = []
with open('test.txt', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        # row[3] is the empty field left by the trailing ';', so it is ignored
        myArrayWithObjects.append(Employee(row[0], row[1], row[2]))
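Regarding the update in the question: I can't tell exactly how the real file is laid out. If several records do sit on the same line, though, taking only row[0], row[1] and row[2] keeps just the first record of that line. Below is a minimal sketch of one way around that, assuming every record has exactly three fields and no field is ever empty. The parse_records helper, the fields_per_record parameter and the in-memory sample are made up for illustration.

```python
import csv
import io


class Employee:
    def __init__(self, id, time_start, time_end):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end


def parse_records(f, fields_per_record=3):
    """Hypothetical helper: read ;-separated fields and group them in threes."""
    employees = []
    for row in csv.reader(f, delimiter=';'):
        cells = [cell.strip() for cell in row]
        # A trailing ';' leaves one empty field at the end of the row; drop it.
        if cells and cells[-1] == "":
            cells.pop()
        # Walk the row in steps of three fields; an incomplete tail is skipped.
        for i in range(0, len(cells) - fields_per_record + 1, fields_per_record):
            employees.append(Employee(*cells[i:i + fields_per_record]))
    return employees


# Two records on one line, as in the "one chunk" description (sample data only).
sample = ("5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;"
          "C5CA;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;\n")
employees = parse_records(io.StringIO(sample))
for e in employees:
    print(e.id, e.time_start, e.time_end)
```

With a real file you would pass the object returned by open('test.txt', newline='') instead of the io.StringIO sample.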
You need to split the line by ; using str.split:
>>> line = "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;"
>>> parts = line.split(";")
>>> parts
['5764', '3/13/2015 8:00:00 AM', '3/13/2015 1:00:00 PM', '']
>>> ID = parts[0]
>>> start = parts[1]
>>> end = parts[2]
>>> ID
'5764'
>>> start
'3/13/2015 8:00:00 AM'
>>> end
'3/13/2015 1:00:00 PM'
You can shorten this by removing the last ; from the line and assigning multiple values at once:
>>> line = "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;"
>>> line = line.strip(";")
>>> ID, start, end = line.split(";")
>>> ID
'5764'
>>> start
'3/13/2015 8:00:00 AM'
>>> end
'3/13/2015 1:00:00 PM'
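After splitting, start and end are still plain strings. If you need real datetime values, something like the sketch below might work. It assumes the timestamps are month/day/year with a 12-hour clock, which the sample suggests but the question does not state, and that AM/PM is recognised in your locale. TIME_FORMAT and parse_fields are names invented for this example.

```python
from datetime import datetime

# Assumed layout: month/day/year, 12-hour clock with AM/PM.
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_fields(line):
    """Hypothetical helper: split one record and convert both timestamps."""
    # Drop the newline first, then the trailing ';', so exactly three fields remain.
    ID, start, end = line.strip().strip(";").split(";")
    return ID, datetime.strptime(start, TIME_FORMAT), datetime.strptime(end, TIME_FORMAT)


ID, start, end = parse_fields("5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;\n")
print(ID, start, end)
```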
You can split a row using item.split(';') to turn it into a list.
You could also parse it as CSV into an array using csv.reader or pandas, but that is a separate approach.
If the order is right, you can unpack the result directly into an Employee object using tmpemployee = Employee(*item.split(';'))
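One caveat with the unpacking approach: on the sample lines, the trailing ; (and the newline when reading from a file) means a plain split(';') yields four items, and Employee(*...) would then raise a TypeError. A small sketch that strips those first, assuming each line holds exactly one record. parse_line is a made-up name.

```python
class Employee:
    def __init__(self, id, time_start, time_end):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end


def parse_line(item):
    """Hypothetical helper: turn one 'id;start;end;' line into an Employee."""
    # strip() removes the newline, rstrip(';') removes the trailing delimiter,
    # so split(';') returns exactly three parts for a well-formed line.
    parts = item.strip().rstrip(";").split(";")
    return Employee(*parts)


tmpemployee = parse_line("5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;\n")
print(tmpemployee.id, tmpemployee.time_start, tmpemployee.time_end)
```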
You can use csv.reader with ; as the delimiter, but slice only the first 3 items of each row, since you have a redundant trailing ; on each line of the input:
import csv
with open("test.txt", "r") as f:
    # csv.reader needs the file object as its first argument
    myArrayWithObjects = [Employee(*row[:3]) for row in csv.reader(f, delimiter=';')]