
How do I parse specific parts of a .txt file with Python?

I have a .txt file with data that I need to parse into objects, which should then be put in a list. The .txt file is huge, but here's a sample:

5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;
C5CA;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;

It's all in one chunk, separated with semicolons. The sample consists of two objects with id, time_start and time_end.

I have created a class that looks like this:

class Employee:
    def __init__(self, id, time_start, time_end):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end

The main part looks like this, opening the file and trying to parse it:

my_array_with_objects = []

my_file = open("test.txt", "r")

for item in my_file:
    temp_employee = Employee()
    temp_employee.id = item_first_semicolon_part
    temp_employee.time_start = item_second_semicolon_part
    temp_employee.time_end = item_third_semicolon_part

    my_array_with_objects.append(temp_employee)

my_file.close()

So, the problem is, I don't know how to access the specific parts of the .txt file, separated by the semicolons. Obviously "item_first_semicolon_part" won't work. But how do I access the first part of each line so that I get the id number and nothing else (and then the start and end times)? Is there an elegant way of doing this, or is it simply a matter of using if ; do this?

Thanks in advance. I have looked through similar questions, but I don't think any of them could help me with this.

UPDATE: I got a great answer from @Alderven that worked, but I just noticed that the parsed object cuts away a part of the id. I simplified the id for the sake of this question. The full id (with the rest of the data) looks like this:

57646786307395936680161735716561753784;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;C5CAACCED1B9F361761853A7F995A1D4F16C8BCD0A5001A2DF3EC0D7CD539A09AA7DDA1A5278FA07554B0260880882CCBB30B3399C3C0974C587A8233E5788A81DEAD2921123CB12D13CC11318C38B9679D868145315F1BE24333202D12B3787E51D1BBF97BB25482B0EF7E97DE637BAACEDD74E89E2AC52139EE9369F1D64A6 259939411636051033617118653993975778241;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;

The part: C5CAACCED1B9F361761853A7F995A1D4F16C8BCD0A5001A2DF3EC0D7CD539A09AA7DDA1A5278FA07554B0260880882CCBB30B3399C3C0974C587A8233E5788A81DEAD2921123CB12D13CC11318C38B9679D868145315F1BE24333202D12B3787E51D1BBF97BB25482B0EF7E97DE637BAACEDD74E89E2AC52139EE9369F1D64A6

seems to be missing, perhaps because it is on the same row as the first object. The last part of the id is still there:

259939411636051033617118653993975778241

How do I get the full id?

Actually, it is CSV format with ; as the delimiter. Basically:

import csv

with open('test.txt', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)
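
For the sample in the question, each row comes back with four fields, because the trailing ; on every line produces an empty final field. A minimal sketch of this, using io.StringIO as a stand-in for the real file (an assumption made only to keep the example self-contained):

```python
import csv
import io

# Stand-in for test.txt, built from the two sample lines in the question.
sample = (
    "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;\n"
    "C5CA;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;\n"
)

rows = list(csv.reader(io.StringIO(sample), delimiter=';'))
for row in rows:
    # Each row is a list of strings; the last one is empty because
    # every line ends with a semicolon.
    print(row)
```

So row[0], row[1] and row[2] hold the id, start time and end time, and the fourth element can be ignored.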

If you want to parse the data into your Employee class, then:

import csv

class Employee:
    def __init__(self, id, timeStart, timeEnd):
        self.id = id
        self.timeStart = timeStart
        self.timeEnd = timeEnd

myArrayWithObjects = []
with open('test.txt', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        myArrayWithObjects.append(Employee(row[0], row[1], row[2]))

You need to split the line by ; using str.split:

>>> line = "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;"
>>> parts = line.split(";")
>>> parts
['5764', '3/13/2015 8:00:00 AM', '3/13/2015 1:00:00 PM', '']
>>> ID = parts[0]
>>> start = parts[1]
>>> end = parts[2]
>>> ID
'5764'
>>> start
'3/13/2015 8:00:00 AM'
>>> end
'3/13/2015 1:00:00 PM'

You can shorten this by removing the last ; from the line and assigning multiple values at once:

>>> line = "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;"
>>> line = line.strip(";")
>>> ID, start, end = line.split(";")
>>> ID
'5764'
>>> start
'3/13/2015 8:00:00 AM'
>>> end
'3/13/2015 1:00:00 PM'

You can split a row using item.split(';') to turn it into a list.

You could also parse it as CSV into an array using csv.reader or pandas, but that is a separate approach.

If the order is right, you can unpack the fields directly into an Employee object. Because each line ends with a trailing ; the split produces an extra empty field, so keep only the first three: tmpemployee = Employee(*item.strip().split(';')[:3])

You can use csv.reader with ; as the delimiter, but slice only the first 3 items of each row, since there is a redundant trailing ; on each line of the input:

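import csv

# Employee is the class defined in the question.
with open("test.txt", newline="") as f:
    # The file object must be passed to csv.reader as its first argument.
    myArrayWithObjects = [Employee(*row[:3]) for row in csv.reader(f, delimiter=';')]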
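
Regarding the update in the question: if an id can wrap onto the next line, reading line by line will split it in two. One possible approach is to read the whole file as a single string, split on ; and regroup the fields in threes. The sketch below assumes that every record has exactly three semicolon-terminated fields and that whitespace inside an id (including the line break) is not significant. The function name parse_records is hypothetical.

```python
def parse_records(text):
    """Split semicolon-delimited text into (id, start, end) tuples."""
    fields = text.split(";")
    # Text ending in ';' leaves one trailing piece (empty or whitespace).
    if fields and not fields[-1].strip():
        fields.pop()
    if len(fields) % 3 != 0:
        raise ValueError("expected a multiple of three fields")
    records = []
    for i in range(0, len(fields), 3):
        raw_id, start, end = fields[i:i + 3]
        # Assumption: whitespace inside an id is only line wrapping.
        record_id = "".join(raw_id.split())
        records.append((record_id, start.strip(), end.strip()))
    return records


# Shortened stand-in for the data in the update: the second id wraps
# across a line break.
text = (
    "5764;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM;C5CA\n"
    "2599;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM;\n"
)
records = parse_records(text)
print(records[1][0])
```

Each tuple can then be passed to the constructor, for example Employee(*record). For a real file, the text would come from f.read() rather than from a literal string.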

