
Reading data from large text file with strings and floats in Python

I'm having trouble reading large amounts of data from a text file, and splitting and removing certain objects from it to get a more refined list. For example, let's say I have a text file, we'll call it 'data.txt', that has this data in it.

Some Header Here
Object Number = 1
Object Symbol = A
Mass of Object = 1
Weight of Object = 1.2040
Height of Object = 0.394
Width of Object = 4.2304

Object Number = 2
Object Symbol = B
Mass Number = 2
Weight of Object = 1.596
Height of Object = 3.293
Width of Object = 4.654
.
.
. ...Same format continuing down

My problem is taking the data I need from this file. Let's say I'm only interested in the Object Number and Mass of Object, which repeat through the file with different numerical values. I need a list of this data. Example:

Object Number    Mass of Object
1                1
2                2
.                .
.                .
.                .
etc.

With the headers excluded of course, as this data will be applied to an equation. I'm very new to Python, and don't have any knowledge of OOP. What would be the easiest way to do this? I know the basics of opening and writing to text files, even a little bit of using the split and strip functions. I've researched quite a bit on this site about sorting data, but I can't get it to work for me.

Try this:

object_number = [] # list of Object Number
mass_of_object = [] # list of Mass of Object
with open('data.txt') as f:
    for line in f:
        if line.startswith('Object Number'):
            object_number.append(int(line.split('=')[1]))
        elif line.startswith('Mass of Object'):
            mass_of_object.append(int(line.split('=')[1]))
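
Two independent lists can fall out of step if a record has no matching mass line (the second sample record is labelled "Mass Number", for instance). A minimal sketch that keeps each number paired with its mass might look like the following. It assumes every record starts with an "Object Number" line; the function name read_pairs and the inline sample are made up for illustration and stand in for data.txt.

```python
import io

def read_pairs(lines):
    """Return a list of (object_number, mass) tuples from an iterable of lines."""
    pairs = []
    number = None  # most recent 'Object Number' not yet paired with a mass
    for line in lines:
        label, sep, value = line.partition('=')
        if not sep:
            continue  # header or blank line: no '=' present
        label = label.strip()
        if label == 'Object Number':
            number = int(value)
        elif label == 'Mass of Object' and number is not None:
            pairs.append((number, float(value)))
            number = None
    return pairs

# Hypothetical sample standing in for data.txt
sample = io.StringIO(
    "Some Header Here\n"
    "Object Number = 1\n"
    "Object Symbol = A\n"
    "Mass of Object = 1\n"
    "\n"
    "Object Number = 2\n"
    "Object Symbol = B\n"
    "Mass of Object = 2.5\n"
)
pairs = read_pairs(sample)
print(pairs)
```

To read the real file, pass the open file object instead of the sample, e.g. read_pairs(f) inside a with open('data.txt') as f: block.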

In my opinion, a dictionary (or one of its subclasses) is a more efficient and more convenient container than a group of separate lists when the input is very large.

Moreover, my code doesn't need any structural modification if you need to extract another field from your file: just add its label to checklist.

checklist = ["Object Number", "Mass of Object"]
data = dict()

with open("data.txt") as f:
    # iterating over the file reads it
    # one line at a time
    for line in f:
        for label in checklist:
            if line.startswith(label):
                # strip the trailing newline, then keep the text after " = "
                val = line.rstrip()
                val = val.split(" = ")[1]
                # values are stored as strings, keyed by their label
                data.setdefault(label, []).append(val)

print(data)

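This is the output for the sample data in the question:

{'Object Number': ['1', '2'], 'Mass of Object': ['1']}

Only one mass appears because the second record in the sample is labelled "Mass Number" rather than "Mass of Object", so that line is not matched.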
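
Since the values come back as strings, they still need converting before they go into an equation. One possible variation, sketched under the assumption that every wanted value is numeric, uses collections.defaultdict(list) and float(). The function name extract_fields and the sample lines are illustrative only.

```python
from collections import defaultdict

def extract_fields(lines, labels):
    """Collect the numeric values of the given labels, keyed by label."""
    data = defaultdict(list)
    for line in lines:
        label, sep, value = line.partition('=')
        label = label.strip()
        # Skip lines without '=' and labels we were not asked for
        if sep and label in labels:
            data[label].append(float(value))
    return dict(data)

# Hypothetical sample mirroring the question, including the "Mass Number" line
sample_lines = [
    "Some Header Here\n",
    "Object Number = 1\n",
    "Mass of Object = 1\n",
    "Weight of Object = 1.2040\n",
    "\n",
    "Object Number = 2\n",
    "Mass Number = 2\n",
    "Weight of Object = 1.596\n",
]
result = extract_fields(sample_lines, {"Object Number", "Mass of Object"})
print(result)
```

Matching on the exact label rather than with startswith avoids accidentally picking up a longer label that happens to share a prefix.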

See also: some theory about speed, some tips about performance optimization, and the dependency between the type of data and implementation efficiency.

Finally, some examples of using re (regular expressions):

https://docs.python.org/2/howto/regex.html
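
As a rough illustration of the regular-expression route, the sketch below matches the two labels of interest and captures the number after the equals sign. The pattern assumes lines shaped like "label = number" with plain decimal values; PATTERN and match_line are hypothetical names, not part of any answer above.

```python
import re

# Assumed line shape: "<label> = <number>", whitespace around '=' optional
PATTERN = re.compile(
    r'^(Object Number|Mass of Object)\s*=\s*([-+]?\d+(?:\.\d+)?)\s*$'
)

def match_line(line):
    """Return (label, value) for a wanted line, or None for anything else."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return m.group(1), float(m.group(2))

print(match_line("Object Number = 1\n"))
print(match_line("Mass of Object = 1.25\n"))
print(match_line("Weight of Object = 1.2040\n"))
```

For a format this regular, startswith and split are usually enough; a pattern mainly helps when spacing or number formats vary.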
