
Python, reading lines from a file one by one

I'm making a program which sorts out valid and invalid social security numbers.

The program is supposed to sort numbers from a text file on my computer. However, I'm only able to input all the numbers at once (I think). I want the program to check the numbers one by one.

This is how it looks right now:

def fileinput():
    try:
        textfile = open("numberlist.txt","r")
        socialsecuritynumber = textfile.read()
        numberprogram(socialsecuritynumber)
    except IOError:
        print("there's no such file!\n")

Does anyone know how I'm supposed to do this? The text file just contains numbers:

  • 1993-06-11 5570
  • 930611-5570
  • 930611 5570
  • 93 05115570
  • 1993 05 11 55 70
  • 1993 05 11 5570

These are the numbers from my text file.

  1. Always read files using the with statement. If something goes wrong while reading, or an exception is raised inside the block, the file is still closed automatically.
  2. Then use a for loop to read the file line by line, like this:

     with open("numberlist.txt", "r") as textfile:
         for line in textfile:
             print(line)
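To connect the loop above to the original fileinput() function, here is a minimal sketch. The names process_file and handle_line are hypothetical (they are not from the question); handle_line stands in for whatever per-number check you want, such as numberprogram. The demo writes a temporary file only so that the example runs without numberlist.txt.

```python
import os
import tempfile


def process_file(path, handle_line):
    """Call handle_line once per non-empty line; return how many lines were handled."""
    count = 0
    try:
        with open(path, "r") as textfile:
            for line in textfile:
                line = line.strip()  # drop the trailing newline and stray spaces
                if line:
                    handle_line(line)
                    count += 1
    except IOError:
        print("there's no such file!")
    return count


# Demo: a temporary file stands in for numberlist.txt (assumption for illustration).
seen = []
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1993-06-11 5570\n930611-5570\n\n930611 5570\n")
process_file(tmp.name, seen.append)
os.remove(tmp.name)
print(seen)
```

Because the file object is iterated directly, only one line is held in memory at a time, and each number reaches the handler separately instead of as one big string.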

Use with, as thefourtheye suggested. You can call the readlines() method to get a list of all the lines, then iterate over that list with a for loop and check each number for validity. Note that readlines() loads the whole file into memory, so for very large files it is better to iterate over the file object directly.

with open("numberlist.txt") as f: # this auto closes the file after reading. It's good practice
    numbers = f.readlines() # numbers is a list of all the numbers(a list of lines in the file)

If there are unwanted spaces in the lines (or just in case there are):

numbers = [n.strip() for n in numbers] # removes whitespace, including the newline, from both ends

And if you find there are commas after the numbers, you can strip them off:

numbers = [n.rstrip(",") for n in numbers] # removes trailing commas; lines without one are left unchanged

for number in numbers:
    numberprogram(number) # do whatever you want here, e.g. check each number

EDIT:

Alternatively you could use a regular expression, and commas and spaces won't matter:

import re

n = ['1993-06-11 5570',
     '930611-5570',
     '930611 5570',
     '93 05115570',
     '1993 05 11 55 70',
     '1993 05 11 5570']

regex = r'([0-9]+(?:[- ]?[0-9]+)*)'  # digit groups, optionally separated by a single hyphen or space
match_nums = [re.search(regex, num) for num in n]
results = [i.groups() for i in match_nums]
for i in results:
    print(i)

('1993-06-11 5570',)
('930611-5570',)
('930611 5570',)
('93 05115570',)
('1993 05 11 55 70',)
('1993 05 11 5570',)

For more information on regular expressions, see the documentation for Python's re module.
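Building on the regular expression idea, the following sketch goes one step further and reduces each line to its bare digits. It is only an illustration: normalize is a hypothetical helper, and the pattern is a slight variation on the one above (the separator is required between digit groups, which matches the same strings but avoids ambiguous backtracking). It needs Python 3.4 or later for fullmatch.

```python
import re

# Digit groups separated by exactly one hyphen or space.
SEPARATED_DIGITS = re.compile(r"[0-9]+(?:[- ][0-9]+)*")


def normalize(line):
    """Return only the digits of line, or None if the line has an unexpected shape."""
    text = line.strip()
    if SEPARATED_DIGITS.fullmatch(text) is None:
        return None
    return re.sub(r"[^0-9]", "", text)  # drop the separators, keep the digits


samples = ['1993-06-11 5570',
           '930611-5570',
           '93 05115570',
           '1993 05 11 55 70',
           'not a number']
for sample in samples:
    print(repr(sample), '->', normalize(sample))
```

Lines that contain letters, doubled separators, or nothing at all come back as None, so they can be reported as invalid before any further checks.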

Using with for file operations is recommended. If you are on Python 2.5, you have to enable the with statement through a __future__ import; from Python 2.6 onward it is available by default. The simplest solution I could think of for your problem with the numbers is:

from __future__ import with_statement
file_with_ssn = "/absolute/path/to/the/file"

try:
    with open(file_with_ssn) as ssn_file:
        for ssn in ssn_file:
            # Keep only the digits; this drops spaces, hyphens and the newline.
            # An SSN cannot be negative, so discarding "-" is safe here.
            ssn = "".join(filter(str.isdigit, ssn))
            numberprogram(ssn)
except EnvironmentError:
    print("IOError or OSError or WindowsError happened")
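Since the goal is to sort the numbers into valid and invalid ones, the digit filtering above could feed a small sorting routine like the sketch below. Everything here is hypothetical: sort_numbers, digits_only and looks_valid are illustrative names, and the rule in looks_valid (10 or 12 digits) is only a placeholder assumption, since the real check lives in numberprogram, which the question does not show. An in-memory io.StringIO stands in for the opened file.

```python
import io


def digits_only(line):
    """Keep only the digit characters of a line."""
    return "".join(ch for ch in line if ch.isdigit())


def looks_valid(digits):
    # ASSUMPTION: placeholder rule for illustration, not the real validation.
    return len(digits) in (10, 12)


def sort_numbers(lines):
    """Split lines into (valid, invalid) lists, checking them one at a time."""
    valid, invalid = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if looks_valid(digits_only(line)):
            valid.append(line)
        else:
            invalid.append(line)
    return valid, invalid


# io.StringIO behaves like an open text file, so the same loop works on a real one.
sample = io.StringIO("1993-06-11 5570\n930611-5570\n12345\n93 05115570\n")
valid, invalid = sort_numbers(sample)
print(valid)
print(invalid)
```

Passing an open file object instead of the StringIO gives the same line-by-line behaviour, and swapping looks_valid for the real check changes nothing else.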
