
Find all items in a list that match a specific format

I am trying to find everything in a list that has a format like "######-##".

I thought I had the right idea in the following code, but it isn't printing anything. Some values in my list have that format, so I would expect them to be printed. Could you tell me what's wrong?

for line in list_nums:
    if (line[-1:].isdigit()):
        if (line[-2:-1].isdigit()):
            if (line[-6:-5].isdigit()):
                if ("-" in line[-3:-2]):
                    print(list_nums)

The values in my list have formats like 123456-56 and 123456-98-98, which is why I wrote the checks above. The list is pulled from an Excel sheet.

This is my updated code.

import xlrd
from re import compile, match

file_location = "R:/emily/emilylistnum.xlsx"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)
regexp = compile(r'^\d{d}-\d{2}$')
list_nums = ""


for row in range(sheet.nrows):
    cell = sheet.cell_value(row,0)
    if regexp.match(cell):
        list_nums += cell + "\n"
        print(list_nums)

My Excel sheet consists of: 581094-001 581095-001 581096-001 581097-01 5586987-007 SMX53-5567-53BP 552392-01-01 552392-02 552392-03-01 552392-10-01 552392-10-01 580062 580063 580065 580065 580066 543921-01 556664-55

(one value per cell, down a single column)

If you only need to match the pattern ######-## (where # is a digit):

>>> from re import compile, match
>>> regexp = compile(r'^\d{6}-\d{2}$')
>>> print([line for line in list_nums if regexp.match(line)])
['132456-78']

Explanations

Compiling the pattern into a regexp object makes repeated matching more efficient. The regexp is ^\d{6}-\d{2}$ where:

^  # start of the line
\d{6}-\d{2}  # 6 digits, one hyphen, then 2 digits
$  # end of the line

In the regexp, \d means a digit (a character from 0 to 9) and {6} means exactly 6 times. So \d{3} means 3 digits. You should read the Python documentation about regexps.
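To see the pattern in action, here is a small sketch that runs it over some of the values listed in the question. The names sheet_values and matches are made up for illustration, and the data is hard-coded rather than read from the Excel file.

```python
import re

# Values copied from the question's sheet (hard-coded here for illustration)
sheet_values = [
    "581094-001", "581097-01", "5586987-007", "SMX53-5567-53BP",
    "552392-01-01", "552392-02", "580062", "543921-01", "556664-55",
]

# ^ and $ anchor the pattern, so extra digits or extra "-NN" groups are rejected
regexp = re.compile(r'^\d{6}-\d{2}$')

matches = [value for value in sheet_values if regexp.match(value)]
print(matches)
```

Only 581097-01, 552392-02, 543921-01 and 556664-55 pass. Values such as 581094-001 (three trailing digits) and 552392-01-01 (an extra group) are rejected by the anchors.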


Full code

An example based on your comment:

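import xlrd
from re import compile

file_location = 'file.xlsx'
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)
# Exactly 6 digits, a hyphen, then exactly 2 digits
regexp = compile(r'^\d{6}-\d{2}$')

list_nums = ''
for row in range(sheet.nrows):
    # Numeric cells (e.g. 580062) come back as floats, so convert to str first
    cell = str(sheet.cell_value(row, 0))
    if regexp.match(cell):
        list_nums += cell + "\n"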
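One thing to watch for when reading from a spreadsheet: xlrd returns numeric cells such as 580062 as floats rather than strings, and calling match on a non-string raises a TypeError. A minimal sketch of a guard, using a hypothetical helper matching_codes and hard-coded stand-in values instead of a real sheet:

```python
import re

CODE_RE = re.compile(r'^\d{6}-\d{2}$')

def matching_codes(cell_values):
    """Return the values that look like ######-##, skipping non-string cells."""
    return [v for v in cell_values if isinstance(v, str) and CODE_RE.match(v)]

# Stand-in for sheet.cell_value(row, 0) results: text cells and one numeric cell
cells = ["581097-01", 580062.0, "552392-01-01", "556664-55", ""]
codes = matching_codes(cells)
print("\n".join(codes))
```

Collecting the matches in a list and joining them at the end also avoids building the string up piece by piece inside the loop.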

Your code seems to be doing the right thing, except that you want it to print the value of line instead of the value of list_nums.
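For reference, a sketch of the original loop with that one change applied. The slice checks are wrapped in a hypothetical helper looks_like_code, and the sample list is invented for illustration. Note that the slices only inspect a few positions near the end of the string, so a value like ABC456-56 also passes:

```python
def looks_like_code(line):
    """Same slice checks as the question: only a few trailing positions are inspected."""
    return (
        line[-1:].isdigit()
        and line[-2:-1].isdigit()
        and line[-6:-5].isdigit()
        and line[-3:-2] == "-"
    )

# Invented sample values, assuming list_nums is a list of strings
list_nums = ["123456-56", "123456-98-98", "580062", "ABC456-56"]

matches = []
for line in list_nums:
    if looks_like_code(line):
        print(line)  # print the current item, not the whole list
        matches.append(line)
```

Here 123456-56 and ABC456-56 are printed, which shows why a full-string check is stricter than testing individual positions.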

Another approach to the task at hand would be to use regular expressions, which are ideal for pattern recognition.

EDIT: CODE NOW ASSUMES list_nums TO BE A SINGLE STRING

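import re

# Raw string so the backslashes reach the regex engine unchanged.
# match() anchors at the start of the string; \Z anchors at the end.
rx = re.compile(r'\d{6}-\d{2}\Z')
for line in list_nums.split('\n'):
    if rx.match(line):
        print(line)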
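When list_nums is a single newline-separated string, an alternative to splitting is to let the regex work line by line with the re.MULTILINE flag, which makes ^ and $ match at the start and end of every line. A small sketch with an invented sample string:

```python
import re

# Invented sample: one value per line, as in the single-string version of list_nums
list_nums = "581097-01\n552392-01-01\n580062\n543921-01"

# With re.MULTILINE, ^ and $ anchor to each line rather than to the whole string
found = re.findall(r'^\d{6}-\d{2}$', list_nums, flags=re.MULTILINE)
print(found)
```

This returns the matching lines as a list without an explicit loop. Whether it reads better than split plus match is mostly a matter of taste.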
