
How do I match a word if it starts with a nonalphanumeric character in Python?

Suppose I have a text file where each line contains either '1' or '-1'. How do I search through the file to check whether it contains at least one '1'?

Initially, I had the following.

if re.search(r'\b1', f.read()): return True
else: return False

However, this does not work: '-' is not a word character, so \b matches between the '-' and the '1' in '-1', and the check returns True even if the file does not contain a single '1'. What is the best way to determine whether the file contains '1'?
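The problem can be reproduced in a few lines. This is only a minimal sketch; the sample string is made up for illustration and stands in for the file contents.

```python
import re

sample = "-1\n-1\n-1\n"  # made-up contents: no line is a bare "1"

match = re.search(r"\b1", sample)
# '-' is a non-word character and '1' is a word character, so \b matches
# between them and the search succeeds on the very first "-1".
found = match is not None
position = match.start() if match else None
```

Here the match starts at index 1, right after the leading '-'.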

With the re.MULTILINE flag, ^ matches at the start of each line (instead of only at the start of the string):

re.search(r'^1', f.read(), re.MULTILINE)

This will match if any line starts with 1.

See http://docs.python.org/library/re.html#regular-expression-syntax
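As a rough sketch of how the flag changes the result, the helper below (the name has_line_starting_with_1 is hypothetical, not from the answer) wraps the search and compares it with the same pattern run without the flag.

```python
import re

def has_line_starting_with_1(text):
    """Return True if any line in text begins with '1'."""
    return re.search(r"^1", text, re.MULTILINE) is not None

# Without re.MULTILINE, ^ only matches at the very start of the string,
# so a '1' on the second line is not found.
no_flag = re.search(r"^1", "-1\n1\n") is not None
```

If the lines could ever hold other numbers such as 10, the stricter pattern ^1$ (still with re.MULTILINE) would probably be the safer choice.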


This alternative solution avoids reading the whole file into memory:

has_1 = any(line.strip() == "1" for line in f)  # strip() drops the trailing newline

any(line.startswith('1') for line in f) is one way to do it without reading the entire file into memory. (A plain '1' in line test would also be true for '-1'.)
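Put together as a function, the line-by-line approach might look like the sketch below. The function name file_has_one and its path parameter are assumptions made for the example.

```python
def file_has_one(path):
    """Return True as soon as a line equal to '1' is found."""
    with open(path) as f:
        # Iterating over the file object yields one line at a time;
        # strip() drops the trailing newline before comparing.
        return any(line.strip() == "1" for line in f)
```

Because any() stops at the first true value, the rest of the file is not read once a '1' has been seen.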

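A convoluted but possibly efficient way:

import mmap
f = open('file', 'rb')  # keep a reference so the descriptor stays open
fmap = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
fmap.find(b'1') != -1  # caution: this also finds the '1' inside '-1'

You can also run a regular expression against the mmap'd file:

re.search(b'^1', fmap, re.M)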
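A self-contained version of the mmap approach could look like this. It is a sketch for Python 3, where the mapping exposes bytes; the function name mmap_has_one is hypothetical, and it assumes the file is not empty, since mapping an empty file raises ValueError.

```python
import mmap
import re

def mmap_has_one(path):
    """Search a memory-mapped file for a line that starts with '1'."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as fmap:
            # mmap exposes bytes, so the pattern must be a bytes pattern too.
            return re.search(rb"^1", fmap, re.MULTILINE) is not None
```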


with open("textfile.txt") as f:
    lines = f.readlines()
# Drop every "-1" so that only standalone "1" entries remain.
new_lines = [line.replace("-1", "") for line in lines]
for line in new_lines:
    if "1" in line:
        print("Damn right!")
        break

def thingy(contents):
    return any(line.strip() == "1" for line in contents.splitlines())

thingy("1\n-1\n-1") # True
thingy("-1\n-1\n-1") # False

Alternatively:

def thingy(contents):
    for line in contents.splitlines():
        if line.strip() == "1":
            return True

    return False


Simply with a list comprehension:

>>> if any([re.search(r"^1", line) for line in f.readlines()]):
...     pass # <your code here>

If the '1' or '-1' always occurs at the start of the line, then you could change the regex to:

^1

If they always occur in the middle/end of the line, then use:

[^-]1

If they sometimes occur at the start and sometimes in the middle/end, then you might try something like:

^1|[^-]1

I haven't tested these. In particular, I'm not sure the precedence in the last one is right.
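On the precedence question: alternation (|) has the lowest precedence in Python's re syntax, so ^1|[^-]1 is read as (^1)|([^-]1). The sketch below tries the three patterns on a few made-up sample strings. The last entry, a negative lookbehind, is not from the answer above; it is included as a possible shorter equivalent.

```python
import re

patterns = {
    "start": r"^1",
    "middle_or_end": r"[^-]1",
    "either": r"^1|[^-]1",
    "lookbehind": r"(?<!-)1",  # alternative not in the original answer
}

samples = ["1", "-1", "x 1", "x -1"]  # made-up test lines

# For each pattern, record whether it finds a match in each sample.
results = {
    name: [bool(re.search(pat, s)) for s in samples]
    for name, pat in patterns.items()
}
```

Note that [^-]1 on its own needs some character before the 1, so it cannot match a '1' at the very start of a line.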

