
Python Regex - Matching tokens in different lines of a file

In a file, I have the following lines:

[Line 1] My Name is Adam;
[Line 2] <Blank Line>
[Line 3] My Name 
[Line 4] is Adam Lee;
[Line 5] <Blank Line>
[Line 6] My
[Line 7] Name
[Line 8] is
[Line 9] Adam
[Line 10] Lee;

My tokens are 'My', 'Name', and 'Adam', and I know that each statement ends with ';'.

Here is how I have written my code in Python:

import re
import sys

# Read the input file
try:
    file_path = sys.argv[1]
    # Read the whole file once; the context manager closes the handle
    with open(file_path) as f:
        my_file = f.read()
except Exception as err:
    print("Exception caught while opening the file!")
    print(repr(err))
    sys.exit(1)

# Find matches
my_regex = r"^[ ]*My\s+Name.*Adam.*[;/]"
matches = re.findall(my_regex, my_file, flags=re.IGNORECASE | re.MULTILINE)

Observation: Only Line 1 is matched. I expected Lines 3-4 and Lines 6-10 to match as well, since they contain the tokens and the delimiter. How can I modify my regex? Please help.

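In your pattern, . does not match a newline unless re.DOTALL is set, so .* cannot cross from one line to the next. You might instead write the pattern using a negated character class that matches any character except a semicolon, newlines included: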

^ *My\s+Name[^;]*Adam[^;]*;
  • ^ Start of a line (because re.MULTILINE is set)
  • " *" (a space followed by *) Match optional spaces
  • My\s+Name Match My Name with 1+ whitespace chars (including newlines) in between
  • [^;]*Adam[^;]* Match Adam between optional chars other than ; (newlines included)
  • ; Match the terminating ;
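
To check the pattern against the sample from the question, here is a minimal sketch. The text variable is an assumption: it reproduces the file's contents inline without the [Line N] labels, rather than reading from sys.argv[1]. It runs both the original pattern and the negated-character-class pattern so the difference is visible.

```python
import re

# Sample text mirroring the file in the question (line labels omitted).
text = (
    "My Name is Adam;\n"
    "\n"
    "My Name\n"
    "is Adam Lee;\n"
    "\n"
    "My\n"
    "Name\n"
    "is\n"
    "Adam\n"
    "Lee;\n"
)

flags = re.IGNORECASE | re.MULTILINE

# Original pattern: '.' stops at a newline, so only the one-line statement matches.
original = re.findall(r"^[ ]*My\s+Name.*Adam.*[;/]", text, flags=flags)

# Negated class: [^;] also matches newlines, so a statement may span several lines.
fixed = re.findall(r"^ *My\s+Name[^;]*Adam[^;]*;", text, flags=flags)

print(len(original), "match(es) with the original pattern")
print(len(fixed), "match(es) with the negated character class")
for m in fixed:
    print(repr(m))
```

Because [^;]* can never run past a semicolon, each match stays inside a single statement even though it may cross line breaks.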

