
Regex capture 2 lines above regex match

I need help capturing the headwords (ZYGOMA, ZOMA, ZYGMA) that sit two lines above each match of nm (noun, masculine) or nf (noun, feminine). I've tried flags such as re.MULTILINE and re.DOTALL, but I still can't get the headwords. Any help would be greatly appreciated.

import re


def main():
    mytext = open("m.txt")
    mypattern = re.compile('n. (m.|f.)')
    for line in mytext:
        match = re.search(mypattern, line)
        if match:
            print(match.group())

if __name__ == "__main__":
    main()

The text I'm using as a sample is:

ZYGOMA

nm T. d'Anatomie . Os de la pommette de la joue.

ZOMA

nm T. d'Anatomie . Os de la pommette de la joue.

ZYGMA

nm T. d'Anatomie . Os de la pommette de la joue.

This is what the main file I'll parse looks like:


Assuming the words you are searching for are written entirely in capitals:

import re

text = """
    ZYGOMA

    n. m. T. d'Anatomie . Os de la pommette de la joue.

    ZOMA

    n. m. T. d'Anatomie . Os de la pommette de la joue.

    ZYGMA

    n. m. T. d'Anatomie . Os de la pommette de la joue.

    A B C

    n. m. T. d'Anatomie . Os de la pommette de la joue.
"""

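# Capture a run of capitals (and spaces) followed by whitespace and "n. m." or "n. f.".
# The alternation is grouped so that "f" is not treated as a standalone alternative.
g = re.findall(r'([A-Z][A-Z ]*)\s+(?=n\. (?:m|f)\.)', text)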
print(g)

This prints:

['ZYGOMA', 'ZOMA', 'ZYGMA', 'A B C']
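Since the question reads the file line by line, the same idea can also be sketched without a multi-line regex: remember the last non-blank line, and emit it whenever the current line starts with "n. m." or "n. f.". This is only a minimal sketch. The names DEF_LINE and headwords are made up for illustration, and the second sample entry is changed to "n. f." purely to exercise the feminine case.

```python
import re

# Matches a definition line that starts with "n. m." or "n. f."
# (leading whitespace allowed). Hypothetical name, not from the original post.
DEF_LINE = re.compile(r'\s*n\. (?:m|f)\.')


def headwords(lines):
    """Yield the last non-blank line seen before each "n. m."/"n. f." line."""
    previous = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip the blank separator lines
        if DEF_LINE.match(line) and previous is not None:
            yield previous
        previous = stripped


# Sample adapted from the question; the second entry uses "n. f."
# only to show that both markers are handled.
sample = """ZYGOMA

n. m. T. d'Anatomie . Os de la pommette de la joue.

ZOMA

n. f. T. d'Anatomie . Os de la pommette de la joue.
"""

print(list(headwords(sample.splitlines())))  # ['ZYGOMA', 'ZOMA']
```

Because headwords accepts any iterable of lines, an open file object (for example the m.txt from the question, opened in a with block) can be passed in directly instead of sample.splitlines().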

For capitalized words that contain non-ASCII (Unicode) letters, see the related question: Python regex for unicode capitalized words
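Python's built-in re module has no \p{Lu} class for "any uppercase letter", so one possible workaround (a sketch, not necessarily what the linked question recommends) is to capture whatever line precedes the "n. m."/"n. f." marker and then keep only the candidates for which str.isupper() is true. The sample entries below are invented for illustration.

```python
import re

# Invented sample with accented headwords, for illustration only.
text = """ÉTÉ

n. m. Saison.

ÂME

n. f. Principe de vie.
"""

# Capture the content of any line that is followed (after blank lines)
# by a line starting with "n. m." or "n. f.".
candidates = re.findall(
    r'^[ \t]*(\S[^\n]*?)[ \t]*\n\s*(?=n\. (?:m|f)\.)',
    text,
    flags=re.MULTILINE,
)

# str.isupper() is Unicode-aware, so accented capitals pass the filter,
# while any captured line containing lowercase letters is dropped.
unicode_headwords = [w for w in candidates if w.isupper()]

print(unicode_headwords)  # ['ÉTÉ', 'ÂME']
```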
