I need help getting the words on the line above (ZYGOMA, ZOMA, ZYGMA) once the match nm (noun masculine) or nf (noun feminine) is found. I've tried different flags such as MULTILINE and DOTALL, but I still can't get those headwords. Any help will be greatly appreciated.
import re

def main():
    mytext = open("m.txt")
    mypattern = re.compile('n. (m.|f.)')
    for line in mytext:
        match = re.search(mypattern, line)
        if match:
            print(match.group())

if __name__ == "__main__":
    main()
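MULTILINE and DOTALL can't help with the loop above: it passes re.search one line at a time, so the headword on the previous line is never part of the string being searched. A minimal sketch that remembers the previous line instead, assuming each headword sits alone on the line directly above its nm/nf (or n. m./n. f.) line; the helper name `headwords` is hypothetical:

```python
import re

# Matches "nm"/"nf" as well as "n. m."/"n. f." at the start of a line.
GENDER = re.compile(r'n\.?\s?[mf]\b')

def headwords(lines):
    """Yield the line preceding each noun-gender line (hypothetical helper)."""
    previous = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines so they are never reported as headwords
        if previous is not None and GENDER.match(line):
            yield previous
        previous = line

sample = """ZYGOMA
nm T. d'Anatomie . Os de la pommette de la joue.
ZOMA
nm T. d'Anatomie . Os de la pommette de la joue.
ZYGMA
nm T. d'Anatomie . Os de la pommette de la joue.
"""

print(list(headwords(sample.splitlines())))
# For the real file: with open("m.txt") as f: print(list(headwords(f)))
```

Iterating over an open file yields lines, so the same generator works on `m.txt` without reading it all into memory.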
The text I'm using as a sample is:
ZYGOMA
nm T. d'Anatomie . Os de la pommette de la joue.
ZOMA
nm T. d'Anatomie . Os de la pommette de la joue.
ZYGMA
nm T. d'Anatomie . Os de la pommette de la joue.
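If the whole file is read into a single string instead, re.MULTILINE makes ^ and $ match at every line boundary, so one pattern can capture a headword line that is immediately followed by an nm/nf line. A sketch assuming the sample's nm/nf layout and fully capitalized headwords:

```python
import re

text = """ZYGOMA
nm T. d'Anatomie . Os de la pommette de la joue.
ZOMA
nm T. d'Anatomie . Os de la pommette de la joue.
ZYGMA
nm T. d'Anatomie . Os de la pommette de la joue.
"""

# ^...$ delimit the headword line (thanks to re.MULTILINE), \n steps onto the
# next line, and the lookahead requires that line to start with "nm" or "nf".
pattern = re.compile(r'^([A-Z][A-Z ]*)$\n(?=n[mf]\b)', re.MULTILINE)

print(pattern.findall(text))
# For the real file: with open("m.txt") as f: print(pattern.findall(f.read()))
```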
The main file I'll parse looks like this:
Assuming the words you are searching for are capitalized:
import re
text = """
ZYGOMA
n. m. T. d'Anatomie . Os de la pommette de la joue.
ZOMA
n. m. T. d'Anatomie . Os de la pommette de la joue.
ZYGMA
n. m. T. d'Anatomie . Os de la pommette de la joue.
A B C
n. m. T. d'Anatomie . Os de la pommette de la joue.
"""
# Capitalized word(s), then whitespace, then a line starting with "n. m" or "n. f"
g = re.findall(r'([A-Z][A-Z ]*)\s+(?=n\. [mf])', text)
print(g)
This prints:
['ZYGOMA', 'ZOMA', 'ZYGMA', 'A B C']
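If you also need to know whether each word is masculine or feminine, the gender letter can be captured as a second group. A sketch assuming the same `n. m.` / `n. f.` layout; the feminine entry XYZ below is hypothetical, added only to show both cases:

```python
import re

text = """
ZYGOMA
n. m. T. d'Anatomie . Os de la pommette de la joue.
XYZ
n. f. Hypothetical feminine entry, for illustration only.
"""

# Group 1: the capitalized headword; group 2: the gender letter (m or f).
pattern = re.compile(r'([A-Z][A-Z ]*)\s+n\. ([mf])\.')

for word, gender in pattern.findall(text):
    print(word, "masculine" if gender == "m" else "feminine")
```

With two groups, findall returns (word, gender) tuples rather than plain strings.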
For capitalized words that contain non-ASCII (Unicode) letters, see the question: Python regex for unicode capitalized words
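Python's built-in re module has no \p{Lu} class, so [A-Z] misses accented capitals such as É. One workaround that avoids the regex for the headword is to test each candidate line with str.isupper(), which is True when every cased character is uppercase, accented ones included. A sketch assuming the `n. m.` / `n. f.` layout; the helper name `capitalized_headwords` and the ÉCOLE entry are hypothetical:

```python
def capitalized_headwords(text):
    """Return all-capital lines followed by an "n. m." or "n. f." line (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    found = []
    # Walk consecutive pairs of non-blank lines.
    for current, following in zip(lines, lines[1:]):
        # isupper() ignores spaces and accepts accented capitals such as É.
        if current.isupper() and following.startswith(("n. m.", "n. f.")):
            found.append(current)
    return found

sample = """
ÉCOLE
n. f. Sample entry with an accented headword.
ZYGOMA
n. m. T. d'Anatomie . Os de la pommette de la joue.
"""

print(capitalized_headwords(sample))
```

The trade-off is that a definition line written entirely in capitals would also pass the isupper() test, so this suits files where only headwords are fully capitalized.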