Python Regular Expression to match space but not newline

Question

I have a string like so:

s = '\n479 Appendix I\n1114\nAppendix I 481\n'

and want to use a regular expression to find and return

['479 Appendix I', 'Appendix I 481']

I first tried this expression:

pattern = r'''
(?: \d+ \s)? Appendix \s+ \w+ (?: \s \d+)?
'''

regex = re.compile(pattern, re.VERBOSE)

regex.findall(s)

But this returns

['479 Appendix I\n1114', 'Appendix I 481']

because \s also matches \n. Following one of the answers to the question "Python regex match space only", I tried the following:

pattern = r'''
(?: \d+ [^ \S\t\n])? Appendix \s+ \w+ (?: [^ \S\t\n] \d+)?
'''

regex = re.compile(pattern, re.VERBOSE)

regex.findall(s)

However, this didn't return the desired result either. It gave:

['Appendix I', 'Appendix I']

What expression would work in this case?
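The second attempt appears to go wrong because of how re.VERBOSE handles whitespace. According to the Python documentation, whitespace in a verbose pattern is ignored except inside a character class (or when escaped with a backslash). The space in [^ \S\t\n] is therefore part of the class, which then excludes the very space it was meant to match, so both optional number groups silently drop out. The quick check below illustrates this. The "fixed" pattern at the end is simply the second attempt with that one space removed from each class. It is a sketch, not something taken from the linked post.

```python
import re

s = '\n479 Appendix I\n1114\nAppendix I 481\n'

# First attempt: \s is a general whitespace class, so it also accepts '\n'.
print(bool(re.fullmatch(r'\s', '\n')))      # True

# Second attempt: under re.VERBOSE, whitespace inside [...] is kept, so this
# class means "whitespace other than space, tab or newline" and rejects ' '.
with_space = re.compile(r'[^ \S\t\n]', re.VERBOSE)
print(bool(with_space.fullmatch(' ')))      # False

# Without the literal space, the class accepts ' ' but still rejects '\n'.
without_space = re.compile(r'[^\S\t\n]', re.VERBOSE)
print(bool(without_space.fullmatch(' ')))   # True
print(bool(without_space.fullmatch('\n')))  # False

# The second pattern with only that space removed from both classes.
pattern = r'''
(?: \d+ [^\S\t\n])? Appendix \s+ \w+ (?: [^\S\t\n] \d+)?
'''
result = re.findall(pattern, s, re.VERBOSE)
print(result)  # ['479 Appendix I', 'Appendix I 481']
```

Writing the space as [ ] or as an escaped space (a backslash followed by a space) would also keep it from being stripped in verbose mode.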

Answer 1

import re

s = '\n479 Appendix I\n1114\nAppendix I 481\n'

for g in re.findall(r'^.*[^\d\n].*$', s, flags=re.M):
    print(g)

Prints:

479 Appendix I
Appendix I 481

This regex matches every line that contains at least one character other than a digit or a newline, so the digits-only line 1114 and the empty lines around the text are skipped.
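The re.M flag (re.MULTILINE) is what makes this work line by line. The sketch below compares the same pattern with and without it. The variable names are only for illustration.

```python
import re

s = '\n479 Appendix I\n1114\nAppendix I 481\n'
pattern = r'^.*[^\d\n].*$'

# With re.M, ^ and $ match at every line boundary, so each line is tested
# on its own; "1114" is rejected because it contains only digits.
per_line = re.findall(pattern, s, flags=re.M)

# Without re.M, ^ only matches at index 0. The string starts with '\n',
# and neither . nor [^\d\n] can consume '\n', so nothing matches.
whole_string = re.findall(pattern, s)

print(per_line)      # ['479 Appendix I', 'Appendix I 481']
print(whole_string)  # []
```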

Answer 2

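This regex is a bit more robust than the one in the other answer because it explicitly requires the word "Appendix". It also uses [\t ] (tab or space) rather than \s before and after the optional numbers, so those parts cannot reach across a newline: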

import re

s = '\n479 Appendix I\n1114\nAppendix I 481\n'
pattern = r'(?:\d*[\t ]+)?Appendix\s+\w+(?:[\t ]+\d*)?'
re.findall(pattern, s)
# ['479 Appendix I', 'Appendix I 481']
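To make "more robust" concrete, the sketch below runs both answers' patterns on a variant of the string with one extra, hypothetical non-numeric line, Index. Answer 1's pattern returns every line containing a non-digit character, while this one only returns text around "Appendix".

```python
import re

# Hypothetical input: the question's string plus an extra "Index" line.
s2 = '\n479 Appendix I\n1114\nAppendix I 481\nIndex\n'

any_text_line = r'^.*[^\d\n].*$'                              # Answer 1
appendix_only = r'(?:\d*[\t ]+)?Appendix\s+\w+(?:[\t ]+\d*)?'  # Answer 2

lines = re.findall(any_text_line, s2, flags=re.M)
appendix = re.findall(appendix_only, s2)

print(lines)     # ['479 Appendix I', 'Appendix I 481', 'Index']
print(appendix)  # ['479 Appendix I', 'Appendix I 481']
```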