
Match multiple lines until a string contains

I have a CSV file containing the data below. I need a regular expression that matches from a line starting with an ID such as 'B08-1506' up to the next line that starts with such an ID. The three lines that follow each ID line should be appended to it, so that each record is treated as a single line.

B08-1506,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B08-1606,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B09-0680,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B09-0681,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,

The output should look like this:

B08-1506,324873, st, $0.0,ljkflka,jksdfhjfhjk,jkdsfh
B08-1606,324873, st, $0.0,ljkflka,jksdfhjfhjk,jkdsfh
B09-0680,324873, st, $0.0,ljkflka,jksdfhjfhjk,jkdsfh
B09-0681,324873, st, $0.0,ljkflka,jksdfhjfhjk,jkdsfh

As Nisarg said, it is best to fix the format of the source CSV. But in case you are not able to, the snippet below might help.

Demo (without regex):

s = """B08-1506,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B08-1606,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B09-0680,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B09-0681,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,"""

res = []
for line in s.split("\n"):
    if line.startswith("B0"):    # A line starting with "B0" begins a new record
        res.append(line)
    elif res:                    # Otherwise append it to the previous record (skip stray leading lines)
        res[-1] += line

res = [[field for field in row.split(",") if field] for row in res]    # Drop empty fields
for row in res:
    print(", ".join(row))

Output (the last two fields run together because the "1 of 37 ..." line has no trailing comma):

B08-1506, 324873,  st,  $0.0, ljkflka, 1 of 37 jksdfhjfhjkjkdsfh
B08-1606, 324873,  st,  $0.0, ljkflka, 1 of 37 jksdfhjfhjkjkdsfh
B09-0680, 324873,  st,  $0.0, ljkflka, 1 of 37 jksdfhjfhjkjkdsfh
B09-0681, 324873,  st,  $0.0, ljkflka, 1 of 37 jksdfhjfhjkjkdsfh
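
Since the question asks for a regex, here is a minimal sketch of that route which also reproduces the requested output exactly. It rests on two assumptions that are not stated in the question: every record starts with an ID shaped like "B08-1506" followed by a comma, and the "1 of 37 " prefix is noise (the desired output drops it). The names ID, RECORD, COUNTER and merge_records are made up for this example.

```python
import re

# Assumption: a record starts with an ID shaped like "B08-1506" plus a comma.
ID = r"B\d{2}-\d{4},"
# One record = the ID line plus every following line that does not start a new record.
RECORD = re.compile(r"^" + ID + r".*(?:\n(?!" + ID + r").*)*", re.MULTILINE)
# Assumption: a leading counter such as "1 of 37 " is noise and can be dropped.
COUNTER = re.compile(r"^\d+ of \d+\s+")


def merge_records(text):
    """Return one comma-joined string per record found in text."""
    merged = []
    for match in RECORD.finditer(text):
        lines = match.group(0).split("\n")
        # Remove the counter prefix, then the leading/trailing commas of each line.
        parts = [COUNTER.sub("", line.strip()).strip(",") for line in lines]
        # Skip lines that end up empty and join the rest with single commas.
        merged.append(",".join(p for p in parts if p))
    return merged


sample = """B08-1506,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B08-1606,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,"""

for row in merge_records(sample):
    print(row)
```

The negative lookahead (?!...) is what stops a record at the next ID line, so the number of continuation lines per record does not have to be exactly three. If your IDs follow a different shape, adjust ID accordingly.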
