
Python regex - characters between certain characters

Edit: I should add that the string in the test is supposed to be able to contain every possible character (i.e. * + $ § € / etc.), so I thought a regex would work best.

I am using a regex to find all characters between certain delimiters, [" and "]. My example goes like this:

test = """["this is a text and its supposed to contain every possible char."], 
    ["another one after a newline."], 

    ["and another one even with
    newlines

    in it."]"""

The expected output is:

['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']

My code including the regex looks like this:

import re
my_list = re.findall(r'(?<=\[").*(?="\])*[^ ,\n]', test)
print (my_list)

And my output is the following:

['this is a text and its supposed to contain every possible char."]', 'another one after a newline."]', 'and another one even with']

So there are two problems:

1) It does not remove "] at the end of each text, as I intended with (?="\]).

2) It does not capture the third text in brackets, I guess because of the newlines. So far I haven't been able to capture those: when I try .*\n it gives me back an empty string.
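A minimal sketch of how both problems could be addressed, using the same test string (rebuilt with explicit \n so the snippet is self-contained). Dropping the * after the lookahead makes (?="\]) a real requirement instead of an optional one, the lazy .*? stops at the first "], and the re.DOTALL flag lets . match newlines. Collapsing the leftover newlines and indentation with split()/join() is an added assumption about the desired output.

```python
import re

# Same input as in the question, with the newlines written explicitly
test = (
    '["this is a text and its supposed to contain every possible char."], \n'
    '    ["another one after a newline."], \n'
    '\n'
    '    ["and another one even with\n'
    '    newlines\n'
    '\n'
    '    in it."]'
)

# Lookbehind/lookahead keep the delimiters out of the match; the lookahead
# has no "*" after it, so it is required rather than optional.
pattern = r'(?<=\[").*?(?="\])'

# Without DOTALL, "." stops at "\n", so the third text is missed
without_dotall = re.findall(pattern, test)

# With DOTALL, "." also matches "\n", so all three texts are found
with_dotall = re.findall(pattern, test, re.DOTALL)

# Collapse the newlines/indentation inside each match into single spaces
cleaned = [' '.join(s.split()) for s in with_dotall]
print(cleaned)
```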

I am thankful for any help or hints with this issue. Thank you in advance.

Btw, I am using Python 3.6 in Anaconda Spyder and the newest regex (2018).

EDIT 2: One alteration to the test input:

test = """[
    "this is a text and its supposed to contain every possible char."
    ], 
    [
    "another one after a newline."
    ], 

    [
    "and another one even with
    newlines

    in it."
    ]"""

Once again I have trouble removing the newlines. I guessed the whitespace could be matched with \s, so I thought a regex like this could solve it:

my_list = re.findall(r'(?<=\[\S\s\")[\w\W]*(?=\"\S\s\])', test)
print (my_list)

But that returns only an empty list. How do I get the expected output above from that input?
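One possible approach, sketched under the assumption that no text itself contains the sequence "]. The pattern above expects exactly one non-space character and one whitespace character between [ and ", but the input has a newline plus indentation there. A variable-width \s* is not allowed inside a Python lookbehind (it must be fixed-width), so this sketch matches the delimiters normally and keeps only a capture group; re.findall then returns just the captured text.

```python
import re

# The EDIT 2 input, with the newlines written explicitly
test2 = (
    '[\n'
    '    "this is a text and its supposed to contain every possible char."\n'
    '    ], \n'
    '    [\n'
    '    "another one after a newline."\n'
    '    ], \n'
    '\n'
    '    [\n'
    '    "and another one even with\n'
    '    newlines\n'
    '\n'
    '    in it."\n'
    '    ]'
)

# \s* absorbs any whitespace (including newlines) between [ and " and between
# " and ]; (.*?) captures the text lazily; DOTALL lets "." cross line breaks
pattern = re.compile(r'\[\s*"(.*?)"\s*\]', re.DOTALL)

# findall returns only the captured group; then normalise internal whitespace
result = [' '.join(s.split()) for s in pattern.findall(test2)]
print(result)
```

Because \s* can also match zero characters, the same pattern works on the first version of the test string as well.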

You can try this, mate:

(?<=\[\")[\w\s.]+(?=\"\])


What you missed: in your regex, . (and therefore .*) does not match a newline unless the re.DOTALL flag is set.

PS: I am not matching special characters here. If you want that, it can be achieved very easily.
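A quick check of what this pattern returns on the question's first test string (rebuilt with explicit \n). Because \s inside the character class [\w\s.] also matches newlines, no re.DOTALL flag is needed here, but the third match keeps the raw newlines and indentation. The split()/join() cleanup is an extra step assumed here to reach the expected output.

```python
import re

# The question's first test string, with the newlines written explicitly
test = (
    '["this is a text and its supposed to contain every possible char."], \n'
    '    ["another one after a newline."], \n'
    '\n'
    '    ["and another one even with\n'
    '    newlines\n'
    '\n'
    '    in it."]'
)

# The answer's pattern: word characters, whitespace (incl. "\n") and "."
raw = re.findall(r'(?<=\[\")[\w\s.]+(?=\"\])', test)

# The third item still spans several lines
print(repr(raw[2]))

# Collapse each run of whitespace into a single space
cleaned = [' '.join(s.split()) for s in raw]
print(cleaned)
```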

This one matches special characters too:

(?<=\[\")[\w\W]+?(?=\"\])

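A small comparison on a hypothetical input that contains special characters. [\w\W] matches any character, newline included, so the lazy [\w\W]+? behaves like .+? with re.DOTALL. The narrower [\w\s.]+ from the first pattern skips any text that contains characters such as $ or %.

```python
import re

# Hypothetical input: the first text contains characters outside [\w\s.]
special = '["costs $5 + 10% / § €"], ["plain text."]'

# Narrow class: fails on the first text, matches only the second
narrow = re.findall(r'(?<=\[\")[\w\s.]+(?=\"\])', special)

# [\w\W] matches every character, so both texts are captured
anything = re.findall(r'(?<=\[\")[\w\W]+?(?=\"\])', special)

# Equivalent formulation: "." plus DOTALL also matches every character
dotall = re.findall(r'(?<=\[").+?(?="\])', special, re.DOTALL)

print(narrow)    # ['plain text.']
print(anything)  # ['costs $5 + 10% / § €', 'plain text.']
print(dotall)    # ['costs $5 + 10% / § €', 'plain text.']
```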

In case you would also accept a non-regex solution, you can try:

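result = []
# Collapse all whitespace (including newlines) into single spaces, then let
# Python parse the text: it evaluates to a tuple of one-element lists.
# Note: eval runs arbitrary code, so only use it on trusted input.
for l in eval(' '.join(test.split())):
    result.extend(l)  # append the string from each one-element list

print(result)
#  ['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']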
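If the input might not be trusted, a possible variant swaps eval for ast.literal_eval, which only evaluates Python literals. Wrapping the text in an extra pair of brackets is an added tweak (not part of the answer above): it makes the result a list of lists even when there is only one bracketed item, whereas with a single item the eval loop would iterate over the string's characters. Like the original, this sketch assumes the texts contain no unescaped " characters or backslash sequences.

```python
import ast
from itertools import chain

# The question's first test string, with the newlines written explicitly
test = (
    '["this is a text and its supposed to contain every possible char."], \n'
    '    ["another one after a newline."], \n'
    '\n'
    '    ["and another one even with\n'
    '    newlines\n'
    '\n'
    '    in it."]'
)

# Collapse all whitespace (including newlines) into single spaces
flat = ' '.join(test.split())

# Wrap in [...] so even a single item parses as a list of lists;
# literal_eval accepts only literals, not arbitrary expressions
parsed = ast.literal_eval('[' + flat + ']')

# Flatten [[s1], [s2], [s3]] into [s1, s2, s3]
result = list(chain.from_iterable(parsed))
print(result)
```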

So here's what I came up with:

test = """["this is a text and its supposed to contain every possible char."], 
    ["another one after a newline."], 

    ["and another one even with
    newlines

    in it."]"""

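# Drop the newlines, shrink the 4-space indents to single spaces, and split
# into one chunk per bracketed item (assumes the texts contain no commas)
for i in test.replace('\n', '').replace('    ', ' ').split(','):
    # Strip any leading ' ', '[', '"' and any trailing '"', ']' characters
    print(i.lstrip(r' ["').rstrip(r'"]'))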

This prints the following to the screen:

this is a text and its supposed to contain every possible char.
another one after a newline.
and another one even with newlines in it.

If you want a list of those exact strings, we could modify it to:

newList = []
for i in test.replace('\n', '').replace('    ', ' ').split(','):
    # Same cleanup as above, but collect the results instead of printing them
    newList.append(i.lstrip(r' ["').rstrip(r'"]'))
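One thing worth checking with the split(',') approach: the question says a text may contain any character, and a comma inside a text splits it in two. A quick check on a hypothetical two-item input, next to a regex that matches the delimiters instead (assuming no text contains the sequence "]):

```python
import re

# Hypothetical input: the first text contains a comma
s = '["one, two"], ["three"]'

# The split/strip approach from the answer above
by_split = [i.lstrip(r' ["').rstrip(r'"]')
            for i in s.replace('\n', '').replace('    ', ' ').split(',')]

# Regex alternative: capture whatever sits between [" and "]
by_regex = re.findall(r'\["(.*?)"\]', s, re.DOTALL)

print(by_split)  # ['one', 'two', 'three']
print(by_regex)  # ['one, two', 'three']
```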

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM