
How to remove the text and lines enclosed in double quotes from a .txt file in Python?

I have a .txt file containing several strings, some of which are enclosed in double (or triple) quotes. I would like to remove whatever is inside the quotation marks and keep only the quotation marks themselves. Example:

""" aaaa """

bbbbb
ccccc

"""
dddddd
"""

The result should look like this:

""" """

bbbbb
ccccc

"""

"""

I have to do this in Python. Does anyone know of a module that does this?

You can try the following regex:

s = '''
""" aaaa """

bbbbb
ccccc

"""
dddddd
"""
'''

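import re

# Group 1 keeps the opening quotes plus any whitespace after them; group 2 keeps
# the whitespace before the closing quotes plus the quotes themselves. The lazy
# .*? between the groups is dropped. "." does not match newlines here, so the
# text being removed must sit on a single line.
print(re.sub(r'("{2,3}\s*).*?(\s*"{2,3})', r'\1\2', s))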

This outputs (the first pair of quotes keeps two spaces, one captured by each group):

"""  """

bbbbb
ccccc

"""

"""
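
One limitation worth spelling out: by default `.` does not match a newline, and `re.MULTILINE` only changes how `^` and `$` behave, so this pattern leaves a quoted block untouched when its content spans several lines. The following is a minimal sketch of that behaviour, with `re.DOTALL` shown as one possible remedy. It uses the same pattern written without the redundant escapes, and the variable names are illustrative only.

```python
import re

# Same pattern as above, without the redundant escapes and "\n" in the class.
pattern = r'("{2,3}\s*).*?(\s*"{2,3})'
multi_line = '"""\ndddddd\nbb\n"""'

# Without re.DOTALL, "." cannot cross the line break between "dddddd" and "bb",
# so nothing matches and the text comes back unchanged.
unchanged = re.sub(pattern, r'\1\2', multi_line, flags=re.MULTILINE)
print(unchanged == multi_line)  # True

# With re.DOTALL, "." also matches "\n", so the whole body is removed.
stripped = re.sub(pattern, r'\1\2', multi_line, flags=re.DOTALL)
print(repr(stripped))  # '"""\n\n"""'
```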

EDIT: To match content that spans multiple lines inside the quotes, the regex needs to be updated. Here is an example:

s = '''
""" aaaa """

bbbbb
ccccc

"""
dddddd
bb
"""
'''

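import re

# The repeated group (?:.*?\s*)* consumes the body one line at a time, line
# breaks included, so the quoted text may span several lines.
print(re.sub(r'("{2,3}\s*)(?:.*?\s*)*(\s*"{2,3})', r'\1\2', s))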

This gives the output (the whitespace before each closing quote is now consumed as well):

""" """

bbbbb
ccccc

"""
"""
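
This output differs slightly from the layout requested in the question, because no blank line is left between the second pair of quotes. A pattern with a quantifier nested inside another quantifier can also backtrack heavily on input that contains an unclosed quote. As a hedged alternative, here is a minimal sketch that uses `re.DOTALL` with a replacement function. The names `QUOTED`, `blank_quoted` and `blank_quoted_file` are hypothetical, and the filler rule (one space for a single-line body, one blank line for a multi-line body) is an assumption inferred from the question's example.

```python
import re
from pathlib import Path

# An opening run of two or three quotes, a lazily matched body, and a closing
# run of the same length (backreference \1). re.DOTALL lets "." cross newlines.
QUOTED = re.compile(r'("{2,3})(.*?)\1', flags=re.DOTALL)


def blank_quoted(text):
    """Empty every quoted block in text, keeping only the quote marks."""
    def _replace(match):
        quote, body = match.group(1), match.group(2)
        # Assumption: a multi-line body collapses to one blank line and a
        # single-line body to one space, as in the question's example.
        filler = "\n\n" if "\n" in body else " "
        return quote + filler + quote
    return QUOTED.sub(_replace, text)


def blank_quoted_file(path, encoding="utf-8"):
    """Rewrite the file at path in place (hypothetical helper)."""
    file = Path(path)
    cleaned = blank_quoted(file.read_text(encoding=encoding))
    file.write_text(cleaned, encoding=encoding)
```

With the sample string from the EDIT, `blank_quoted(s)` leaves `""" """` for the first block and an empty line between the second pair of quotes. Calling `blank_quoted_file("input.txt")` would apply the same change to a file, where `input.txt` is a placeholder name.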
