简体   繁体   中英

How to apply string method on regular expression in Python

I'm having a markdown file wich is a little bit broken: the links and images which are too long have line-breaks in it. I would like to remove line-breaks from them.

Example:

from:

See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

to:

See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

As you can see in this snippet, I managed to match the all links and images with the right pattern: https://regex101.com/r/uL8pO4/2

But now, what is the syntax in Python to use a string method like string.trim() on what I have captured with regular expression?

For the moment, I'm stuck with this:

fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their urls
# Something like r'[\1](\2)'.trim() ??
post['content'] = fix_newlines.sub(r'[\1](\2)', post['content'])

Edit: I updated the example to be more explicit about my problem.

Thank you for your answer

strip would work similar to functionality of trim. As you would need to trim the new lines, use strip('\\n'),

fin.readline.strip('\n')

This will work also:

>>> s = """
...    ![https://diasporafoundation.org/assets/pages/about/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png)
... """

>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
'![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)'
>>> 

Often times built-in string functions will do, and are easier to read than figuring out regexes. In this case strip removes leading and trailing space, then split returns a list of items between newlines, and join puts them back together in a single string.

Alright, I finally found what I was searching. With the snippet below, I could capture a string with a regex and then apply the treatment on each of them.

def remove_newlines(match):
    return "".join(match.group().strip().split('\n'))

links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
post['content'] = links_pattern.sub(remove_newlines, post['content'])

Thank you for your answers and sorry if my question wasn't explicit enough.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM