
How to apply string method on regular expression in Python

My markdown file is somewhat broken: links and images that are too long contain line breaks. I want to remove those line breaks.

Example:

From:

See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

To:

See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

As you can see in this snippet, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2

But now, what is the Python syntax for applying a string method such as string.trim() to the text captured by the regular expression?

For now, I am stuck with this:

import re

fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their urls
# Something like r'[\1](\2)'.trim() ??
post['content'] = fix_newlines.sub(r'[\1](\2)', post['content'])
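Why the `.trim()` idea in the comment cannot work: a string method called on the replacement template runs once, on the literal text `[\1](\2)`, before `re.sub` ever sees a match, so it never touches the captured groups (Python strings also have `strip()`, not `trim()`). The sketch below reuses the question's pattern on the link from the example and shows that the template comes back unchanged and the newline survives the substitution.

```python
import re

# The question's pattern, applied to the wrapped link from the example
pattern = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
sample = ("See for example the\n[installation process for Ubuntu\n"
          "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The")

# strip() runs on the template string itself, not on the captured groups
template = r'[\1](\2)'.strip()
print(template == r'[\1](\2)')  # True: there is nothing to strip

# The substitution only copies the groups back, the inner newline included
result = pattern.sub(template, sample)
print("Ubuntu\nTrusty" in result)  # True
```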

Edit: I updated the example to make my question more explicit.

Thanks for your answers.

Python's strip() is the equivalent of trim(). Since you need to trim newlines, use strip('\n'):

fin.readline().strip('\n')
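Note that `strip()` only removes characters at the two ends of a string; a newline in the middle, like the one inside the wrapped link text, is left alone. In this sketch `io.StringIO` stands in for the `fin` file object (an assumption; a text file opened with `open()` behaves the same way for `readline()`), and `readline()` is called with parentheses because it is a method.

```python
import io

# io.StringIO stands in for a real file object named fin
fin = io.StringIO("first line\nsecond line\n")
first = fin.readline().strip("\n")
print(repr(first))  # 'first line'

# strip() leaves a newline in the middle of the string untouched
wrapped = "\n[installation process for Ubuntu\nTrusty]\n"
print(repr(wrapped.strip("\n")))  # '[installation process for Ubuntu\nTrusty]'
```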

This will also work:

>>> s = """
...    ![https://diasporafoundation.org/assets/pages/about/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png)
... """

>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
'![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)'
>>> 

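In general, the built-in string functions will do the job and are easier to read than working out a regular expression. Here, strip removes the leading and trailing whitespace, split then returns a list of the pieces between the newlines, and join puts them back together into a single string.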
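The strip, split and join steps can be traced one at a time on a shortened version of the image line (the sample string is illustrative, not taken verbatim from the post). For this particular job, `s.strip().replace("\n", "")` gives the same result in a single call.

```python
s = "\n   ![network-\ndistributed.png](data/images/network-\ndistributed.png)\n"

trimmed = s.strip()          # drop the leading newline/spaces and the trailing newline
parts = trimmed.split("\n")  # ['![network-', 'distributed.png](data/images/network-', 'distributed.png)']
joined = "".join(parts)      # glue the pieces back together with no separator
print(joined)  # ![network-distributed.png](data/images/network-distributed.png)

# Equivalent one-step alternative
assert joined == s.strip().replace("\n", "")
```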

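Well, I finally found what I was looking for. In the snippet below, I capture the strings with a regular expression and then pass each match to a function that processes it.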

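import re

def remove_newlines(match):
    # match.group() is the whole matched link or image; drop its line breaks
    return "".join(match.group().strip().split('\n'))

# When sub() gets a function, it calls it once per match and inserts its return value
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
post['content'] = links_pattern.sub(remove_newlines, post['content'])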
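One caveat with `remove_newlines`: it deletes every newline in the match, so the link text `Ubuntu\nTrusty` becomes `UbuntuTrusty` rather than the `Ubuntu Trusty` shown in the expected output. Below is a possible variant, assuming that a line break right after a `-` split a word or URL (so it is simply removed) and that any other break in the bracketed text stood for a space; newlines inside the parenthesized target are always removed. The helper names `unwrap_text` and `fix_links` are hypothetical.

```python
import re

# Same pattern as the answer above
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')

def unwrap_text(text):
    # Assumption: "-\n" split a hyphenated token; any other "\n" was a space
    text = text.replace("-\n", "-")
    return text.replace("\n", " ")

def fix_links(match):
    label, target = match.group(1), match.group(2)
    # URLs and paths carry no meaningful newlines, so drop them all
    return "[{}]({})".format(unwrap_text(label), target.replace("\n", ""))

content = (
    "See for example the\n"
    "[installation process for Ubuntu\n"
    "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The\n"
    "![https://diasporafoundation.org/assets/pages/about/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png)\n"
)
fixed = links_pattern.sub(fix_links, content)
print(fixed)
```

The leading `!` of the image is outside the match, so it is kept as-is. The hyphen rule is only a heuristic: a line that legitimately ends with `-` before a new word would be joined without a space.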

Thanks for your answers, and sorry if my question was not clear enough.
