How to apply a string method to a regular expression match in Python
My markdown file is a bit broken: links and images that are too long contain line breaks. I want to remove those line breaks.
Example:
From:
See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to
![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)
_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
To:
See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to
![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)
_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
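As you can see in this snippet, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2
But now, what is the Python syntax for applying a string method such as string.trim() to what the regular expression captured?
For now, I'm stuck with this: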
import re

fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their urls
# Something like r'[\1](\2)'.trim() ??
post['content'] = fix_newlines.sub(r'[\1](\2)', post['content'])
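A quick way to see what this pattern actually captures is `re.findall`, which returns one tuple of groups per match. The sketch below runs it on a trimmed copy of the example above (`question_pattern` and `sample` are illustrative names). Only the link matches: the image text contains `.` and `-`, which the character class `[\w\s*:/]` does not allow, which is why the pattern further down adds `\-\.`.

```python
import re

# Pattern copied from the question: the class inside [...] allows word
# characters, whitespace, '*', ':' and '/', but not '.' or '-'.
question_pattern = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')

# Trimmed copy of the broken markdown from the question.
sample = (
    "See for example the\n"
    "[installation process for Ubuntu\n"
    "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The\n"
    "![https://diasporafoundation.org/assets/pages/about/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png)\n"
)

# One (text, target) tuple per match; the image is skipped entirely.
matches = question_pattern.findall(sample)
print(len(matches))       # 1
print(repr(matches[0][0]))  # 'installation process for Ubuntu\nTrusty'
```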
Edit: I updated the example to make my question clearer.
Thank you for your answers.
The strip function is similar to trim. Since you need to trim newlines, use strip('\n'):
fin.readline().strip('\n')
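Keep in mind that `str.strip` only removes characters at the two ends of a string. That is enough when each line is handled on its own (as with `readline()`), but not for a newline in the middle of a string. A small check, using a string made up for illustration:

```python
# str.strip() removes the given characters only at both ends of the string.
line = "network-\ndistributed.png\n"

stripped = line.strip("\n")
print(repr(stripped))  # 'network-\ndistributed.png' (inner newline kept)

# Removing newlines inside the string needs another call, e.g. replace().
joined = line.replace("\n", "")
print(repr(joined))  # 'network-distributed.png'
```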
This will also work:
>>> s = """
... ![https://diasporafoundation.org/assets/pages/about/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png)
... """
>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
'![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)'
>>>
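In general, the built-in string functions will do the job and are easier to read than working out a regular expression. In this case, strip removes leading and trailing whitespace, split returns a list of the pieces between the newlines, and join puts them back together into a single string.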
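Since split on `'\n'` followed by join with an empty string simply deletes every newline, `str.replace` gives the same result in a single call. A sketch comparing the two on the same input as the session above:

```python
s = """
![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)
"""

# The approach from the answer: trim, split on newlines, glue back together.
via_split = "".join(s.strip().split("\n"))

# Same effect in one call: delete every newline left after trimming.
via_replace = s.strip().replace("\n", "")

print(via_split == via_replace)  # True
print(via_replace)
```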
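Well, I finally found what I was looking for. In the snippet below, the regular expression captures each link or image, and a function is then applied to each match.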
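import re

def remove_newlines(match):
    # Called once per match: drop the line breaks inside the matched link/image
    return "".join(match.group().strip().split('\n'))

links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
# Passing a function instead of a template string makes re.sub call it for each match
post['content'] = links_pattern.sub(remove_newlines, post['content'])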
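One caveat when running `remove_newlines` on the question's example: it turns `Ubuntu\nTrusty` into `UbuntuTrusty`, while the desired output keeps `Ubuntu Trusty`. Below is a sketch of a variant callback, `fix_breaks` (a hypothetical name), built on a heuristic that is an assumption and not part of the question: a line break right after a hyphen marks a wrapped word or URL and is dropped, and any other line break stands for a space.

```python
import re

# Same pattern as above: [text](target), with '-' and '.' allowed in the text.
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')


def fix_breaks(match):
    """Hypothetical callback that rejoins a wrapped link or image.

    Assumed heuristic: a newline right after '-' is dropped,
    any other newline becomes a single space.
    """
    text = match.group(0)
    text = text.replace("-\n", "-")
    return text.replace("\n", " ")


broken = (
    "See for example the\n"
    "[installation process for Ubuntu\n"
    "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The\n"
    "![https://diasporafoundation.org/assets/pages/about/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png)\n"
)

# re.sub calls fix_breaks once per match and splices in its return value;
# newlines outside links and images are left untouched.
fixed = links_pattern.sub(fix_breaks, broken)
print(fixed)
```

The leading `!` of the image sits outside the match and is kept as-is. A link whose text genuinely ends a line with a hyphen followed by a separate word would be joined incorrectly, so the heuristic may need adjusting for other files.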
Thanks for your answers, and sorry if my question wasn't clear enough.