
How do I use regex to get a string between two characters and remove certain characters inside that string?

I have a long string that I want to filter using regex:

<@961483653468439706> Text to remove, this text is useless, that's why i want it gone!
i want this: `keep the letters and spaces`

I want to keep the text in between the ` characters.

The only issue is that there is an invisible character between every character in the part of the string I want. You can see the invisible characters on regex101: https://regex101.com/r/rAYrMT/1

`([\'^\w]*)`

So in short: keep everything between the ` characters except for the invisible characters. Info on the invisible character can be found here: https://apps.timwhitlock.info/unicode/inspect?s=%EF%BB%BF

You can filter the non-printable characters out:

import re 
from string import printable

# your invisibles are in the string...

s='''<@961483653468439706> Text to remove, this text is useless, that's why i want it gone!
Type `keep the letters and spaces` and `this too`'''

for m in re.findall(r'`([^`]*)`', s):
    print(repr(m))
    print(''.join([c for c in m if c in printable]))
    print()

Prints:

'k\ufeffe\ufeffe\ufeffp\ufeff \ufefft\ufeffh\ufeffe\ufeff \ufeffl\ufeffe\ufefft\ufefft\ufeffe\ufeffr\ufeffs a\ufeffn\ufeffd s\ufeffp\ufeffa\ufeffc\ufeffe\ufeffs'
keep the letters and spaces

'this too'
this too
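
Note that `string.printable` contains only ASCII, so the filter above would also drop accented letters or any other non-ASCII text inside the backticks. If that matters, a more targeted variant is sketched below. It assumes the unwanted characters all fall in the Unicode "format" category (Cf), which is the case for U+FEFF; the helper name `strip_format_chars` and the sample string are made up for illustration. Be aware that Cf also covers characters such as the zero-width joiner, which some scripts and emoji sequences rely on.

```python
import re
import unicodedata

def strip_format_chars(text):
    """Drop characters in Unicode category Cf (format), which includes U+FEFF."""
    return ''.join(c for c in text if unicodedata.category(c) != 'Cf')

# Hypothetical sample; U+FEFF is written as an escape so it is visible in the source.
sample = 'Type `k\ufeffe\ufeffe\ufeffp caf\ufeffé` now'

# Capture each backtick-delimited passage, then clean it.
cleaned = [strip_format_chars(m) for m in re.findall(r'`([^`]*)`', sample)]
print(cleaned)
```

Unlike the `printable` filter, this keeps the `é` in the sample while still removing every U+FEFF.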

You don't need to use regex for this:

text = "<@961483653468439706> Text to remove, this text is useless, that's " \
       "why i want it gone!Type `keep the letters and spaces`"

# The invisible character is U+FEFF (see the inspector link in the question). It is
# written as an escape here because the raw character would not show up in this post.
filtered = text.replace('\ufeff', '')
# Because the passage you want is always between ` characters, you can split on ` and
# know that every second item in the list, starting at index 1, is what you are looking for.
passage = filtered.split('`')[1::2]

print(passage)
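
If more than one kind of invisible character might appear, the same regex-free idea can be sketched with `str.translate`, which deletes several code points in one pass. Only U+FEFF is confirmed by the question; the other code points in the table, the function name `backtick_passages`, and the sample string are assumptions for illustration.

```python
# Code points to delete. Only U+FEFF comes from the question; U+200B (zero-width
# space) and U+2060 (word joiner) are hypothetical additions.
INVISIBLE = dict.fromkeys(map(ord, '\ufeff\u200b\u2060'))

def backtick_passages(text):
    """Return the pieces of text that sit between pairs of backticks."""
    cleaned = text.translate(INVISIBLE)  # code points mapped to None are removed
    # After splitting on `, the odd-indexed items are the ones inside backticks.
    return cleaned.split('`')[1::2]

text = "useless text `k\ufeffe\ufeffe\ufeffp` and `this too`"
print(backtick_passages(text))
```

This relies on the backticks being balanced. With an unmatched backtick, the last odd-indexed item is simply whatever follows it, so the result may include text that was never meant to be quoted.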


 