Extract text ignoring a pattern using regex?
How can I extract the text around the pattern <.>, ignoring the pattern itself? Here is an example:
string = ('this is good < U+0097 > never end . < U+0093 > gift,<U+0094 > said . < U+0093 > test . < U+0093 > time , '
          'with,<U+0094 > said boys . gave answer , Naresh Hembrom , '
          'sitting crosslegged charpoy outside home , .')
I tried something like this, but it did not give me the desired output.
import re
re.sub(r'[^a-zA-Z0-9]+', ' ', string)
Desired output:
string = 'this is good never end . gift, said. test. time, with, said boys. gave answer, Naresh Hembrom, sitting crosslegged charpoy outside home, .'
Here is how I solved it.
import re
string = 'this is good < U+0097 > never end . < U+0093 > gift,<U+0094 > said . < U+0093 > test . < U+0093 > time , with,<U+0094 > said boys . gave answer , Naresh Hembrom , sitting crosslegged charpoy outside home , .'
regString = re.sub(r'<(.*?)>','',string)
print(regString)
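The only caveat is that the spacing will not be consistent: removing each <...> token leaves the spaces that surrounded it. You can adjust this regex, or add a similar substitution, to get exactly what you need.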
Feel free to comment with what you need and I can help.
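One possible way to tidy the spacing afterwards is sketched below. The function name strip_tags_and_tidy is hypothetical, and the rule "no space before a comma or period" is an assumption about the desired result, which the question's sample output follows only in part.

```python
import re

def strip_tags_and_tidy(text):
    """Remove <...> tokens, then normalise the leftover spacing.

    Hypothetical helper; the spacing rules are assumptions.
    """
    # Remove every <...> token (anything between '<' and the next '>').
    no_tags = re.sub(r'<[^>]*>', '', text)
    # Collapse runs of whitespace into a single space and trim the ends.
    collapsed = re.sub(r'\s+', ' ', no_tags).strip()
    # Drop the whitespace that precedes a comma or a period.
    return re.sub(r'\s+([,.])', r'\1', collapsed)

sample = 'this is good < U+0097 > never end . < U+0093 > gift,<U+0094 > said .'
print(strip_tags_and_tidy(sample))
# this is good never end. gift, said.
```

If the spaces before punctuation should be kept, the last substitution can simply be left out.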
re.sub returns a new string; it does not modify the original, so assign the result to a variable. Try:
new_str = re.sub(r'<[^>]*>', '', string)
print(new_str)
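The pattern above removes anything between '<' and '>'. If the text might contain other angle brackets that should survive, a narrower pattern that only matches tokens shaped like < U+0093 > may be safer. This is a sketch under that assumption: the helper name remove_codepoint_tokens and the 4-to-6 hex digit range are illustrative choices, not something stated in the question.

```python
import re

# Matches '<', optional spaces, 'U+', 4 to 6 hex digits, optional spaces, '>'.
CODEPOINT_TOKEN = re.compile(r'<\s*U\+[0-9A-Fa-f]{4,6}\s*>')

def remove_codepoint_tokens(text):
    """Remove only <U+XXXX>-style tokens, leaving other '<' and '>' alone."""
    return CODEPOINT_TOKEN.sub('', text)

sample = 'a < b and c > d < U+0093 > gift,<U+0094 > said'
print(remove_codepoint_tokens(sample))
# The '< b and c >' part is kept; only the two U+ tokens are removed.
```

As with the broader pattern, the spaces around each removed token remain in the result.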