Removing links from text using Python

Question

I need to remove links from many lines of text. An example is given below:

b'585947808772960257|wed apr 08 23:30:18 +0000 2015|gp workload harming care - bma poll http://bbc.in/1chtbrv\r\n'

I tried the Python code

text = re.sub(r'^http://. [\\r\\n] ', '', text)

but it raises an error:

TypeError: cannot use a string pattern on a bytes-like object

Can I remove the binary text from the string?

Answer 1

The b prefix means it is a bytes object. You need to know its encoding in order to decode it and convert it to a str object.

a = b'585947808772960257|wed apr 08 23:30:18 +0000 2015|gp workload harming care - bma poll http://bbc.in/1chtbrv\r\n'
print(type(a))
>>> <class 'bytes'>

If you call decode without arguments, it uses UTF-8:

decoded_a = a.decode()
print(decoded_a)
>>> 585947808772960257|wed apr 08 23:30:18 +0000 2015|gp workload harming care - bma poll http://bbc.in/1chtbrv  

print(type(decoded_a))
>>> <class 'str'>
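
When the encoding is not known for certain, it can be passed explicitly, together with a policy for bytes that cannot be decoded. The following is only a sketch: the sample literals and the choice of errors="replace" are assumptions for illustration and do not come from the question.

```python
# Assumed samples: the second literal contains a byte (0xff) that is not valid UTF-8.
good = b'bma poll http://bbc.in/1chtbrv\r\n'
bad = b'bma poll \xff http://bbc.in/1chtbrv\r\n'

# Explicit encoding; same result as good.decode(), since UTF-8 is the default.
decoded_good = good.decode("utf-8")

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising
# UnicodeDecodeError.
decoded_bad = bad.decode("utf-8", errors="replace")

print(repr(decoded_good))
print(repr(decoded_bad))
```

With the default errors="strict", the second decode would raise UnicodeDecodeError, which may be preferable if silently altered text is not acceptable.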

Answer 2

You first need to decode the binary string with the .decode() method:

import re

binary_string = b'585947808772960257|wed apr 08 23:30:18 +0000 2015|gp workload harming care - bma poll http://bbc.in/1chtbrv\r\n'
# decode the binary string
string = binary_string.decode("utf-8")
# find the url pattern (raw string, so the backslashes reach the regex engine unchanged)
repstring = re.search(r'.*(http://.*)\r\n', string).group(1)
# replace the url; re.escape makes the URL match literally instead of as a regex
text = re.sub(re.escape(repstring), '', string)

If you need the result in binary format again, encode it again:

text_binary = text.encode("utf-8")
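
The decode, substitute, and encode steps above can be folded into one helper and applied to many lines. This is a minimal sketch, not the answerer's code: the name strip_links, the pattern https?://\S+, and the assumption that every line is UTF-8 are illustrative choices.

```python
import re

# Assumed pattern: a URL is "http://" or "https://" followed by non-whitespace.
URL_PATTERN = re.compile(r'https?://\S+')

def strip_links(line: bytes, encoding: str = "utf-8") -> bytes:
    """Decode a bytes line, remove URLs, and return the result as bytes."""
    text = line.decode(encoding)
    cleaned = URL_PATTERN.sub('', text)
    return cleaned.encode(encoding)

lines = [
    b'585947808772960257|wed apr 08 23:30:18 +0000 2015|gp workload harming care - bma poll http://bbc.in/1chtbrv\r\n',
]
for line in lines:
    print(strip_links(line))
```

The \S+ part stops at the first whitespace character, so the trailing \r\n is kept. A bytes pattern such as rb'https?://\S+' with a bytes replacement can also be applied to the bytes object directly, which avoids the TypeError without decoding.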
