How to remove specific elements from a string in Python

Question

I have a string containing a website's HTML data. It is stored in urldata:

urldata = BeautifulSoup(urlopen(urllib.request.Request(url, headers=headers), timeout=3).read(), features="html.parser")

When I print urldata, it shows the HTML data from that page. I need to remove the http and https links from it.

I can filter out the http and https links this way:

web_page = str(urldata)
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', web_page)
print(urls)

Now I want to remove the http and https links from urldata.

I already have the URL list in the urls variable (type list).

Is there any way to compare the list urls with the web_page string and remove those URLs from web_page?

Answer 1

You can use re.sub() to substitute each URL with an empty string:

web_page = str(urldata)
web_page = re.sub(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', '', web_page)
print(web_page)
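
For anyone who wants to try the re.sub() approach without fetching a page, here is a minimal self-contained sketch. The sample_html string and the strip_urls helper are made up for illustration and are not from the question; the pattern is the same one used above, written as a raw string and compiled once.

```python
import re

# Same pattern as in the answer, compiled once and written as a raw string.
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)

def strip_urls(text):
    """Return text with every match of URL_PATTERN removed (hypothetical helper)."""
    return URL_PATTERN.sub('', text)

# Hypothetical sample standing in for str(urldata).
sample_html = (
    '<a href="https://example.com/page?id=1">home</a> '
    '<img src="http://example.org/x.png">'
)

cleaned = strip_urls(sample_html)
print(cleaned)
```

One thing to watch: the character class [$-_@.&+] contains the range $ through _, which covers characters such as <, > and /. A URL that appears in plain text directly before a tag (rather than inside a quoted attribute) may therefore be matched together with the tag that follows it.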

UPDATE: Since you already have the urls list, you can also remove each URL with str.replace():

web_page = str(urldata)
for url in urls:
    web_page = web_page.replace(url, '')
print(web_page)
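
A possible refinement of the loop, sketched here with made-up sample data: if one URL in the list is a prefix of another, replacing the shorter one first leaves a fragment of the longer one behind. Processing the URLs longest first avoids that. The remove_urls helper and the sample values are assumptions for illustration only.

```python
def remove_urls(text, urls):
    """Remove every URL in urls from text, longest URL first (hypothetical helper)."""
    # Longest first, so a short URL that is a prefix of a longer one
    # cannot strip part of the longer URL and leave a fragment behind.
    for url in sorted(set(urls), key=len, reverse=True):
        text = text.replace(url, '')
    return text

# Hypothetical sample data standing in for web_page and urls.
sample_page = (
    '<a href="http://example.com">a</a>'
    '<a href="http://example.com/page">b</a>'
)
sample_urls = ['http://example.com', 'http://example.com/page']

result = remove_urls(sample_page, sample_urls)
print(result)
```

With the original list order, the second link would be left as href="/page" because the shorter URL is removed first.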

Answer 1: accepted, 2020-07-01 19:16:51
