[英]How to fix this type error thrown by regular expression in python?
我正在尝试为Python收集请求库的所有内部链接,并过滤掉所有外部链接。
我正在使用正则表达式执行相同的操作。 但是它引发了我无法解决的此类型错误。
我的代码:
import requests
from bs4 import BeautifulSoup
import re
r = requests.get('https://2.python-requests.org/en/master/')
content = BeautifulSoup(r.text)
[i['href'] for i in content.find_all('a') if not re.match("http", i)]
错误:
TypeError Traceback (most recent call last)
<ipython-input-10-b7d82067fe9c> in <module>
----> 1 [i['href'] for i in content.find_all('a') if not re.match("http", i)]
<ipython-input-10-b7d82067fe9c> in <listcomp>(.0)
----> 1 [i['href'] for i in content.find_all('a') if not re.match("http", i)]
~\Anaconda3\lib\re.py in match(pattern, string, flags)
171 """Try to apply the pattern at the start of the string, returning
172 a Match object, or None if no match was found."""
--> 173 return _compile(pattern, flags).match(string)
174
175 def fullmatch(pattern, string, flags=0):
TypeError: expected string or bytes-like object
您正在向它传递BeautifulSoup节点对象,而不是字符串。 尝试这个:
[i['href'] for i in content.find_all('a') if not re.match("http", i['href'])]
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.