Python: add a string to each item of a matched list with multiple fields

Question

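The code I'm working on retrieves a list from an HTML page, where each entry has 2 fields: URL and title...

The URL always starts with /URL.... I need to prepend "http://website.com" to each href value returned by re.findall.

The code so far looks like this: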

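import re
from bs4 import BeautifulSoup as bs  # assumed: "bs" is an alias for BeautifulSoup

soup = bs(html)
tag = soup.find('div', {'class': 'item'})
reg = re.compile('<a href="(.+?)" rel=".+?" title="(.+?)"')
links = re.findall(reg, str(tag))  # list of (href, title) tuples
# TODO: prepend "http://website.com" to the href "(.+?)" field of each tuple
return links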

Answer 1

Try:

for link in tag.find_all('a'):
    link['href'] = 'http://website.com' + link['href']

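Then use one of the following ways to return the result:

return str(soup) gives you the whole document with the changes applied.

return tag.find_all('a') gives you all the link elements.

return [str(i) for i in tag.find_all('a')] gives you all the link elements converted to strings.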
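The steps above can be put together as a minimal sketch. It assumes the bs4 package with the built-in "html.parser" backend; the sample markup, the function name absolute_links and the BASE constant are made up for illustration and are not part of the original question.

```python
from bs4 import BeautifulSoup

BASE = "http://website.com"  # prefix taken from the question

# Hypothetical sample markup standing in for the real page.
html = """
<div class="item">
  <a href="/page/1" rel="nofollow" title="First">First</a>
  <a href="/page/2" rel="nofollow" title="Second">Second</a>
</div>
"""

def absolute_links(html, base=BASE):
    """Return (href, title) pairs with the base URL prepended to each href."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", {"class": "item"})
    results = []
    # href=True skips <a> elements that have no href attribute.
    for link in tag.find_all("a", href=True):
        link["href"] = base + link["href"]
        results.append((link["href"], link.get("title")))
    return results

links = absolute_links(html)
```

Returning plain (href, title) tuples keeps the same shape as the re.findall result in the question, so the calling code should not need to change.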

Also, don't try to parse HTML with regular expressions when you already have an XML parser working.
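If the re.findall result has to be kept anyway, note that a pattern with two groups returns a list of (href, title) tuples, so the prefix can be added with a list comprehension. This is only a sketch of that alternative, not the approach recommended above; the sample string and the helper name prepend_base are hypothetical.

```python
import re

BASE = "http://website.com"  # prefix taken from the question
LINK_RE = re.compile(r'<a href="(.+?)" rel=".+?" title="(.+?)"')

# Hypothetical sample input standing in for str(tag).
sample = (
    '<a href="/page/1" rel="nofollow" title="First">First</a>'
    '<a href="/page/2" rel="nofollow" title="Second">Second</a>'
)

def prepend_base(pairs, base=BASE):
    """Rebuild each (href, title) tuple with the base URL in front of href."""
    return [(base + href, title) for href, title in pairs]

# findall with two capture groups yields one (href, title) tuple per match.
links = prepend_base(LINK_RE.findall(sample))
```

Tuples are immutable, which is why the list is rebuilt rather than modified in place.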

Answer 1 (2, accepted, 2015-12-26 00:11:00)
