How do I pass the output of a CSS selector to Beautiful Soup?

Question

I want to scrape some web pages, and I'm using a Chrome extension called "SelectorGadget", which is a CSS selector tool. For example, for this URL: http://www.www2015.it/documents/proceedings/forms/proceedings.htm it gives me the following selector for the list of papers: tr~ tr+ tr td+ td a. The problem is that I cannot figure out how to pass this output to Beautiful Soup. In the following lines, .select() does not recognize these selectors:

import bs4
import requests

page = requests.get("http://www.www2015.it/documents/proceedings/forms/proceedings.htm")
soup = bs4.BeautifulSoup(page.content, "html.parser")  # name the parser explicitly
soup.select("tr~ tr+ tr td+ td a")  # selector copied from SelectorGadget

Answer 1

The problem is that BeautifulSoup has very limited support for CSS selector syntax. In your case, going sideways with the sibling combinators ~ or + is not going to work as is.
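If the sibling step itself matters, one possible workaround is to walk the tree with BeautifulSoup's own navigation methods instead of a selector. The following is only a minimal sketch: the HTML snippet and the helper name links_in_following_cells are made up for illustration and are not taken from the page in the question. It emulates the "td + td a" part by taking each td, looking at the very next sibling tag, and collecting the links inside it when that sibling is also a td.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a table of papers.
html = """
<table>
  <tr><td>1</td><td><a href="p1.pdf">Paper one</a></td></tr>
  <tr><td>2</td><td><a href="p2.pdf">Paper two</a></td></tr>
</table>
"""


def links_in_following_cells(soup):
    """Collect <a> tags inside a <td> that directly follows another <td>."""
    links = []
    for td in soup.find_all("td"):
        # True matches any tag, so whitespace text nodes are skipped.
        nxt = td.find_next_sibling(True)
        if nxt is not None and nxt.name == "td":
            links.extend(nxt.find_all("a"))
    return links


soup = BeautifulSoup(html, "html.parser")
hrefs = [a["href"] for a in links_in_following_cells(soup)]
print(hrefs)
```

This is more verbose than a selector, but it relies only on find_all and find_next_sibling.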

If you are looking to match the PDF links on this page, I would use the following selector:

soup.select("a[href$=pdf]")  # get the links where href ends with "pdf"
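To try that selector without hitting the network, here is a small sketch on an inline snippet. The markup is invented for illustration and does not reproduce the real page; the base URL is the one from the question, used only to turn relative links into absolute ones with urljoin.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.www2015.it/documents/proceedings/forms/proceedings.htm"

# Hypothetical markup: one non-PDF link and two PDF links.
html = """
<table>
  <tr><td>Info</td><td><a href="index.htm">Overview</a></td></tr>
  <tr><td>Paper</td><td><a href="p1.pdf">Paper one</a></td></tr>
  <tr><td>Paper</td><td><a href="p2.pdf">Paper two</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# a[href$=pdf] keeps only anchors whose href attribute ends with "pdf".
pdf_links = [urljoin(BASE_URL, a["href"]) for a in soup.select("a[href$=pdf]")]

for link in pdf_links:
    print(link)
```

The attribute selector sidesteps the table structure entirely, so it keeps working even if the rows or columns around the links change.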

Answer 1: accepted, 2016-02-11 21:55:05
