How to pass the output of CSS Selector to beautiful soup?

Question

I want to scrape some webpages, and I'm using a Chrome extension called "SelectorGadget", which generates CSS selectors. For example, for this URL: http://www.www2015.it/documents/proceedings/forms/proceedings.htm SelectorGadget gives me the selector "tr~ tr+ tr td+ td a" for the list of papers. The problem is that I cannot figure out how to pass this output to Beautiful Soup. In the following lines, .select() does not recognize these selectors!

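import bs4
import requests

page = requests.get("http://www.www2015.it/documents/proceedings/forms/proceedings.htm")
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = bs4.BeautifulSoup(page.content, "html.parser")
soup.select("tr~ tr+ tr td+ td a")  # selector copied from SelectorGadget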

Answer 1

The problem is that BeautifulSoup has very limited support for CSS selector syntax. In your case, going sideways with the sibling combinators ~ or + is not going to work as is.

If you are looking to match the pdf links on this page, I would use the following selector:

soup.select("a[href$=pdf]")  # get the links where href ends with "pdf"
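
To make the suggestion above concrete, here is a minimal sketch that runs the same selector against a small, made-up HTML snippet instead of the live page. The sample markup, the pdf_links helper, and the use of urljoin to resolve relative links are all assumptions for illustration; the real page's structure may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the proceedings page (not the real HTML).
SAMPLE_HTML = """
<table>
  <tr><td>Session 1</td><td><a href="papers/p1.pdf">Paper one</a></td></tr>
  <tr><td>Session 1</td><td><a href="papers/p2.pdf">Paper two</a></td></tr>
  <tr><td>Info</td><td><a href="about.htm">About</a></td></tr>
</table>
"""

BASE_URL = "http://www.www2015.it/documents/proceedings/forms/proceedings.htm"


def pdf_links(html, base_url):
    """Return absolute URLs of all links whose href ends with "pdf"."""
    soup = BeautifulSoup(html, "html.parser")
    # $= is the CSS "attribute value ends with" operator.
    anchors = soup.select("a[href$=pdf]")
    # Resolve relative hrefs against the URL of the page they came from.
    return [urljoin(base_url, a["href"]) for a in anchors]


links = pdf_links(SAMPLE_HTML, BASE_URL)
for link in links:
    print(link)
```

Against the live page, page.content from the question's requests.get call would take the place of SAMPLE_HTML.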

solution1
0 ACCEPTED 2016-02-11 21:55:05