Newest questions tagged [lxml.html]

lxml: Xpath works in Chrome but not in lxml

I'm trying to scrape information from this episode wiki page on Fandom, specifically the episode title in Japanese, 謀略Ⅳ：ドライバーを奪還せよ！: Conspiracy IV ...

How to get text from html attributes

I tried to parse a page to get some element as text, but I can't find how to get text from select. For example, the html below has data-initial-rating="4" ...

BeautifulSoup Scraping Results not showing

I am playing around with BeautifulSoup to scrape data from websites. So I decided to scrape empireonline's website for 100 greatest movies of all time ...

Cleanup HTML using lxml and XPath in Python

I'm learning Python and the lxml toolkit. I need to process multiple .htm files in the local directory (recursively) and remove unwanted tags, including their con ...

Python, lxml.html: Need a generic function to return innerHTML of any element

I found a nice function here by Siva Kannan but it's not working in my case. I'm using lxml.html to get the data from the page and not etree. When I us ...

Best XPath practices for extracting data from a field that varies in format

I was using Python 3.8, XPath and Scrapy where things just seemed to work. I took my XPath expressions for granted. Now I'm just using Python 3.8, XP ...

How to get data from a webpage using Python

Last year I wrote a Python script to store data on COVID-19 cases (active, cured and deaths) from the website. The script was running fine init ...

Why does python requests.get() retrieve different image src compared to browsing the site

As the title suggests: calling the requests.get() method gives me a different image src link as opposed to when browsing the site manually. I'm trying ...

Webscraping Scopus with lxml.html

I'm trying to webscrape Scopus with lxml.html (ultimately to create a list of document titles), but it seems no data is being stored from the page.con ...

How to use lxml for web scraping?

I want to write a Python script that fetches my current reputation on Stack Overflow: https://stackoverflow.com/users/14483205/raunanza?tab=profile ...

Type hints for lxml?

New to Python and come from a statically typed language background. I want type hints for https://lxml.de just for ease of development (mypy flagging ...

How to get text from HTML element by using lxml.html

I've been trying to get the full text hosted inside a <div> element from the web page https://www.list-org.com/company/11665809. The element shou ...

LXML/Python - Looping over a list of lxml.etree._Element

I'm trying to loop over a list of 5 lxml._Element. Here is an extract of the part of the html I'm interested in: I've saved the extract under an ht ...

Lxml is returning an empty list

I am working with lxml to try to get the top 10 hits currently on Spotify (https://spotifycharts.com/regional). When I run the program, it returns an e ...

Issue with Python Selenium using `find_element_by_xpath(xpath)`

I am using Python Selenium to try and scrape or obtain data because lxml is so poorly documented with parsing HTML and obtaining data using xpath, and ...

CSS selector: get text outside tag

I have the following HTML: I want to get "26EU" via a CSS selector using lxml. I had already tried this but it returned all of the text in the tag ...

Scraping a nested and unstructured table in python (lxml)

The website I'm scraping (using lxml) is working just fine with everything except a table, in which all the tr's, td's and heading th's are nested & ...

Getting empty list while using xpath with html.fromstring

I am trying to extract text from a webpage using the code below. It is working fine for other websites but here I am getting an empty list ...

How to scrape an HTML page that loads more information while scrolling down, using Python lxml

I am scraping the text from https://www.basketball-reference.com/players/p/parsoch01.html. But I cannot scrape the content that is located below the ...

Using lxml.html with broken html entities?

I need to work with a page, which has an unfortunate mix of correct and incorrect HTML entities; for instance: This, in Firefox 67, does get interp ...