Python - Get html table element with lxml.html regex

I am trying to extract a cell from the "Actual" column of the economic calendar table on this website: https://www.investing.com/economic-calendar/


I am using Python requests and lxml.html:

import requests
import lxml.html

# Form fields sent with the POST request (country filter, time filter, etc.)
payload = {
    'country[]': [25, 32],
    'limit_from': 0,
    'submitFilters': 1,
    'timeFilter': 'timeRemain',
    'currentTab': 'today',
    'timeZone': 55,
}
# Identify as a browser making an AJAX request
headers = {'User-Agent': 'Mozilla/5.0', 'X-Requested-With': 'XMLHttpRequest'}

r = requests.post("https://www.investing.com/economic-calendar/",
                  data=payload, headers=headers)
html = lxml.html.fromstring(r.text)
# One element per row of the calendar table
results = html.xpath("//table[@id='economicCalendarData']//tr")

Let's assume the row of interest is the one at index 3 of results. The cells in the "actual" column all have a class attribute ending in "actual", but the number before that suffix and the font-style class vary. So I would like my XPath expression to locate the td by the "actual" suffix alone, ideally with a regex.

I have been trying
results[3].find(".//td[contains(@class,'actual')]")

and

results[3].find(".//td[substring(@class, string-length(@class)-6)='actual']")

(both taken from other SO questions), but both raise SyntaxError: invalid predicate.
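The error comes from the method rather than the expressions. In lxml, find() uses ElementPath, a small XPath subset with no functions such as contains() or substring(), while xpath() runs full XPath 1.0. A small offline sketch, using a hypothetical one-row table (the class names are invented), that shows find() rejecting the contains() predicate while xpath() accepts the same string:

```python
import lxml.html

# Hypothetical one-row table; class names are made up for illustration.
html = lxml.html.fromstring(
    '<table><tr><td class="event-7-actual">3.4</td>'
    '<td class="event-7-previous">3.1</td></tr></table>'
)
row = html.xpath("//tr")[0]

# .find() compiles the path with lxml's ElementPath engine, which has no
# XPath functions, so the contains() predicate is rejected.
try:
    row.find(".//td[contains(@class,'actual')]")
    find_error = None
except SyntaxError as exc:
    find_error = exc
    print("find() failed:", exc)

# .xpath() uses libxml2's full XPath 1.0 engine, so the same expression works.
matches = row.xpath(".//td[contains(@class,'actual')]")
print([td.text for td in matches])  # ['3.4']
```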

Can anyone help me find the correct XPath expression (regex or otherwise) to locate that td?
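Since the question asks for a regex specifically: lxml's xpath() (but not find()) supports the EXSLT regular-expression functions, such as re:test(), provided the re prefix is bound to the namespace http://exslt.org/regular-expressions. A minimal sketch, assuming class names like those in the hypothetical snippet below (they are made up, not taken from investing.com); actual_cells is a hypothetical helper name:

```python
import lxml.html

# Hypothetical stand-in for one calendar row; the real markup may differ.
SAMPLE_ROW = """
<table id="economicCalendarData">
  <tr>
    <td class="time">10:00</td>
    <td class="bold redFont event-401-actual">1.2%</td>
    <td class="event-401-forecast">1.0%</td>
  </tr>
</table>
"""

# lxml exposes EXSLT regex functions; the "re" prefix must be bound explicitly.
REGEX_NS = {"re": "http://exslt.org/regular-expressions"}


def actual_cells(row):
    """Return the <td> descendants of row whose class attribute ends in 'actual'."""
    return row.xpath(".//td[re:test(@class, 'actual$')]", namespaces=REGEX_NS)


doc = lxml.html.fromstring(SAMPLE_ROW)
row = doc.xpath("//table[@id='economicCalendarData']//tr")[0]
cells = actual_cells(row)
print([td.text_content() for td in cells])  # ['1.2%']
```

The pattern actual$ anchors the match at the end of the attribute value, so a class that merely contains "actual" somewhere in the middle is not selected.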

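Use .xpath() instead of .find(). In lxml, find() only understands ElementPath, a small subset of XPath without functions such as contains() or substring(), which is why both attempts raise "invalid predicate"; xpath() runs full XPath 1.0. Keep the leading dot so the search stays inside results[3] instead of scanning the whole document:

results[3].xpath(".//td[contains(@class,'actual')]")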
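The answer's expression can be checked offline. The sketch below uses a hypothetical two-row table (class names invented for illustration) to show two things. First, a leading // in xpath() restarts at the document root even when called on a row element, while .// stays inside that row. Second, XPath 1.0 (what lxml/libxml2 implement) has no ends-with(), but a true suffix test can be built from substring() and string-length(). Note the + 1: the question's string-length(@class)-6 starts one character too early and compares 7 characters against the 6-character 'actual'.

```python
import lxml.html

# Hypothetical table; only the second row is the one we care about.
doc = lxml.html.fromstring("""
<table id="economicCalendarData">
  <tr><td class="event-1-actual">0.5</td></tr>
  <tr><td class="bold event-2-actual">0.9</td><td class="actual-note">n/a</td></tr>
</table>
""")
rows = doc.xpath("//table[@id='economicCalendarData']//tr")
row = rows[1]

# A leading "//" searches the whole document, so cells from every row match.
everywhere = row.xpath("//td[contains(@class,'actual')]")

# A leading ".//" restricts the search to descendants of this row.
in_row = row.xpath(".//td[contains(@class,'actual')]")

# Emulate ends-with(): take the last string-length('actual') characters.
# XPath strings are 1-indexed, hence the "+ 1".
ENDS_WITH_ACTUAL = (
    ".//td[substring(@class, string-length(@class)"
    " - string-length('actual') + 1) = 'actual']"
)
suffix_only = row.xpath(ENDS_WITH_ACTUAL)

print(len(everywhere), len(in_row), len(suffix_only))  # 3 2 1
```

contains() is the simpler choice when no other class in the row happens to include the text "actual"; the suffix test additionally skips a class like the invented actual-note above.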
