Python lxml xpath unable to get text

I want to get the symbol and company from this URL: "https://www.set.or.th/set/commonslookup.do?language=en&country=US&prefix=A". However, the XPath queries return nothing, even though the response status is [200].

print "hello from python 2"
from lxml import html
import requests
import csv
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page = requests.get('https://www.set.or.th/set/commonslookup.do?language=en&country=US&prefix=A', headers=headers)
tree = html.fromstring(page.content)
tree1 = tree.xpath('//td/text()')
tree2 = tree.xpath('//td/a/text()')
print tree1
print tree2

How can I get the text for all symbols and companies?

The entire page content is loaded via jQuery. If you look at the content of your response, you will see that it holds very little other than a wrapper around a JavaScript call that dynamically loads the page content.

page.content
# returns:
b'<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection"
content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=
edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe id="main-iframe" src="/_Incapsula_Resource?SWUDNSAI=3
0&xinfo=0-25937873-0%200NNN%20RT%281572429155601%20933%29%20q%280%20-1%20-1%200%29%20r%280%20-1%29%20B12%284%2c315%2c0%2
9%20U5&incident_id=476000980067714022-125254320263005728&edet=12&cinfo=04000000&rpinfo=0" frameborder=0 width="100%" hei
ght="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 476000980067714022-12525432
0263005728</iframe></body></html>'
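
A quick way to catch this case in code, before running any XPath, is to look for the markers visible in the response body above. This is a minimal Python 3 sketch: the function name `looks_like_wrapper_page` and the marker list are illustrative assumptions drawn from this one response, not an official signature of the service.

```python
# Markers copied from the response body shown above. Assumption: other
# wrapper responses from this site contain at least one of them.
WRAPPER_MARKERS = (b"_Incapsula_Resource", b"Request unsuccessful")


def looks_like_wrapper_page(content):
    """Return True if the raw response bytes contain a known wrapper marker."""
    return any(marker in content for marker in WRAPPER_MARKERS)


# Shortened, hand-written stand-in for page.content (not real site output).
sample = (b'<html><body><iframe id="main-iframe" '
          b'src="/_Incapsula_Resource?SWUDNSAI=30">'
          b'Request unsuccessful.</iframe></body></html>')

print(looks_like_wrapper_page(sample))
print(looks_like_wrapper_page(b"<table><tr><td>placeholder</td></tr></table>"))
```

With the code from the question, `looks_like_wrapper_page(page.content)` would explain why both XPath queries come back empty: there are no `td` elements in the wrapper at all.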

Unfortunately, this means you will need to use a library that supports content loaded this way: either Selenium with PhantomJS, or you can try requests_html.
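
The requests_html route could look roughly like the following Python 3 sketch. It reuses the `//td/a/text()` XPath from the question; the helper names `clean_cells` and `fetch_cell_texts` are hypothetical, and it has not been verified that rendering the page this way returns the table for this particular site.

```python
def clean_cells(texts):
    """Strip whitespace from scraped text nodes and drop empty ones."""
    return [t.strip() for t in texts if t.strip()]


def fetch_cell_texts(url):
    """Render the page with requests_html and return the <td> link texts."""
    # Imported here so clean_cells stays usable without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # render() executes the page's JavaScript in headless Chromium,
    # which requests_html downloads on first use.
    response.html.render()
    # Same XPath as in the question; text() matches come back as strings.
    return clean_cells(response.html.xpath('//td/a/text()'))


# Offline demonstration of the cleanup step on made-up text nodes.
print(clean_cells(["  first ", "", "\n", "second"]))
```

Calling `fetch_cell_texts` with the URL from the question performs the actual request. If the result is still empty, checking the rendered HTML for the same wrapper text as in the response above is a reasonable next step.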
