Python lxml xpath cannot get text

Question

I want to get the symbol and company from this URL: "https://www.set.or.th/set/commonslookup.do?language=en&country=US&prefix=A". However, my XPath queries return nothing, even though the response is [200].

print "hello from python 2"
from lxml import html
import requests
import csv
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page = requests.get('https://www.set.or.th/set/commonslookup.do?language=en&country=US&prefix=A', headers=headers)
tree = html.fromstring(page.content)
tree1 = tree.xpath('//td/text()')
tree2 = tree.xpath('//td/a/text()')
print tree1
print tree2

How can I get the text for all symbols and companies?

Answer 1

The entire page content is loaded via jQuery. If you look at the content of your response, you will see there is very little other than a wrapper around a JavaScript call that dynamically loads the page content.

page.content
# returns:
b'<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection"
content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=
edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe id="main-iframe" src="/_Incapsula_Resource?SWUDNSAI=3
0&xinfo=0-25937873-0%200NNN%20RT%281572429155601%20933%29%20q%280%20-1%20-1%200%29%20r%280%20-1%29%20B12%284%2c315%2c0%2
9%20U5&incident_id=476000980067714022-125254320263005728&edet=12&cinfo=04000000&rpinfo=0" frameborder=0 width="100%" hei
ght="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 476000980067714022-12525432
0263005728</iframe></body></html>'

Unfortunately, this means you will need to use a library that supports content loaded this way: either Selenium with PhantomJS, or you can try requests_html.
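A minimal Python 3 sketch of the requests_html route, in case it helps. The helper names `needs_js_rendering` and `fetch_rendered_cells` are made up for illustration, and the `//td//text()` XPath is only a guess at the rendered table layout. The first helper uses only the standard library to check whether a response body is the wrapper shown above (no `<td>` cells, or an iframe pointing at `_Incapsula_Resource`). The second assumes requests_html is installed; rendering the JavaScript may still not be enough to get past the site's protection, so treat it as something to try rather than a guaranteed fix.

```python
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Count <td> cells and record the src of every <iframe>."""

    def __init__(self):
        super().__init__()
        self.td_count = 0
        self.iframe_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.td_count += 1
        elif tag == "iframe":
            self.iframe_srcs.append(dict(attrs).get("src") or "")


def needs_js_rendering(content):
    """Return True if the body has no table cells or is the Incapsula wrapper.

    Hypothetical helper: accepts the bytes from page.content or a str.
    """
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")
    counter = _TagCounter()
    counter.feed(content)
    blocked = any("_Incapsula_Resource" in src for src in counter.iframe_srcs)
    return blocked or counter.td_count == 0


def fetch_rendered_cells(url, xpath="//td//text()"):
    """Render the page with requests_html, then run the XPath on the result.

    Assumption: the rendered page exposes the data in <td> cells.
    """
    # Third-party package, imported lazily so the check above works without it.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    response.html.render()  # downloads Chromium the first time it is called
    texts = response.html.xpath(xpath)
    return [t.strip() for t in texts if t.strip()]
```

To use it, call `needs_js_rendering(page.content)` on the response from the question first; if it returns True, try `fetch_rendered_cells(...)` with the same URL and adjust the XPath once you can see the rendered markup.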

Answer 1 timestamp: 2019-10-30 09:53:13
