Python: lxml XPath returns unwanted data at the beginning

Question

I want to get the options starting from Straits Times Index (STI) (STI.SI) through the end of the list. It is a long list.

<option value="nyse_mkt" class="{access_allowed : true}">NYSE Mkts</option>
<option value="world" class="{access_allowed : true}">World</option></select><select class="validate-selection" id="counter_sgx" name="counter"><option value="">-- Select Counter --</option>
<option value="STI.SI">Straits Times Index (STI) (STI.SI)</option>
<option value="ADLN.SI">ADLN (ADLN.SI)</option>
<option value="SGXCN2.SI">CN ACCESS INDEX (TR) (SGXCN2.SI)</option>
<option value="SGXCN7.SI">CN ACCESS STB (10%) INDEX (SGXCN7.SI)</option>
<option value="SGXCN6.SI">CN ACCESS STB (5%) INDEX (SGXCN6.SI)</option>
<option value="SGXCN15.SI">FNGUIDE CN ACC (1X) TR IDX (SGXCN15.SI)</option>
<option value="SGXCN13.SI">FNGUIDE CN ACC INV 1X TR KRW IDX (SGXCN13.SI)</option>
<option value="SGXCN14.SI">FNGUIDE CN ACC LEV 2X TR IDX (SGXCN14.SI)</option>
<option value="FSTAS.SI">FTSE ST All-Share Index (FSTAS.SI)</option>

However, my result also includes some unwanted entries at the beginning:

['SGX',
 'Bursa',
 'HKEx',
 'SET',
 'IDX',
 'ASX',
 'NYSE',
 'NASDAQ',
 'NYSE Mkts',
 'World',
 '-- Select Counter --',
 'Straits Times Index (STI) (STI.SI)',
 'ADLN (ADLN.SI)',
 'CN ACCESS INDEX (TR) (SGXCN2.SI)',
 'CN ACCESS STB (10%) INDEX (SGXCN7.SI)',
 'CN ACCESS STB (5%) INDEX (SGXCN6.SI)',
 'FNGUIDE CN ACC (1X) TR IDX (SGXCN15.SI)',
 'FNGUIDE CN ACC INV 1X TR KRW IDX (SGXCN13.SI)',
 'FNGUIDE CN ACC LEV 2X TR IDX (SGXCN14.SI)',
 'FTSE ST All-Share Index (FSTAS.SI)']

My code is:

from lxml import html
import requests

page = requests.get('http://www.shareinvestor.com/fundamental/factsheet.html?counter=STI.SI')
tree = html.fromstring(page.content)
tree.xpath('//option[@value]/text()')

An example of the output I need:

Straits Times Index (STI) (STI.SI)
ADLN (ADLN.SI)
CN ACCESS INDEX (TR) (SGXCN2.SI)
...
FTSE ST All-Share Index (FSTAS.SI)

Answer 1

You want to build an XPath that does the following:

Find the option element with a given value (STI.SI).
Return the text of all the option siblings that follow it.

For the first part, your code selects every option that has a value attribute (almost all of them). Restrict the match to the one option you care about: option[@value="STI.SI"].
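To see the difference between the two predicates, here is a minimal sketch. The markup is a small stand-in modeled on the snippet in the question (the id "market" on the first select is an assumption), so it runs without fetching the real page.

```python
from lxml import html

# Small stand-in for the page's markup, not the real page.
# The id "market" is an assumption made for this example.
snippet = """
<select id="market" name="market">
  <option value="nyse_mkt">NYSE Mkts</option>
  <option value="world">World</option>
</select>
<select id="counter_sgx" name="counter">
  <option value="">-- Select Counter --</option>
  <option value="STI.SI">Straits Times Index (STI) (STI.SI)</option>
  <option value="ADLN.SI">ADLN (ADLN.SI)</option>
</select>
"""
tree = html.fromstring(snippet)

# [@value] only tests that the attribute exists, so every option matches,
# including the placeholder whose value is an empty string.
any_value = tree.xpath('//option[@value]/text()')

# [@value="STI.SI"] compares the attribute's content, so one option matches.
exact_value = tree.xpath('//option[@value="STI.SI"]/text()')

print(len(any_value), exact_value)
```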

For the second part, select all the option siblings that follow it: following-sibling::option.

Putting it all together:

tree.xpath('//option[@value="STI.SI"]/following-sibling::option/text()')

This gives you all the siblings after the one you specified. Since you also want that option itself included, do this instead:

tree.xpath('//option[@value=""]/following-sibling::option/text()')

This works because the option just before the one with value STI.SI has a blank value. Be careful with it: I haven't tested what happens when there are multiple blank values in the select.
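If blank values are a concern, two alternatives avoid anchoring on the blank option. This is a sketch against stand-in markup rather than the real page: the id "counter_sgx" comes from the snippet in the question, while the third select ("counter_other") and its option are hypothetical, added only to show what a second blank-valued option does to the result.

```python
from lxml import html

# Stand-in markup. "counter_other" and its options are hypothetical.
snippet = """
<select id="market" name="market">
  <option value="nyse_mkt">NYSE Mkts</option>
  <option value="world">World</option>
</select>
<select id="counter_sgx" name="counter">
  <option value="">-- Select Counter --</option>
  <option value="STI.SI">Straits Times Index (STI) (STI.SI)</option>
  <option value="ADLN.SI">ADLN (ADLN.SI)</option>
  <option value="FSTAS.SI">FTSE ST All-Share Index (FSTAS.SI)</option>
</select>
<select id="counter_other" name="counter">
  <option value="">-- Select Counter --</option>
  <option value="XYZ">Hypothetical Counter (XYZ)</option>
</select>
"""
tree = html.fromstring(snippet)

# Anchoring on the blank value picks up the siblings of every blank option,
# so the hypothetical second list leaks into the result.
blank_anchor = tree.xpath('//option[@value=""]/following-sibling::option/text()')

# Alternative A: union of the STI.SI option itself and its following siblings.
from_sti = tree.xpath(
    '//option[@value="STI.SI"]/text()'
    ' | //option[@value="STI.SI"]/following-sibling::option/text()'
)

# Alternative B: scope the query to one select by id and skip the placeholder.
by_select = tree.xpath('//select[@id="counter_sgx"]/option[@value!=""]/text()')

for name in from_sti:
    print(name)
```

Alternative A keeps the answer's following-sibling idea and only adds the starting option. Alternative B relies on the select's id staying stable, which may or may not hold on the live page.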

Answer 1
0 Accepted 2017-01-18 10:49:19
