Scraping data with Python lxml

Question

I'm trying to retrieve a specific string by scraping, but it seems to return nothing. I'm using Python and lxml, but the query does not return the string inside the <a> tag.

Here is the HTML I'm trying to scrape:

<fieldset>
    <legend align="center">
        <a href="/counterstrike/events/302-cs-go-champions-league">CS:GO Champions League</a>
    </legend>
</fieldset>

Here is what I've tried:

import requests
from lxml import html

def get_league(self):
    request = requests.get(self.url)
    tree = html.fromstring(request.content)
    league = tree.xpath("//legend[@class='center']//a")
    return league

Answer 1

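The <legend> element has an align attribute, not a class attribute, so [@class='center'] matches nothing. Filter on @align instead and use XPath to select the text explicitly: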

//legend[@align='center']/a/text()
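To see the difference, here is a small sketch that runs both queries against the HTML from the question, parsed from a string instead of fetched with requests (the SNIPPET variable is just that HTML copied in for illustration). Note that xpath() always returns a list, so you still need to take the first item to get a single string.

```python
from lxml import html

# The HTML from the question, inlined so the example runs offline
SNIPPET = """
<fieldset>
    <legend align="center">
        <a href="/counterstrike/events/302-cs-go-champions-league">CS:GO Champions League</a>
    </legend>
</fieldset>
"""

tree = html.fromstring(SNIPPET)

# The original query filters on a class attribute the <legend> does not have
by_class = tree.xpath("//legend[@class='center']//a")

# Filtering on align and ending with text() returns the link text as strings
by_align = tree.xpath("//legend[@align='center']/a/text()")

# xpath() returns a list; take the first match (or None if nothing matched)
league = by_align[0] if by_align else None

print(by_class)  # []
print(league)    # CS:GO Champions League
```

In the question's get_league method, the same change amounts to replacing the XPath string and returning the first item of the result.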

The XPath Helper extension for Chrome helps a lot when writing lxml queries.

Answer 2

Try this. It isn't lxml, but you can use it for scraping. First, I define a helper function of my own, which makes the scraping easier:

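def getBetweenHTML(strSource, strStart, strEnd):
    # Return the text between the first strStart and the next strEnd ('' if either is missing)
    start = strSource.find(strStart)
    if start == -1:
        return ''
    start += len(strStart)
    end = strSource.find(strEnd, start)
    return strSource[start:end] if end != -1 else ''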

Then I use it like this:

def get_league(self):
    import urllib2
    # Download the raw page source (Python 2; in Python 3 use urllib.request)
    page = urllib2.urlopen(self.url).read()
    return getBetweenHTML(page, '<a href="/counterstrike/events/302-cs-go-champions-league">', "</a>")
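The call above searches for the exact href, so it only finds the name when you already know which event page you are on. A minimal sketch of the same string-search idea, assuming the page markup looks exactly like the question's snippet, keys the search on the <legend align="center"> element instead. The function name league_from_source is hypothetical, and this remains brittle: different attribute order, quoting, or whitespace in the markup will break it.

```python
def league_from_source(page_source):
    """Return the link text inside <legend align="center">, or None if absent.

    Hypothetical helper: assumes the markup matches the question's snippet,
    i.e. an <a ...> tag follows the legend's opening tag.
    """
    legend_start = page_source.find('<legend align="center">')
    if legend_start == -1:
        return None
    # Find the first <a ...> tag after the legend
    a_start = page_source.find('<a ', legend_start)
    if a_start == -1:
        return None
    # The link text starts right after the closing '>' of the <a ...> tag
    text_start = page_source.find('>', a_start)
    if text_start == -1:
        return None
    text_start += 1
    text_end = page_source.find('</a>', text_start)
    if text_end == -1:
        return None
    return page_source[text_start:text_end]


SNIPPET = """<fieldset>
    <legend align="center">
        <a href="/counterstrike/events/302-cs-go-champions-league">CS:GO Champions League</a>
    </legend>
</fieldset>"""

print(league_from_source(SNIPPET))  # CS:GO Champions League
```

Passing the downloaded page source to this function instead of getBetweenHTML avoids hardcoding the event URL.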

This worked for me; it's just an alternative. If it's not what you're looking for, tell me and I'll rewrite it for lxml.

2 answers

Answer 1: score 0, accepted, 2015-03-10 11:26:58

Answer 2: score -1, 2015-03-10 11:21:53
