Find element by XPATH via LXML - Python

I am having some problems scraping web data with LXML. I want to scrape one thing from a website; I was using BeautifulSoup, so I decided to go with LXML for this. I wrote some code and got the Discord bot to access the website. The only thing left is the code that finds those elements. Here is my code; help would be appreciated.

    @tasks.loop(seconds = 10)
    async def exchangeRate(self):
        print("Loop Starting!")
        HEADERS = {
            'User-Agent' : "Magic Browser"
        }

        url = 'https://rubyrealms.com/economy/bank'

        async with aiohttp.request("GET", url, headers=HEADERS) as response:
            if response.status == 200:
                #Scrape page content into one variable
                content = await response.text()
                #Initialize soup
                soup = BeautifulSoup(content, "html.parser")
                #Request access to site
                page = requests.get(url)
                #Declaring "tree" - Used to scrape by XPATH
                tree = html.fromstring(page.content)
                stuff = tree.xpath('//*[@id="content-wrap"]/div[3]/div[3]/div[2]/div[1]/div[2]/div[1]/div[2]/div[2]/h4')
                print(stuff)

            else:
                print(f"The request was invalid\nStatus code: {response.status}")

This is my task loop for Discord.Py ReWrite; basically, every 10 seconds it accesses the site. The code runs, except for this part:

stuff = tree.xpath('//*[@id="content-wrap"]/div[3]/div[3]/div[2]/div[1]/div[2]/div[1]/div[2]/div[2]/h4')
print(stuff)

The only thing it prints is "Loop Starting!" from the beginning of the loop, followed by an empty list. With the full code above, I get this output:

Bot is ready for duty!
Exchange Cog is ready!
Waiting for loop!
Loop Starting!
[]

What I want displayed is:

Bot is ready for duty!
Exchange Cog is ready!
Waiting for loop!
Loop Starting!
243

(That number changes every day, which is why I can't just read it once.)

If anyone knows how I could work this out, please help. Thank you in advance.

The tree has 7 <h4> tags that meet the description in your comment. If I understand you correctly, to get all 7 you can use this:

stuff = tree.xpath('//h4[@data-toggle="tooltip"]')
for s in stuff:
    print(s.text)

The output is:

246
2
7
16
1
1
1
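
To illustrate why the attribute-based query is less fragile than the long positional one, here is a minimal sketch on made-up markup. The HTML in SAMPLE is hypothetical and only stands in for the real page; the one thing taken from this answer is the data-toggle="tooltip" attribute on the <h4> tags.

```python
from lxml import html

# Hypothetical markup (an assumption, not the real page). Only the
# data-toggle="tooltip" attribute on <h4> comes from the answer above.
SAMPLE = """
<html><body>
<div id="content-wrap">
  <div><h4 data-toggle="tooltip" title="rate">246</h4></div>
  <div><h4>Not a tooltip</h4></div>
  <div><section><h4 data-toggle="tooltip" title="other">2</h4></section></div>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Attribute-based query: matches every <h4 data-toggle="tooltip">,
# regardless of how deeply it is nested.
values = [h4.text for h4 in tree.xpath('//h4[@data-toggle="tooltip"]')]
print(values)

# Position-based query: returns an empty list as soon as the nesting
# differs from what the path assumes.
positional = tree.xpath('//*[@id="content-wrap"]/div[3]/div[3]/h4')
print(positional)
```

A positional path depends on every intermediate <div> being exactly where it was when the path was written, so any difference in nesting yields an empty list, while the attribute query keeps matching.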

If you know ahead of time that your target number (like 246 in this tree) is always the first, you can even shorten this to:

stuff = tree.xpath('//h4[@data-toggle="tooltip"]')[0]
print(stuff.text)

and the output will be:

246
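
Note that indexing with [0] raises an IndexError when the query matches nothing. The loop in the question also downloads the page twice, once with aiohttp and once with requests. A possible way to handle both is sketched below, assuming the text returned by aiohttp contains the same <h4> tags; the helper name first_tooltip_value is made up for illustration and is not part of lxml.

```python
from typing import Optional

from lxml import html


def first_tooltip_value(page_text: str) -> Optional[str]:
    """Return the text of the first <h4 data-toggle="tooltip">, or None.

    Hypothetical helper: it parses HTML that was already downloaded,
    so no second (blocking) HTTP request is needed.
    """
    tree = html.fromstring(page_text)
    matches = tree.xpath('//h4[@data-toggle="tooltip"]')
    if not matches:
        # Nothing matched: signal it instead of raising IndexError.
        return None
    return matches[0].text_content().strip()


# Small demonstration on made-up markup (not the real page).
demo = '<html><body><h4 data-toggle="tooltip"> 246 </h4></body></html>'
print(first_tooltip_value(demo))
```

Inside the task loop, this could be called as first_tooltip_value(content), where content is the string already awaited from response.text(), and the result checked for None before printing.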
