使用lxml和python请求进行爬取。

Question

Okay, I am at it again and really trying to figure this stuff out with lxml and python. 好的，我又来了，真的尝试用lxml和python弄清楚这些东西。 The last time I asked a question I was using xpath and had to figure out how to make a change in case that the direct xpath source itself would change. 上次我问一个问题时，我使用的是xpath，不得不弄清楚如何进行更改，以防直接xpath源本身发生更改。 I have edited my code to try to go after the class instead. 我已经编辑了代码以尝试去上课。 I keep running into a problem with it pulling the address up in memory and not the text that I want. 我一直遇到问题，它在内存中拉出了地址，而不是我想要的文本。 Before anyone says there is a library for what I want to do, this is not about that but, rather, allowing me to understand this code. 在任何人说有一个我想做的事情的库之前，这不是关于此的事情，而是让我理解这段代码。 Here is what I have so far but when I print it out I get an error and I can add [0] behind the print[0].text but it still give me nothing. 这是到目前为止的内容，但是当我打印出来时出现错误，可以在print[0].text后面添加[0]，但仍然没有任何效果。 Any help would be cool. 任何帮助都会很酷。

from lxml import html
import requests
import time



while True:
page = requests.get('https://markets.businessinsider.com/index/realtime-chart/dow_jones')
content = html.fromstring(page.content)
#This will create a list of prices:
prices = content.find_class('price')


print(prices.text)
time.sleep(.5)

Answer 1

Probably a formatting issue from posting but your while loop is not indented. 过帐可能是格式问题，但您的while循环未缩进。

Try my code below: 在下面尝试我的代码：

while True:
    page = requests.get('https://markets.businessinsider.com/index/realtime-chart/dow_jones')
    content = html.fromstring(page.content)
    prices = content.find_class('price')
    #You need to access the 'text_content' method
    text = [p.text_content() for p in prices]
    for t in text:
        if not t.startswith(r"\"):  # Prevents the multiple blank lines
           print(t)
    time.sleep(0.5)

使用lxml和python请求进行爬取。

问题描述

1 个解决方案

解决方案1
1 已采纳 2018-08-06 04:17:45

使用lxml和python请求进行爬取。

问题描述

1 个解决方案

解决方案1 1 已采纳 2018-08-06 04:17:45

解决方案1
1 已采纳 2018-08-06 04:17:45