Data type problem after scraping a website with lxml and xpath

Question

I'm scraping a website for data and end up pulling out numbers. The issue is that when I try to run logic on the data in Python, its type comes back as

class 'lxml.etree._ElementStringResult'

My question is: can I typecast this data into a string or int so that I can then run my logic statements?

Here is the code:

callType = item.xpath('.//span[contains(@id, "lblSignal")]')[0].text_content()

print(callType)

Here is the output:

When I try control statements on the data, nothing happens. I think it's because I'm applying logic to the wrong type.

callType = item.xpath('.//span[contains(@id, "lblSignal")]')[0].text_content()
print(type(callType))
print(callType)

This is my output:

<class 'lxml.etree._ElementStringResult'>
76

So the control statements are not running against an "int" but against a different type. I've tried typecasting the variable, but it remains the same datatype. Hope this helps...

Answer 1

xpath() may return a list of _ElementStringResult objects, not plain Python strings. The reason you might sometimes want _ElementStringResult objects is that, unlike plain str objects, they remember their parent element (accessible through the getparent method).

You can convert such an object to a plain string or an integer by passing it to str or int.

for span in item.xpath('.//span[contains(@id, "lblSignal")]'):
    callType = int(span.text_content())
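
As a minimal, self-contained sketch of the conversion described above: the HTML snippet, the id value "ctl00_lblSignal", and the threshold of 50 are hypothetical stand-ins, not taken from the original site. It assumes lxml is installed and uses Python 3 print calls. The second part illustrates the getparent behaviour of strings returned by a text() query.

```python
from lxml import html

# Hypothetical markup standing in for the scraped page (not from the original site).
page = html.fromstring('<div><span id="ctl00_lblSignal"> 76 </span></div>')

for span in page.xpath('.//span[contains(@id, "lblSignal")]'):
    raw = span.text_content()     # string-like value, surrounding whitespace included
    call_type = int(raw.strip())  # plain Python int
    if call_type > 50:            # hypothetical threshold; numeric comparison now works
        print("call type above 50:", call_type)

# A text() query returns "smart" strings that remember their parent element.
text_node = page.xpath('.//span/text()')[0]
parent_tag = text_node.getparent().tag
print(parent_tag)
```

Comparing the unconverted string against a number is the likely reason the original control statements appeared to do nothing: a string such as "76" never equals the integer 76, so the conversion has to happen before the comparison.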

Answer 1: 5, accepted, 2015-03-18 19:14:03
