
Trouble with data types after scraping a website with lxml and xpath

I'm scraping a website for data and end up extracting numbers. The problem is that when I try to apply logic to the data in Python, its type comes back as

class 'lxml.etree._ElementStringResult'

My question is: can I somehow cast this data to a string or an int so that I can use it in my logic statements?

Here is the code:

callType = item.xpath('.//span[contains(@id, "lblSignal")]')[0].text_content()

print callType

Here is the output:

76

When I try control statements on the data, nothing happens. I think it's because I'm applying logic to the wrong type.

callType = item.xpath('.//span[contains(@id, "lblSignal")]')[0].text_content()
print type(callType)
print callType

This is my output:

<class 'lxml.etree._ElementStringResult'>
76

So instead of running my control statements against an int, I'm running them against a different type. I've tried typecasting the variable, but it remains the same datatype. Hope this helps...

xpath() may return a list of _ElementStringResult objects rather than plain Python strings. The reason you might sometimes want an _ElementStringResult is that, unlike a plain str, it remembers its parent element, which it makes accessible through the getparent method.
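To illustrate the point about parents, here is a minimal sketch. The markup and the id "ctl00_lblSignal" are made up for the example, and it is written for Python 3, where the corresponding class is named _ElementUnicodeResult rather than _ElementStringResult.

```python
from lxml import html

# Hypothetical markup standing in for the scraped page.
page = html.fromstring('<div><span id="ctl00_lblSignal">76</span></div>')

# Selecting text() nodes directly returns lxml "smart strings".
results = page.xpath('.//span[contains(@id, "lblSignal")]/text()')
value = results[0]

print(isinstance(value, str))   # True: the result is a str subclass
print(value.getparent().tag)    # span: the element the text came from
print(type(str(value)) is str)  # True: str() yields a plain Python string
```

Because the result is a str subclass, it already compares equal to "76"; it is only comparisons against numbers that need a conversion first.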

You can convert such an object to a string or an integer by simply passing it to str or int.

for span in item.xpath('.//span[contains(@id, "lblSignal")]'):
    callType = int(span.text_content())
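Scraped text is not always a clean number, so in practice the conversion may need a guard. The sketch below is one way to do that, not the only one: to_int is a hypothetical helper, and the sample values stand in for whatever span.text_content() returns on the real page. It also shows the likely reason the control statements in the question did nothing: a string never equals an integer.

```python
def to_int(text, default=None):
    """Convert scraped text to an int, or return `default` if it is not a whole number."""
    try:
        return int(str(text).strip())
    except ValueError:
        return default


# A string is never equal to an integer, so this branch is silently skipped.
print("76" == 76)          # False
print(to_int("76") == 76)  # True

# Hypothetical scraped values, including entries that are not numbers.
scraped = ["76", " 42\n", "N/A", ""]
signals = [to_int(value) for value in scraped]
print(signals)  # [76, 42, None, None]

# Control statements now behave as expected on the converted values.
strong = [s for s in signals if s is not None and s > 50]
print(strong)  # [76]
```

Returning a default instead of raising keeps the loop going when a cell holds something like "N/A"; if a bad value should stop the scrape, calling int directly and letting the ValueError propagate is the simpler choice.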
