
Getting AttributeError error 'str' object has no attribute 'get'

I am getting an error while working with a JSON response:

Error: AttributeError: 'str' object has no attribute 'get'

What could be the issue?

I am also getting the following errors for the rest of the values:

*** TypeError: 'builtin_function_or_method' object is not subscriptable ***

*** 'Phone': value['_source']['primaryPhone'], KeyError: 'primaryPhone' ***

# -*- coding: utf-8 -*-
import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):

        resp = json.loads(response.body)
        values = resp['hits']['hits']

        for value in values:

            yield {
                'Full Name': value['_source']['fullName'],
                'Phone': value['_source']['primaryPhone'],
                "Email": value['_source']['primaryEmail'],
                "City": value.get['_source']['city'],
                "Zip Code": value.get['_source']['zipcode'],
                "Website": value['_source']['websiteURL'],
                "Facebook": value['_source']['facebookURL'],
                "LinkedIn": value['_source']['LinkedIn_URL'],
                "Twitter": value['_source']['Twitter'],
                "BIO": value['_source']['Bio']
            }

It's nested deeper than you think it is. That's why you're getting an error.

Code Example

import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        resp = json.loads(response.body)
        values = resp['hits']['hits']

        for value in values:
            yield {
                'Full Name': value['_source']['fullName'],
                'Primary Phone':value['_source']['primaryPhone']
            }

Explanation

The resp variable is creating a Python dictionary, but there is no resp['hits']['hits']['fullName'] within this JSON data. The data you're looking for, for fullName, is actually resp['hits']['hits'][i]['_source']['fullName'], i being a number, because resp['hits']['hits'] is a list.

resp['hits'] is a dictionary, and therefore the values variable is fine. But resp['hits']['hits'] is a list, so you can't use get on it, and it only accepts numbers within [], not strings. Hence the error.
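
Here is a minimal sketch of the shape being described; the values are made up and the field names are taken from the rest of this answer, not from the live API:

resp = {
    "hits": {
        "hits": [  # a list, so it has to be indexed with an integer
            {"_source": {"fullName": "Jane Doe", "primaryPhone": "555-0100"}},
            {"_source": {"fullName": "John Roe", "primaryPhone": "555-0101"}},
        ]
    }
}

resp["hits"]["hits"][0]["_source"]["fullName"]   # works
# resp["hits"]["hits"]["fullName"]               # TypeError: list indices must be integers or slices, not str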

Tips

  1. Use response.json() instead of json.loads(response.body); since Scrapy v2.2, Scrapy has built-in support for JSON. Behind the scenes it already imports json. See the short sketch after these tips.

  2. Also check the JSON data. I used requests for ease, and just kept drilling down through the nesting until I got to the data you needed.

  3. Yielding a dictionary is fine for this type of data as it's well structured, but for any data that needs modifying or changing, or is wrong in places, use either the Items dictionary or ItemLoader. There's a lot more flexibility in those two ways of yielding an output than in yielding a plain dictionary. I almost never yield a dictionary; the only time is when you have highly structured data.
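
A quick sketch of tip 1, assuming Scrapy 2.2 or later (where Response.json() is available); this is just a fragment of a parse method:

def parse(self, response):
    data = response.json()  # Scrapy parses the JSON body itself; no json.loads needed
    for value in data['hits']['hits']:
        yield {'Full Name': value['_source']['fullName']}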

Updated Code

Looking at the JSON data, there is quite a lot of missing data. This is part of web scraping; you will find errors like this. Here we use a try/except block for when we get a KeyError, which means Python hasn't been able to find the key associated with a value. We have to handle that exception, which we do here by yielding the string 'No XXX' instead.
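
A stripped-down sketch of that pattern, using a made-up record for illustration:

source = {'fullName': 'Jane Doe'}  # hypothetical record with no phone number

try:
    phone = source['primaryPhone']
except KeyError:
    phone = 'No Phone number'

# dict.get() with a default value is an equivalent one-liner:
phone = source.get('primaryPhone', 'No Phone number')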

Once you start getting gaps etc. it's better to consider an Items dictionary or ItemLoaders.

Now it's worth looking at the Scrapy docs about Items. Essentially Scrapy does two things: it extracts data from websites, and it provides a mechanism for storing this data. The way it does this is by storing it in a dictionary called an Item. The code isn't that much different from yielding a dictionary, but the Items dictionary allows you to manipulate the extracted data more easily with the extra things Scrapy can do. You need to edit your items.py first with the fields you want. We create a class called TestItem, and we define each field using scrapy.Field(). We can then import this class in our spider script.

items.py

import scrapy


class TestItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    full_name = scrapy.Field()
    Phone = scrapy.Field()
    Email = scrapy.Field()
    City = scrapy.Field()
    Zip_code = scrapy.Field()
    Website = scrapy.Field()
    Facebook = scrapy.Field()
    Linkedin = scrapy.Field()
    Twitter = scrapy.Field()
    Bio = scrapy.Field()

Here we're specifying what we want the fields to be. Unfortunately you can't use a string with spaces, hence why full name is full_name. scrapy.Field() creates the field of the item dictionary for us.

We import this item dictionary into our spider script with from ..items import TestItem. The ..items means we're taking items.py from the parent folder into the spider script, and we're importing the class TestItem. That way our spider can populate the item dictionary with our JSON data.

Note that just before the for loop we instantiate the class TestItem with item = TestItem(). Instantiating means calling the class; in this case it makes a dictionary. This means we are creating the item dictionary and then populating that dictionary with keys and values. You have to do this before you add your keys and values, as you can see within the for loop.

Spider script

import scrapy
from ..items import TestItem


class MainSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        # response.json() (Scrapy >= 2.2) replaces json.loads(response.body)
        values = response.json()['hits']['hits']
        item = TestItem()
        for value in values:
            # Each field may be missing, so fall back to a placeholder on KeyError
            try:
                item['full_name'] = value['_source']['fullName']
            except KeyError:
                item['full_name'] = 'No Name'
            try:
                item['Phone'] = value['_source']['primaryPhone']
            except KeyError:
                item['Phone'] = 'No Phone number'
            try:
                item['Email'] = value['_source']['primaryEmail']
            except KeyError:
                item['Email'] = 'No Email'
            try:
                item['City'] = value['_source']['activeLocations'][0]['city']
            except KeyError:
                item['City'] = 'No City'
            try:
                item['Zip_code'] = value['_source']['activeLocations'][0]['zipcode']
            except KeyError:
                item['Zip_code'] = 'No Zip code'
            try:
                item['Website'] = value['_source']['AgentMarketingCenter'][0]['Website']
            except KeyError:
                item['Website'] = 'No Website'
            try:
                item['Facebook'] = value['_source']['AgentMarketingCenter'][0]['Facebook_URL']
            except KeyError:
                item['Facebook'] = 'No Facebook'
            try:
                item['Linkedin'] = value['_source']['AgentMarketingCenter'][0]['LinkedIn_URL']
            except KeyError:
                item['Linkedin'] = 'No Linkedin'
            try:
                item['Twitter'] = value['_source']['AgentMarketingCenter'][0]['Twitter']
            except KeyError:
                item['Twitter'] = 'No Twitter'
            try:
                item['Bio'] = value['_source']['AgentMarketingCenter'][0]['Bio']
            except KeyError:
                item['Bio'] = 'No Bio'

            yield item
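
Assuming the standard Scrapy project layout (this spider saved under spiders/ and TestItem defined in items.py), you would run it from the project directory with something like the following; the output file name is just an example:

scrapy crawl test -o output.json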
                    
