How to extract data from nested JSON dictionaries in one for loop - Python

In my Scrapy project I want to extract data from a website. It turns out that all the information is stored in a script that I can easily read as JSON, and from there extract the data I need.

This is my function:

    def parse(self, response):
        items = response.css("script:contains('window.__INITIAL_STATE__')::text").re_first(r"window\.__INITIAL_STATE__ =(.*);")
        for item in json.loads(items)['offers']:
            yield {
                "title": item['jobTitle'],
                "employer": item['employer'],
                "country": item['countryName'],
                "details_page": item['companyProfileUrl'],
                "expiration_date": item['expirationDate'],
                'salary': item['salary'],
                'employmentLevel': item['employmentLevel'],
            }

The JSON has this structure:

    var = {
        "offers": [
            {
                "commonOfferId": "1200072247",
                "jobTitle": "Automatyk - Programista",
                "employer": "MULTIPAK Spółka Akcyjna",
                "companyProfileUrl": "https://pracodawcy.pracuj.pl/company/20379037/profile",
                "expirationDate": "2021-04-28T12:47:06.273",
                "salary": "",
                "employmentLevel": "Specjalista (Mid / Regular)",
                "offers": [
                    {
                        "offerId": 500092126,
                        "regionName": "kujawsko-pomorskie",
                        "cities": ["Małe Czyste (pow. chełmiński)"],
                        "label": "Małe Czyste (pow. chełmiński)"
                    }
                ],

Above is an example of one element. When I try to extract data such as cities or regionName, I get an error. How can I write a for loop that goes through both levels of dictionaries and yields that data in a new dictionary?

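You didn't make it clear what you want, but I'm guessing this is close. Each element of the outer `offers` list has its own inner `offers` list, so nest a second loop and yield one record per inner offer: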

    def parse(self, response):
        items = response.css("script:contains('window.__INITIAL_STATE__')::text").re_first(r"window\.__INITIAL_STATE__ =(.*);")
        for item in json.loads(items)['offers']:
            for offer in item['offers']:
                yield {
                    "title": item['jobTitle'],
                    "employer": item['employer'],
                    "country": item['countryName'],
                    "details_page": item['companyProfileUrl'],
                    "expiration_date": item['expirationDate'],
                    'salary': item['salary'],
                    'employmentLevel': item['employmentLevel'],
                    'offernumber': offer['offerId'],
                    'region': offer['regionName'],
                    'city': offer['cities'][0]
                }
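
To try the nested loop without running a spider, here is a minimal standalone sketch. The sample data is a trimmed, hypothetical copy of the structure shown in the question, and `flatten_offers` is an illustrative helper name, not part of Scrapy. Unlike the code above, it uses `.get()` so a missing key (such as `countryName`, which does not appear in the sample) gives `None` instead of raising `KeyError`, and it joins all cities rather than taking only the first one. Both are assumptions about what is wanted.

```python
import json

# Hypothetical sample, trimmed from the structure shown in the question.
RAW = """
{
  "offers": [
    {
      "jobTitle": "Automatyk - Programista",
      "employer": "MULTIPAK Spółka Akcyjna",
      "offers": [
        {
          "offerId": 500092126,
          "regionName": "kujawsko-pomorskie",
          "cities": ["Małe Czyste (pow. chełmiński)"]
        }
      ]
    }
  ]
}
"""


def flatten_offers(data):
    """Yield one flat dict per inner offer, merged with its parent's fields."""
    for item in data.get("offers", []):
        for offer in item.get("offers", []):
            cities = offer.get("cities") or []
            yield {
                "title": item.get("jobTitle"),
                "employer": item.get("employer"),
                # Not present in the sample data, so this is None here.
                "country": item.get("countryName"),
                "offernumber": offer.get("offerId"),
                "region": offer.get("regionName"),
                # Join every city instead of keeping only cities[0].
                "city": ", ".join(cities),
            }


rows = list(flatten_offers(json.loads(RAW)))
print(rows[0]["region"])
```

Inside a spider, the same generator could be driven from `parse` with `yield from flatten_offers(json.loads(items))`, assuming `items` holds the JSON text extracted by the regex.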
