
How to convert lists of nested dictionaries to DataFrame

I'm trying to convert the files from https://ads.twitter.com/transparency into a DataFrame.

This is what the data looks like:

{
  "archives" : [ {
    "ads_account" : {
      "account_name" : "@BradleyByrne - U.S. Political Campaigning",
      "user_name" : "BradleyByrne",
      "bio_url" : "https://twitter.com/ZpdrcK6Met",
      "billing_information" : {
        "insertion_order" : [ ],
        "credit_card" : [ {
          "city" : "Arlington",
          "spend" : 3.5845999999999995E-4,
          "postal_code" : "22209",
          "region" : "va",
          "credit_card_full_name" : "Targeted Victory"
        } ]
      }
    },
    "tweets" : [ {
      "impressions" : 0,
      "spend" : 0.0,
      "ad_campaigns" : [ {
        "targeting" : [ {
          "target" : "Montgomery AL- US",
          "target_type" : "GEO",
          "impressions" : 895
        }, {
          "target" : "13-54",
          "target_type" : "AGE_BUCKET",
          "impressions" : 5721
        }, {
          "target" : "Dothan AL- US",
          "target_type" : "GEO",
          "impressions" : 189
        }, {
          "target" : "13-29",
          "target_type" : "AGE_BUCKET",
          "impressions" : 3009
        }, {
          "target" : "Chattanooga TN- US",
          "target_type" : "GEO",
          "impressions" : 2
        }, {
          "target" : "English",
          "target_type" : "LANGUAGE",
          "impressions" : 8568
        }, {
          "target" : "Orlando-Daytona Beach-Melbourne FL- US",
          "target_type" : "GEO",
          "impressions" : 13
        }, {
          "target" : "21-54",
          "target_type" : "AGE_BUCKET",
          "impressions" : 4297
        }, {
          "target" : "Thai",
          "target_type" : "LANGUAGE",
          "impressions" : 1
        }, {
          "target" : "20 and up",
          "target_type" : "AGE_BUCKET",
          "impressions" : 6598
        },


"ads_account" : {
  "account_name" : "@club4growth - U.S. Political Campaigning - Bask Digital Media",
  "user_name" : "club4growth",
  "bio_url" : "http://twitter.com/wEF8OWW5zn",
  "billing_information" : {
    "insertion_order" : [ ],
    "credit_card" : [ ]
  }
},
"tweets" : [ {
  "impressions" : 466501,
  "spend" : 2993.5,
  "ad_campaigns" : [ {
    "targeting" : [ {
      "target" : "13 and up",
      "target_type" : "AGE_BUCKET",
      "impressions" : 144460
    }, {
      "target" : "20-34",
      "target_type" : "AGE_BUCKET",
      "impressions" : 78242
    }, {
      "target" : "Korean",
      "target_type" : "LANGUAGE",
      "impressions" : 160
    }, {
      "target" : "13-54",
      "target_type" : "AGE_BUCKET",
      "impressions" : 131703
    }, {
      "target" : "30-39",
      "target_type" : "AGE_BUCKET",
      "impressions" : 42685
    }, {
      "target" : "Pennsylvania- US",
      "target_type" : "GEO",
      "impressions" : 2
    }, {
      "target" : "25-54",
      "target_type" : "AGE_BUCKET",
      "impressions" : 86998
    }, {
      "target" : "South Dakota- US",
      "target_type" : "GEO",
      "impressions" : 1
    }, {
      "target" : "20-29",
      "target_type" : "AGE_BUCKET",
      "impressions" : 61090
    }, {
      "target" : "Dutch",
      "target_type" : "LANGUAGE",
      "impressions" : 41
    }, {
      "target" : "Unknown",
      "target_type" : "GENDER",
      "impressions" : 214
    }, {
      "target" : "Washington DC- US",
      "target_type" : "GEO",
      "impressions" : 144356
    }, {
      "target" : "French",
      "target_type" : "LANGUAGE",
      "impressions" : 420
    }, {
      "target" : "German",
      "target_type" : "LANGUAGE",
      "impressions" : 71
    }, {
      "target" : "New Jersey- US",
      "target_type" : "GEO",
      "impressions" : 1
    }, {
      "target" : "Female",
      "target_type" : "GENDER",
      "impressions" : 57736
    },

It looks like each advertiser has its own nested dictionaries, and I couldn't find a way to convert them into a DataFrame. I tried the following code, but it just separates them into different columns.

Any solution? Thanks

import json
import pandas as pd

file = 'issue.txt'
with open(file) as train_file:
    dict_train = json.load(train_file)

# orient='index' turns each top-level key (here only 'archives') into one row
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)
train

You can do this with json_normalize. You need to create a separate DataFrame for each JSON path, and then either merge them together or keep them separate:

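import pandas as pd

# `data` is the parsed JSON dict (`dict_train` in the question)

# One row per tweet
df1 = pd.json_normalize(data['archives'], record_path=['tweets'])

# One row per insertion order, with the account name and user name attached
df2 = pd.json_normalize(data['archives'],
                        record_path=['ads_account', 'billing_information', 'insertion_order'],
                        meta=[['ads_account', 'account_name'], ['ads_account', 'user_name']])

df1
df2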

Output:

df1:

      impressions      spend  ...                                         tweet_text                                          tweet_url
0          132072    2071.81  ...  There’s nothing controversial about something ...  https://twitter.com/transparency/status/106532...
1         8779581  100000.00  ...  Let’s #endgunviolencetogether - go to https://...  https://twitter.com/transparency/status/106473...
2         1021063   15601.68  ...  There’s nothing controversial about something ...  https://twitter.com/transparency/status/106532...
3         5935913  113991.45  ...  Send a postcard to your representative in less...  https://twitter.com/transparency/status/106504...
4           40233     287.31  ...  Care for Pennsylvania seniors is in jeopardy. ...  https://twitter.com/transparency/status/113887...
...           ...        ...  ...                                                ...                                                ...
2855       115744     760.68  ...  Dear New York politicians: Abortion is health ...  https://twitter.com/transparency/status/108388...
2856       514286    2566.19  ...  In 2019, states have passed more laws than eve...  https://twitter.com/transparency/status/114830...
2857         8247     180.71  ...  Spread the word about Trump's real agenda so t...  https://twitter.com/transparency/status/109297...
2858         4629      24.36  ...  Illinois’ new law, the Reproductive Health Act...  https://twitter.com/transparency/status/113485...
2859         1795       6.38  ...  Congratulations to our #WebbyAwards nominated ...  https://twitter.com/transparency/status/111318...

df2:

    advertising_agency_name                                company_name  ...                 ads_account.account_name ads_account.user_name
0          Resolution Media                             Toms Shoes Inc.  ...             @TOMS - U.S. Issue Ads - OMD                  TOMS
1      Precision Strategies                                      Humana  ...   @humana - Issue - Precision Strategies                Humana
2                       NaN  Federation for American Immigration Reform  ...        @FAIRImmigration - U.S. Issue Ads       FAIRImmigration
3                       NaN                                         VH1  ...                    @VH1 - U.S. Issue Ads                   VH1
4                       NaN                                         VH1  ...                    @VH1 - U.S. Issue Ads                   VH1
..                      ...                                         ...  ...                                      ...                   ...
118             Cavalry LLC               American Hospital Association  ...  @AHAAdvocacy - U.S. Issue Ads - Cavalry           AHAAdvocacy
119                     NaN                                      FWD.us  ...                  @FWDus - U.S. Issue Ads                 FWDus
120                     NaN                                      FWD.us  ...                  @FWDus - U.S. Issue Ads                 FWDus
121                     NaN               California Secretary of State  ...              @CASOSVote - U.S. Issue Ads             CASOSvote
122                     NaN               California Secretary of State  ...              @CASOSVote - U.S. Issue Ads             CASOSvote
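If a single table with one row per targeting entry is wanted instead of separate frames, one option is to walk the nesting explicitly and build the rows by hand. This is only a sketch: the `data` dict below is a tiny hand-made sample that mirrors the structure in the question, and the column names (`tweet_impressions`, `tweet_spend`, `target_impressions`) are made up for illustration. It uses plain loops rather than json_normalize.

```python
import pandas as pd

# Hand-made sample mirroring the question's structure (heavily abbreviated).
data = {
    "archives": [
        {
            "ads_account": {"user_name": "BradleyByrne"},
            "tweets": [
                {
                    "impressions": 0,
                    "spend": 0.0,
                    "ad_campaigns": [
                        {
                            "targeting": [
                                {"target": "Montgomery AL- US", "target_type": "GEO", "impressions": 895},
                                {"target": "13-54", "target_type": "AGE_BUCKET", "impressions": 5721},
                            ]
                        }
                    ],
                }
            ],
        },
        {
            "ads_account": {"user_name": "club4growth"},
            "tweets": [
                {
                    "impressions": 466501,
                    "spend": 2993.5,
                    "ad_campaigns": [
                        {"targeting": [{"target": "Korean", "target_type": "LANGUAGE", "impressions": 160}]}
                    ],
                }
            ],
        },
    ]
}

# Build one flat dict per targeting entry, carrying the parent fields along.
rows = []
for archive in data["archives"]:
    user_name = archive["ads_account"]["user_name"]
    for tweet in archive.get("tweets", []):
        for campaign in tweet.get("ad_campaigns", []):
            for target in campaign.get("targeting", []):
                rows.append({
                    "user_name": user_name,
                    "tweet_impressions": tweet["impressions"],
                    "tweet_spend": tweet["spend"],
                    "target": target["target"],
                    "target_type": target["target_type"],
                    "target_impressions": target["impressions"],
                })

targeting_df = pd.DataFrame(rows)
print(targeting_df)
```

The tweet-level values repeat on every targeting row of the same tweet, so aggregate with care (for example, do not sum `tweet_spend` across rows).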

Alternatively, try pandas.read_json().
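As a rough illustration of what that gives on data shaped like the question's: read_json puts each element of "archives" into a single column of dicts, so a further step such as json_normalize is probably still needed. The `sample` dict below is a made-up, abbreviated stand-in for the real file, and it is read from an in-memory buffer instead of from disk.

```python
import io
import json

import pandas as pd

# Made-up, abbreviated stand-in for the real file contents.
sample = {
    "archives": [
        {"ads_account": {"user_name": "BradleyByrne"},
         "tweets": [{"impressions": 0, "spend": 0.0}]},
        {"ads_account": {"user_name": "club4growth"},
         "tweets": [{"impressions": 466501, "spend": 2993.5}]},
    ]
}

# read_json yields one column named 'archives'; each cell is still a nested dict.
raw = pd.read_json(io.StringIO(json.dumps(sample)))
print(raw.columns.tolist())

# Flatten the nested dicts; list-valued fields such as 'tweets' stay as lists.
flat = pd.json_normalize(raw["archives"].tolist())
print(flat.columns.tolist())
```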

 