
Nested JSON flattening in Pandas

I am trying to load nested JSON into separate columns of a pandas DataFrame. Currently it is all stored in a single column.

I tried using .apply(pd.Series). Please suggest a better approach for extracting choice_id, row_id, heading and simple_text.

Sample JSON in the column:

[
  {
    'id': '471362124',
    'answers': [
      {
        'choice_id': '3114700249',
        'row_id': '3114700251',
        'simple_text': 'Delivery behaviour | 7'
      },
      {
        'choice_id': '3114700249',
        'row_id': '3114700254',
        'simple_text': 'Customer Care (Chat/Email/Helpline Toll free) | 7'
      },
      {
        'choice_id': '3114700250',
        'row_id': '3114700255',
        'simple_text': 'Pricing | 6'
      },
      {
        'choice_id': '3114700250',
        'row_id': '3114700257',
        'simple_text': ' products | 6'
      },
      {
        'choice_id': '3114700249',
        'row_id': '3114700259',
        'simple_text': 'Branded products | 7'
      }
    ],
    'family': 'matrix',
    'subtype': 'rating',
    'heading': 'Dear customer how much would you rate us, on the following parameters, on a scale of 0~10 (10 being the highest and 0 being lowest)'
  },
  {
    'id': '471362122',
    'answers': [
      {
        'tag_data': [
          {
            'hexcolor': '00BF6F',
            'label': 'sm_negative',
            'tag_type': 'sentiment'
          }
        ],
        'simple_text': 'Vegetable, fruit of poor quality. By the time 5-7% items for which order are accepted are not supplied....'
      }
    ],
    'family': 'open_ended',
    'subtype': 'essay',
    'heading': 'Dear Customer,<br>Kindly help us, by providing your valuable suggestions .\xa0'
  }
]

I tried using .apply(pd.Series):

delivery_executive_behaviour=page2_questions_choice_id_answer[0].apply(pd.Series)
customer_care=page2_questions_choice_id_answer[1].apply(pd.Series)
product_pricing=page2_questions_choice_id_answer[2].apply(pd.Series)
assortment_of_products=page2_questions_choice_id_answer[3].apply(pd.Series)

Is this what you want? A list comprehension that builds plain dicts is sometimes more efficient than doing the reshaping in pandas:

import pandas as pd

# `doc` is the list of question dicts shown in the question
res = [
    {
        # dict.get returns None when an answer lacks the key
        'choice_id': j.get('choice_id'),
        'row_id': j.get('row_id'),
        'heading': i['heading'],
        'simple_text': j.get('simple_text'),
    }
    for i in doc for j in i['answers']
]
pd.DataFrame(res)

Out[2]: 
    choice_id      row_id                                            heading                                        simple_text
0  3114700249  3114700251  Dear customer how much would you rate us, on t...                             Delivery behaviour | 7
1  3114700249  3114700254  Dear customer how much would you rate us, on t...  Customer Care (Chat/Email/Helpline Toll free) | 7
2  3114700250  3114700255  Dear customer how much would you rate us, on t...                                        Pricing | 6
3  3114700250  3114700257  Dear customer how much would you rate us, on t...                                       products | 6
4  3114700249  3114700259  Dear customer how much would you rate us, on t...                               Branded products | 7
5        None        None  Dear Customer,<br>Kindly help us, by providing...  Vegetable, fruit of poor quality. By the time ...
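If you need this in more than one place, the comprehension can be wrapped in a small helper so that the extracted keys are configurable. This is only a sketch: the function name flatten_answers, its parameters and the shortened sample data are made up for illustration and are not part of pandas.

```python
import pandas as pd


def flatten_answers(doc, answer_keys=('choice_id', 'row_id', 'simple_text'),
                    question_keys=('heading',)):
    """Return one row per answer, copying the chosen question-level keys.

    Hypothetical helper for illustration; keys that are missing become None.
    """
    rows = []
    for question in doc:
        for answer in question.get('answers', []):
            # Answer-level fields first, then the fields of the parent question
            row = {key: answer.get(key) for key in answer_keys}
            row.update({key: question.get(key) for key in question_keys})
            rows.append(row)
    return pd.DataFrame(rows, columns=[*answer_keys, *question_keys])


# Shortened, made-up sample in the same shape as the data in the question
sample = [
    {'id': '471362124',
     'answers': [{'choice_id': '3114700249', 'row_id': '3114700251',
                  'simple_text': 'Delivery behaviour | 7'}],
     'heading': 'Rating question'},
    {'id': '471362122',
     'answers': [{'tag_data': [], 'simple_text': 'Free-text reply'}],
     'heading': 'Essay question'},
]
flat = flatten_answers(sample)
print(flat)
```

Passing question_keys=('id', 'heading') would also carry the question id into every row.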

You can use pd.json_normalize here (it has been a top-level pandas function since pandas 1.0):

d = [
  {
    'id': '471362124',
    'answers': [
      {
        'choice_id': '3114700249',
        'row_id': '3114700251',
        'simple_text': 'Delivery behaviour | 7'
      },
      {
        'choice_id': '3114700249',
        'row_id': '3114700254',
        'simple_text': 'Customer Care (Chat/Email/Helpline Toll free) | 7'
      },
      {
        'choice_id': '3114700250',
        'row_id': '3114700255',
        'simple_text': 'Pricing | 6'
      },
      {
        'choice_id': '3114700250',
        'row_id': '3114700257',
        'simple_text': ' products | 6'
      },
      {
        'choice_id': '3114700249',
        'row_id': '3114700259',
        'simple_text': 'Branded products | 7'
      }
    ],
    'family': 'matrix',
    'subtype': 'rating',
    'heading': 'Dear customer how much would you rate us, on the following parameters, on a scale of 0~10 (10 being the highest and 0 being lowest)'
  },
  {
    'id': '471362122',
    'answers': [
      {
        'tag_data': [
          {
            'hexcolor': '00BF6F',
            'label': 'sm_negative',
            'tag_type': 'sentiment'
          }
        ],
        'simple_text': 'Vegetable, fruit of poor quality. By the time 5-7% items for which order are accepted are not supplied....'
      }
    ],
    'family': 'open_ended',
    'subtype': 'essay',
    'heading': 'Dear Customer,<br>Kindly help us, by providing your valuable suggestions .\xa0'
  }
]

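import pandas as pd

# One row per entry in 'answers'; 'id' and 'heading' are repeated from the parent question
df = pd.json_normalize(d, record_path=['answers'], meta=[['id'], ['heading']]).drop(columns=['tag_data'])
print(df)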


   choice_id      row_id                                        simple_text         id                                            heading
0  3114700249  3114700251                             Delivery behaviour | 7  471362124  Dear customer how much would you rate us, on t...
1  3114700249  3114700254  Customer Care (Chat/Email/Helpline Toll free) | 7  471362124  Dear customer how much would you rate us, on t...
2  3114700250  3114700255                                        Pricing | 6  471362124  Dear customer how much would you rate us, on t...
3  3114700250  3114700257                                       products | 6  471362124  Dear customer how much would you rate us, on t...
4  3114700249  3114700259                               Branded products | 7  471362124  Dear customer how much would you rate us, on t...
5         NaN         NaN  Vegetable, fruit of poor quality. By the time ...  471362122  Dear Customer,<br>Kindly help us, by providing...
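Since the question says the JSON currently sits in a single DataFrame column, the same call can be applied cell by cell and the pieces concatenated. A minimal sketch, assuming each cell already holds a parsed list of question dicts; the frame raw, the column names response_id and questions, and the shortened values are hypothetical:

```python
import pandas as pd

# Hypothetical frame: each cell of 'questions' holds a list of question dicts
raw = pd.DataFrame({
    'response_id': ['r1', 'r2'],
    'questions': [
        [{'id': '471362124', 'heading': 'Rating question',
          'answers': [{'choice_id': '3114700249', 'row_id': '3114700251',
                       'simple_text': 'Delivery behaviour | 7'}]}],
        [{'id': '471362122', 'heading': 'Essay question',
          'answers': [{'simple_text': 'Free-text reply'}]}],
    ],
})

frames = []
for response_id, questions in zip(raw['response_id'], raw['questions']):
    # Flatten the answers of one cell, keeping the question id and heading
    part = pd.json_normalize(questions, record_path='answers',
                             meta=['id', 'heading'])
    # Remember which original row the answers came from
    part.insert(0, 'response_id', response_id)
    frames.append(part)

flat = pd.concat(frames, ignore_index=True)
# errors='ignore' keeps drop() from raising when no answer carried tag_data
flat = flat.drop(columns=['tag_data'], errors='ignore')
print(flat)
```

Without errors='ignore', drop(columns=['tag_data']) raises a KeyError whenever none of the answers contains a tag_data key, which is easy to hit on a subset of the data.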

