简体   繁体   中英

How can I import a nested json object into a pandas dataframe?

I have a json object like this:

[{'currency_pair': 'UOS_USDT',
  'orders': [{'account': 'spot',
              'amount': '1282.84',
              'create_time': '1655394430',
              'create_time_ms': 1655394430129,
              'currency_pair': 'UOS_USDT',
              'fee': '0',
              'fee_currency': 'UOS',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208865523',
              'left': '1282.84',
              'point_fee': '0',
              'price': '0.1949',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394430',
              'update_time_ms': 1655394430129}],
  'total': 1},
 {'currency_pair': 'RMRK_USDT',
  'orders': [{'account': 'spot',
              'amount': '79.365',
              'create_time': '1655394431',
              'create_time_ms': 1655394431249,
              'currency_pair': 'RMRK_USDT',
              'fee': '0',
              'fee_currency': 'RMRK',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208877018',
              'left': '79.365',
              'point_fee': '0',
              'price': '2.52',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394431',
              'update_time_ms': 1655394431249}],
  'total': 1}]

I want to convert it to a dataframe.

The data comes from an api call to a crypto exchange. I converted this to json, using the .json() method. So it's proper json. I have tried:

df = pd.DataFrame(data)
df = pd.DataFrame(data["orders")
df = pd.DataFrame(data["currency_pair"]["orders"])

and every other imaginable path.

I want a df which has as columns ["currency_pair", "amount", "create_time", "price", "side"]

I some times get an error TypeError: list indices must be integers or slices, not str or the df works but the orders object is not unpacked. All help gratefully received. Thank you.

import pandas as pd

data = [{'currency_pair': 'UOS_USDT',
  'orders': [{'account': 'spot',
              'amount': '1282.84',
              'create_time': '1655394430',
              'create_time_ms': 1655394430129,
              'currency_pair': 'UOS_USDT',
              'fee': '0',
              'fee_currency': 'UOS',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208865523',
              'left': '1282.84',
              'point_fee': '0',
              'price': '0.1949',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394430',
              'update_time_ms': 1655394430129}],
  'total': 1},
 {'currency_pair': 'RMRK_USDT',
  'orders': [{'account': 'spot',
              'amount': '79.365',
              'create_time': '1655394431',
              'create_time_ms': 1655394431249,
              'currency_pair': 'RMRK_USDT',
              'fee': '0',
              'fee_currency': 'RMRK',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208877018',
              'left': '79.365',
              'point_fee': '0',
              'price': '2.52',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394431',
              'update_time_ms': 1655394431249}],
  'total': 1}]

df = pd.json_normalize(data, record_path=['orders'])

And keep the columns you need.

It's only one line and it should cover your case since 'currency_pair' that you want is already in the 'orders' dictionary and from what I understand from your data it will always be the same as the 'currency_pair' value outside 'orders. As you said you don't need 'total' too.

Use:

df = pd.json_normalize(data, record_path=['orders'], meta=['currency_pair', 'total'], record_prefix='orders_')

If you want them all

import pandas as pd

data = [{'currency_pair': 'UOS_USDT',
  'orders': [{'account': 'spot',
              'amount': '1282.84',
              'create_time': '1655394430',
              'create_time_ms': 1655394430129,
              'currency_pair': 'UOS_USDT',
              'fee': '0',
              'fee_currency': 'UOS',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208865523',
              'left': '1282.84',
              'point_fee': '0',
              'price': '0.1949',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394430',
              'update_time_ms': 1655394430129}],
  'total': 1},
 {'currency_pair': 'RMRK_USDT',
  'orders': [{'account': 'spot',
              'amount': '79.365',
              'create_time': '1655394431',
              'create_time_ms': 1655394431249,
              'currency_pair': 'RMRK_USDT',
              'fee': '0',
              'fee_currency': 'RMRK',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208877018',
              'left': '79.365',
              'point_fee': '0',
              'price': '2.52',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394431',
              'update_time_ms': 1655394431249}],
  'total': 1}]

df = pd.DataFrame(data)
df['amount'] = df.apply( lambda row: row.orders[0]['amount'] , axis=1)
df['create_time'] = df.apply( lambda row: row.orders[0]['create_time'] , axis=1)
df['price'] = df.apply( lambda row: row.orders[0]['price'] , axis=1)
df['side'] = df.apply( lambda row: row.orders[0]['side'] , axis=1)
required_df = df[['currency_pair', 'amount', 'create_time', 'price', 'side']]
required_df

Result:

currency_pair   amount  create_time     price   side
0   UOS_USDT    1282.84     1655394430  0.1949  buy
1   RMRK_USDT   79.365  1655394431  2.52    buy

HI, hope this process can help you

#Import pandas library
import pandas as pd

#Your data 
data = [{'currency_pair': 'UOS_USDT',
         'orders': [{'account': 'spot',
                     'amount': '1282.84',
                     'create_time': '1655394430',
                     'create_time_ms': 1655394430129,
                     'currency_pair': 'UOS_USDT',
                     'fee': '0',
                     'fee_currency': 'UOS',
                     'fill_price': '0',
                     'filled_total': '0',
                     'gt_discount': False,
                     'gt_fee': '0',
                     'iceberg': '0',
                     'id': '169208865523',
                     'left': '1282.84',
                     'point_fee': '0',
                     'price': '0.1949',
                     'rebated_fee': '0',
                     'rebated_fee_currency': 'USDT',
                     'side': 'buy',
                     'status': 'open',
                     'text': 'apiv4',
                     'time_in_force': 'gtc',
                     'type': 'limit',
                     'update_time': '1655394430',
                     'update_time_ms': 1655394430129}],
         'total': 1},
        {'currency_pair': 'RMRK_USDT',
         'orders': [{'account': 'spot',
                     'amount': '79.365',
                     'create_time': '1655394431',
                     'create_time_ms': 1655394431249,
                     'currency_pair': 'RMRK_USDT',
                     'fee': '0',
                     'fee_currency': 'RMRK',
                     'fill_price': '0',
                     'filled_total': '0',
                     'gt_discount': False,
                     'gt_fee': '0',
                     'iceberg': '0',
                     'id': '169208877018',
                     'left': '79.365',
                     'point_fee': '0',
                     'price': '2.52',
                     'rebated_fee': '0',
                     'rebated_fee_currency': 'USDT',
                     'side': 'buy',
                     'status': 'open',
                     'text': 'apiv4',
                     'time_in_force': 'gtc',
                     'type': 'limit',
                     'update_time': '1655394431',
                     'update_time_ms': 1655394431249}],
         'total': 1}]


#Accessing nested values
#you cloud transform the specific column 
#into a DataFrame and access it values with indices
#then parse the value to the type you need 
#i.e
float(pd.DataFrame(data[0]['orders'])['amount'].values[0])
int(pd.DataFrame(data[0]['orders'])['create_time'].values[0])
float(pd.DataFrame(data[0]['orders'])['price'].values[0])
pd.DataFrame(data[0]['orders'])['side'].values[0]

#Create a dictionary with your chosen structure
#["currency_pair", "amount", "create_time", "price", "side"]
# then insert the corresponding columns

custom_dictionary = {
    'currency_pair': [data[0]['currency_pair'], data[1]['currency_pair']],

    'amount': [float(pd.DataFrame(data[0]['orders'])['amount'].values[0]),
               float(pd.DataFrame(data[1]['orders'])['amount'].values[0])],

    'create_time': [int(pd.DataFrame(data[0]['orders'])['create_time'].values[0]),
                    int(pd.DataFrame(data[1]['orders'])['create_time'].values[0])],

    'price': [float(pd.DataFrame(data[0]['orders'])['price'].values[0]),
              float(pd.DataFrame(data[1]['orders'])['price'].values[0])],

    'side': [pd.DataFrame(data[0]['orders'])['side'].values[0],
             pd.DataFrame(data[1]['orders'])['side'].values[0]]}

#Create a DataFrame with your custom dictionary and voila
df = pd.DataFrame(custom_dictionary)
df

the dataframe (df) could look like:

custom DataFrame

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM