Python generator to pandas DataFrame

I have a generator being returned from:

data = public_client.get_product_trades(product_id='BTC-USD', limit=10)

How do I turn the data into a pandas DataFrame?

The method's docstring reads:

"""{"Returns": [{
                     "time": "2014-11-07T22:19:28.578544Z",
                     "trade_id": 74,
                     "price": "10.00000000",
                     "size": "0.01000000",
                     "side": "buy"
                 }, {
                     "time": "2014-11-07T01:08:43.642366Z",
                     "trade_id": 73,
                     "price": "100.00000000",
                     "size": "0.01000000",
                     "side": "sell"
         }]}"""

I have tried:

df = [x for x in data]
df = pd.DataFrame.from_records(df)

but it does not work; I get the error:

AttributeError: 'str' object has no attribute 'keys'

When I print the above "x for x in data", I see a list of dicts, but the end looks strange. Could this be why?

print(list(data))

[{'time': '2020-12-30T13:04:14.385Z', 'trade_id': 116918468, 'price': '27853.82000000', 'size': '0.00171515', 'side': 'sell'},{'time': '2020-12-30T12:31:24.185Z', 'trade_id': 116915675, 'price': '27683.70000000', 'size': '0.01683711', 'side': 'sell'}, 'message']

It looks to be a list of dicts but the end value is a single string 'message'.

Based on the updated question, drop the trailing 'message' string before building the DataFrame:

df = pd.DataFrame(list(data)[:-1])

Or, more cleanly:

df = pd.DataFrame([x for x in data if isinstance(x, dict)])
print(df)

                       time   trade_id           price        size  side
0  2020-12-30T13:04:14.385Z  116918468  27853.82000000  0.00171515  sell
1  2020-12-30T12:31:24.185Z  116915675  27683.70000000  0.01683711  sell
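
To try the filter without calling the API, here is a minimal sketch. The generator fake_trades is hypothetical: it only mimics the output shown in the question (two trade dicts followed by a stray 'message' string) and is not part of the client library.

```python
import pandas as pd


def fake_trades():
    """Hypothetical stand-in for the generator in the question."""
    yield {'time': '2020-12-30T13:04:14.385Z', 'trade_id': 116918468,
           'price': '27853.82000000', 'size': '0.00171515', 'side': 'sell'}
    yield {'time': '2020-12-30T12:31:24.185Z', 'trade_id': 116915675,
           'price': '27683.70000000', 'size': '0.01683711', 'side': 'sell'}
    yield 'message'  # the stray non-dict item seen at the end of the output


# Keep only the dict items, wherever the stray string appears
records = [x for x in fake_trades() if isinstance(x, dict)]
df = pd.DataFrame(records)
print(df.shape)
```

Unlike slicing with [:-1], the isinstance check does not assume the string is always the last item. Also keep in mind that a generator can only be consumed once, so calling list(data) just to print it leaves nothing for a later pass over the same object.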

Note that the time, price and size columns still hold strings, so you will need to convert them to proper types, e.g.:

df['time'] = pd.to_datetime(df['time'])
for k in ['price', 'size']:
    df[k] = pd.to_numeric(df[k])
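
As a self-contained check of that conversion, the sketch below builds the frame from the two sample rows in the question and inspects the resulting types. Passing utc=True is my own addition, to make the handling of the trailing 'Z' explicit; it is an assumption about what you want, not something the question requires.

```python
import pandas as pd

# Sample rows copied from the output shown in the question
df = pd.DataFrame([
    {'time': '2020-12-30T13:04:14.385Z', 'trade_id': 116918468,
     'price': '27853.82000000', 'size': '0.00171515', 'side': 'sell'},
    {'time': '2020-12-30T12:31:24.185Z', 'trade_id': 116915675,
     'price': '27683.70000000', 'size': '0.01683711', 'side': 'sell'},
])

# ISO 8601 strings -> timezone-aware timestamps
df['time'] = pd.to_datetime(df['time'], utc=True)

# Numeric strings -> floats
for k in ['price', 'size']:
    df[k] = pd.to_numeric(df[k])

print(df.dtypes)
```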

You could pull the values out of each dictionary in the list and build a DataFrame from them (although this is not particularly clean):

dict_of_data =  [{
                     "time": "2014-11-07T22:19:28.578544Z",
                     "trade_id": 74,
                     "price": "10.00000000",
                     "size": "0.01000000",
                     "side": "buy"
                 }, {
                     "time": "2014-11-07T01:08:43.642366Z",
                     "trade_id": 73,
                     "price": "100.00000000",
                     "size": "0.01000000",
                     "side": "sell"
         }]

import pandas as pd

# One row of values per dict; column names come from the first dict's keys
list_of_data = [list(d.values()) for d in dict_of_data]

pd.DataFrame(list_of_data, columns=list(dict_of_data[0].keys())).set_index('time')

It's straightforward: just use the pd.DataFrame constructor:

#list_of_dicts = [{
#                     "time": "2014-11-07T22:19:28.578544Z",
#                     "trade_id": 74,
#                     "price": "10.00000000",
#                     "size": "0.01000000",
#                     "side": "buy"
#                 }, {
#                     "time": "2014-11-07T01:08:43.642366Z",
#                     "trade_id": 73,
#                     "price": "100.00000000",
#                     "size": "0.01000000",
#                     "side": "sell"
#}]
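# or, if you take it from 'data': a generator cannot be sliced,
# so materialize it first, then drop the trailing 'message' string
list_of_dicts = list(data)[:-1]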
df = pd.DataFrame(list_of_dicts)

df
Out[4]: 
                          time  trade_id         price        size  side
0  2014-11-07T22:19:28.578544Z        74   10.00000000  0.01000000   buy
1  2014-11-07T01:08:43.642366Z        73  100.00000000  0.01000000  sell

UPDATE

According to the question update, it seems your JSON data is still a string, so parse it first:

import json

data = json.loads(data)
data = data['Returns']
pd.DataFrame(data)

                          time  trade_id         price        size  side
0  2014-11-07T22:19:28.578544Z        74   10.00000000  0.01000000   buy
1  2014-11-07T01:08:43.642366Z        73  100.00000000  0.01000000  sell
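
For illustration only, here is that path end to end. It assumes the payload really is a JSON string with a top-level "Returns" key, as in the docstring; the raw variable is a made-up sample, not actual client output.

```python
import json

import pandas as pd

# Hypothetical JSON payload shaped like the docstring in the question
raw = '''{"Returns": [
    {"time": "2014-11-07T22:19:28.578544Z", "trade_id": 74,
     "price": "10.00000000", "size": "0.01000000", "side": "buy"},
    {"time": "2014-11-07T01:08:43.642366Z", "trade_id": 73,
     "price": "100.00000000", "size": "0.01000000", "side": "sell"}
]}'''

# Parse the string, then take the list of trade dicts under "Returns"
records = json.loads(raw)['Returns']
df = pd.DataFrame(records)
print(df)
```

If data is a generator, as in the question, json.loads will raise a TypeError, so this route applies only when the response arrives as a string.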
