Create a pandas DataFrame from uneven data

I am working in Python 2.7 with a data set that has yearly data as well as lifelong data. I have a dictionary that stores the lifelong data together with an inner DataFrame of yearly data, so it looks something like:

Bear1
{'color':'brown',
'grown_size':'7ft',
'stats': df1}

where the DataFrame 'df1' looks like this:

meals      children    territory
4          5              8
2          4              6
5          2              7

I would like to get a rectangular DataFrame in which each row holds a different year's data plus all of the lifelong stats, so this would become something like:

color     grown_size   meals      children    territory
brown       7ft        4          5           8
brown       7ft        2          4           6
brown       7ft        5          2           7

I assume this needs something like the Series.repeat() method in pandas, but I have not gotten that to work. What would be the fastest way of accomplishing this? There are many such bears, of varying ages!

EDIT: Unfortunately I found a problem with my question. The yearly data is already inside a DataFrame, not inside a dictionary!

I have tried the following code for this:

 pd.DataFrame.from_dict(bears['bear1'])

with bears['bear1'] being the dictionary posted above, but I get the following error:

  File "<stdin>", line 1, in <module>
  File "/Users/masongardner/Library/Python/2.7/lib/python/site-packages/pandas/core/frame.py", line 226, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/Users/masongardner/Library/Python/2.7/lib/python/site-packages/pandas/core/frame.py", line 363, in _init_dict
    dtype=dtype)
  File "/Users/masongardner/Library/Python/2.7/lib/python/site-packages/pandas/core/frame.py", line 5158, in _arrays_to_mgr
    index = extract_index(arrays)
  File "/Users/masongardner/Library/Python/2.7/lib/python/site-packages/pandas/core/frame.py", line 5197, in extract_index    

ValueError: If using all scalar values, you must pass an index

Thanks!

Use from_dict:

In [20]:
d={'color':'brown',
'grown_size':'7ft',
'stats': {2007:[1,5,7,2],
        2008:[5,3,4,5],
        2009:[5,2,6,7]}
}
pd.DataFrame.from_dict(d)

Out[20]:
      color grown_size         stats
2007  brown        7ft  [1, 5, 7, 2]
2008  brown        7ft  [5, 3, 4, 5]
2009  brown        7ft  [5, 2, 6, 7]

pd.DataFrame(d) will also work.

EDIT

Here is a simple way to have what you want for one bear.

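# recreating your data
import pandas as pd

d = {'meals': [4, 2, 5], 'children': [5, 4, 2], 'territory': [8, 6, 7]}
bear1 = {'color': 'brown',
         'grown_size': '7ft',
         'stats': pd.DataFrame(d)}


def bear_to_df(bear_dict):
    # copy so the bear's original 'stats' frame is not modified
    df = bear_dict['stats'].copy()
    # .items() works on both Python 2 and 3 (iteritems() is Python 2 only)
    for k, v in bear_dict.items():
        if k != 'stats':
            # a scalar value is broadcast to every row
            df[k] = v
    return df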

In [32]: bear_to_df(bear1)
Out[32]: 
   children  meals  territory  color grown_size
0         5      4          8  brown        7ft
1         4      2          6  brown        7ft
2         2      5          7  brown        7ft

How many bears do you have? If you want to combine all your bears' data in a single DataFrame, use pandas.concat.
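
For several bears, the concatenation could look roughly like the sketch below. The second bear, the helper name flatten_bear, and the 'bear' and 'row' index names are made up for illustration; it also assumes a pandas version that has DataFrame.assign.

```python
import pandas as pd


def flatten_bear(bear_dict):
    """Hypothetical helper: yearly stats with the lifelong values on every row."""
    lifelong = {k: v for k, v in bear_dict.items() if k != 'stats'}
    # assign() returns a new DataFrame, so the original 'stats' frame is untouched
    return bear_dict['stats'].assign(**lifelong)


# Made-up data: two bears with a different number of recorded years
bears = {
    'bear1': {'color': 'brown', 'grown_size': '7ft',
              'stats': pd.DataFrame({'meals': [4, 2, 5],
                                     'children': [5, 4, 2],
                                     'territory': [8, 6, 7]})},
    'bear2': {'color': 'black', 'grown_size': '6ft',
              'stats': pd.DataFrame({'meals': [3, 6],
                                     'children': [1, 0],
                                     'territory': [4, 9]})},
}

# Passing a dict to concat labels each bear's rows with its dict key
all_bears = pd.concat({name: flatten_bear(b) for name, b in bears.items()},
                      names=['bear', 'row'])

# Move the bear name out of the index into an ordinary column
all_bears = all_bears.reset_index(level='bear')
print(all_bears)
```

Each bear contributes as many rows as it has years, so bears of different ages stack without any padding.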
