I have a dictionary:
data = {'x': 1,
        'y': [1, 2, 3],
        'z': (4, 5, 6),
        'w': {1: 2, 3: 4}
       }
I'd like to construct a pandas DataFrame such that the list and tuple are not broadcast across rows:
df = pd.DataFrame(some_transformation(data), index=['a'])
to get
df =
x y z w
a 1 (1,2,3) (4,5,6) (1,2,3,4)
Some sort of flattening and/or stringification of the list/tuple/dict would also do. What is the easiest / most efficient way of doing so, without having to inspect the exact data structure of each dictionary entry?
Without inspecting the exact data structure, I think the easiest way to achieve what you want is:
data = {k: str(v) for k, v in data.items()}
The statement above converts every value to a string. Now you can convert the data dictionary to a DataFrame with the line below:
df=pd.DataFrame(data, index=[0])
This gives output of the following form:
w x y z
0 {1: 2, 3: 4} 1 [1, 2, 3] (4, 5, 6)
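Now for your desired output (you can use other, more efficient methods for string replacement in a DataFrame as well):
for acol in df.columns:
    # Drop the surrounding brackets, braces and parentheses.
    df[acol] = df[acol].values[0].strip('[{()}]')
    # Turn the dict's "key: value" separators into commas.
    df[acol] = df[acol].values[0].replace(':', ',')
The output looks like: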
w x y z
1, 2, 3, 4 1 1, 2, 3 4, 5, 6
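The same strings can also be built before the DataFrame is created, which avoids the column loop and the character stripping. This is only a minimal sketch: the helper name flatten_to_str is hypothetical (it is not part of pandas), and it assumes the values are scalars, lists, tuples or flat dicts as in the question.

```python
import pandas as pd

def flatten_to_str(value):
    """Hypothetical helper: render a value as a comma-separated string."""
    if isinstance(value, dict):
        # Interleave keys and values: {1: 2, 3: 4} -> [1, 2, 3, 4]
        items = [part for pair in value.items() for part in pair]
    elif isinstance(value, (list, tuple)):
        items = list(value)
    else:
        # Scalars are simply converted to their string form.
        return str(value)
    return ', '.join(str(item) for item in items)

data = {'x': 1, 'y': [1, 2, 3], 'z': (4, 5, 6), 'w': {1: 2, 3: 4}}

# Every value is now a plain string, so nothing is broadcast.
df = pd.DataFrame({k: flatten_to_str(v) for k, v in data.items()}, index=[0])
print(df)
```

Nested containers (a list inside a dict, say) are not unpacked here; they would just be passed through str().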
You cannot apply one transformation to lists/tuples and dictionaries alike, because they have very different properties. You can flatten all dictionaries and then create a pd.Series out of the updated dictionary.
for key in data:
    if isinstance(data[key], dict):
        data[key] = list(data[key].keys()) + list(data[key].values())
pd.Series(data)
#w [1, 3, 2, 4]
#x 1
#y [1, 2, 3]
#z (4, 5, 6)
#dtype: object
Convert it further into a DataFrame, if you want:
df = pd.DataFrame(pd.Series(data)).T
# w x y z
#0 [1, 3, 2, 4] 1 [1, 2, 3] (4, 5, 6)
You can handle lists in the same spirit (convert them to tuples).
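A rough sketch of what that could look like, folding the list-to-tuple step into the same loop. The name normalized is made up for this example, and the values are written into a new dictionary instead of modifying data in place, which is a choice made here and not something the code above does.

```python
data = {'x': 1, 'y': [1, 2, 3], 'z': (4, 5, 6), 'w': {1: 2, 3: 4}}

normalized = {}
for key, value in data.items():
    if isinstance(value, dict):
        # Same flattening as above: all keys first, then all values.
        value = list(value.keys()) + list(value.values())
    if isinstance(value, list):
        # Lists (including the flattened dict) become tuples.
        value = tuple(value)
    normalized[key] = value

print(normalized)
```

The resulting dictionary can be passed to pd.Series in the same way as before.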
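This is one way:
from itertools import chain

def transformer(data):
    # Wrap every value in a one-element list so pandas treats it as a single cell.
    for k, v in data.items():
        if isinstance(v, list):
            data[k] = [tuple(v)]
        elif isinstance(v, dict):
            # Flatten the dict into (key1, value1, key2, value2, ...).
            data[k] = [tuple(chain(*(v.items())))]
        else:
            data[k] = [v]
    return data
df = pd.DataFrame(transformer(data), index=['a'])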
# w x y z
# a (1, 2, 3, 4) 1 (1, 2, 3) (4, 5, 6)
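You can use set_value to assign those elements to the DataFrame and then transform the dict and list into tuples. Note that set_value was removed in pandas 1.0, so this only runs as written on older versions.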
df = pd.DataFrame(columns=data.keys())
for k, v in data.items():
    df.set_value(0, k, v)
df = df.applymap(lambda x: sum([[k, v] for k, v in x.items()], []) if isinstance(x, dict) else x)
df = df.applymap(lambda x: tuple(x) if isinstance(x, list) else x)
Out[716]:
x y z w
0 1 (1, 2, 3) (4, 5, 6) (1, 2, 3, 4)
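On recent pandas versions, where set_value is no longer available, a similar result can be sketched by converting the values first and then passing a list containing the single dictionary to the constructor, so that each value lands in exactly one cell. This is a different technique from the one above, not a drop-in port of it, and the helper name to_cell is hypothetical.

```python
import pandas as pd

def to_cell(value):
    """Hypothetical helper: turn dicts and lists into flat tuples."""
    if isinstance(value, dict):
        # {1: 2, 3: 4} -> (1, 2, 3, 4)
        return tuple(part for pair in value.items() for part in pair)
    if isinstance(value, list):
        return tuple(value)
    return value

data = {'x': 1, 'y': [1, 2, 3], 'z': (4, 5, 6), 'w': {1: 2, 3: 4}}

# A list holding one dict is read as a single row of records,
# so the tuples are stored as cell values instead of being broadcast.
df = pd.DataFrame([{k: to_cell(v) for k, v in data.items()}], index=['a'])
print(df)
```

Because the conversion happens before the DataFrame exists, no element-wise pass over the frame is needed afterwards.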