I think melt (as discussed here) may be useful for this, but I can't quite figure out how to use it to solve my problem.
I'm starting with a nested list of dictionaries like this:
orders = [
    {
        "order_id" : 0,
        "lines" : [
            {
                "line_id" : 1,
                "line_amount" : 3.45,
                "line_description" : "first line"
            },
            {
                "line_id" : 2,
                "line_amount" : 6.66,
                "line_description" : "second line"
            },
            {
                "line_id" : 3,
                "line_amount" : 5.43,
                "line_description" : "third line"
            },
        ]
    },
    {
        "order_id" : 1,
        "lines" : [
            ...
        ]
    }
]
I want a DataFrame with one row per order line (not one row per order) that still includes the original order's attributes (in this example, just the order_id). The most efficient way I've come up with so far is:
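import pandas

# Orders DataFrame: one row per order, with "lines" holding a list of dicts
odf = pandas.DataFrame(orders)
line_dfs = []
for _, order in odf.iterrows():
    # Build a small frame from this order's lines and tag it with the order_id
    line_df = pandas.DataFrame(order["lines"])
    line_df["order_id"] = order["order_id"]
    line_dfs.append(line_df)
# Line DataFrame: one row per order line
ldf = pandas.concat(line_dfs, sort=False, ignore_index=True)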
Is there a more efficient, "vectorized" way to .apply something to achieve this?
ldf = odf.lines.apply(...?...)
Thanks for any help, including just a link to an answer on SO or elsewhere that already addresses this and that I just haven't found yet.
Did you try read_json?
df = pd.read_json(orders)
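Note that read_json parses JSON text (a string, path, or file-like object) rather than a Python list, so the data would likely need to be serialized first. The sketch below uses a small illustrative sample shaped like the question's data (the name sample_orders and its trimmed values are assumptions, not the asker's full list). It suggests that read_json on its own still yields one row per order, with lines left as a nested column:

```python
import io
import json

import pandas as pd

# Illustrative sample shaped like the question's data (hypothetical, trimmed).
sample_orders = [
    {"order_id": 0, "lines": [{"line_id": 1, "line_amount": 3.45},
                              {"line_id": 2, "line_amount": 6.66}]},
    {"order_id": 1, "lines": [{"line_id": 1, "line_amount": 30.45}]},
]

# read_json parses JSON text, so serialize the Python list first.
json_text = json.dumps(sample_orders)
df = pd.read_json(io.StringIO(json_text))

# Still one row per order; each "lines" cell holds a nested list of dicts.
print(df.shape)
print(type(df.loc[0, "lines"]))
```

So read_json can load the orders, but flattening the lines still needs a further step.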
Use a list comprehension with pop to extract the lines list by key, merge each line dictionary with its parent order dictionary, and pass the resulting list of dictionaries to the DataFrame constructor:
import pandas as pd

orders = [
    {
        "order_id" : 0,
        "lines" : [
            {
                "line_id" : 1,
                "line_amount" : 3.45,
                "line_description" : "first line"
            },
            {
                "line_id" : 2,
                "line_amount" : 6.66,
                "line_description" : "second line"
            },
            {
                "line_id" : 3,
                "line_amount" : 5.43,
                "line_description" : "third line"
            },
        ]
    },
    {
        "order_id" : 1,
        "lines" : [
            {
                "line_id" : 1,
                "line_amount" : 30.45,
                "line_description" : "first line"
            },
            {
                "line_id" : 2,
                "line_amount" : 60.66,
                "line_description" : "second line"
            },
            {
                "line_id" : 3,
                "line_amount" : 50.43,
                "line_description" : "third line"
            },
        ]
    }
]
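# Merge each order's top-level fields (x) with each of its line dicts (y).
# Note: pop removes 'lines' from every dict in orders (mutates the input).
L = [{**x, **y} for x in orders for y in x.pop('lines')]
odf = pd.DataFrame(L)
print(odf)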
   line_amount line_description  line_id  order_id
0         3.45       first line        1         0
1         6.66      second line        2         0
2         5.43       third line        3         0
3        30.45       first line        1         1
4        60.66      second line        2         1
5        50.43       third line        3         1
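Because pop removes the 'lines' key from each order dictionary, the comprehension changes orders in place, and running it a second time on the same list would raise a KeyError. If the original data needs to stay intact, a non-mutating variant is possible. This is a minimal sketch on an illustrative sample (sample_orders, rows and ldf are names chosen here, not from the original answer):

```python
import pandas as pd

# Illustrative sample shaped like the question's data (hypothetical, trimmed).
sample_orders = [
    {"order_id": 0, "lines": [{"line_id": 1, "line_amount": 3.45},
                              {"line_id": 2, "line_amount": 6.66}]},
    {"order_id": 1, "lines": [{"line_id": 1, "line_amount": 30.45}]},
]

# Copy every order-level field except "lines", then merge in each line dict.
# The source dictionaries are only read, never modified.
rows = [
    {**{k: v for k, v in order.items() if k != "lines"}, **line}
    for order in sample_orders
    for line in order["lines"]
]
ldf = pd.DataFrame(rows)
print(ldf)
```

The trade-off is an extra dictionary comprehension per order, which should be negligible unless orders carry many top-level fields.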
Another solution with loops:
L = []
for x in orders:
    # pop removes 'lines' from x, so x then holds only the order-level fields
    for y in x.pop('lines'):
        L.append({**x, **y})
odf = pd.DataFrame(L)
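A further option, assuming pandas 1.0 or later where pd.json_normalize is available at the top level, is to let pandas do the flattening. This is a sketch on an illustrative sample rather than part of the original answer (sample_orders is a hypothetical, trimmed stand-in for orders): record_path names the nested list to expand into rows, and meta lists the order-level fields to repeat on each row.

```python
import pandas as pd

# Illustrative sample shaped like the question's data (hypothetical, trimmed).
sample_orders = [
    {"order_id": 0, "lines": [{"line_id": 1, "line_amount": 3.45},
                              {"line_id": 2, "line_amount": 6.66}]},
    {"order_id": 1, "lines": [{"line_id": 1, "line_amount": 30.45}]},
]

# One row per entry in "lines"; "order_id" is carried along from the parent.
ldf = pd.json_normalize(sample_orders, record_path="lines", meta=["order_id"])
print(ldf)
```

This leaves the input untouched. The meta columns may come back with object dtype, so a cast such as ldf["order_id"].astype(int) may be wanted afterwards.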