A dictionary contains nested dictionaries, and I want to convert it to a DataFrame in Python so that the table has columns with sub-columns
Data=[{'endDate': {'raw': 1585612800, 'fmt': '2020-03-31'},
'totalRevenue': {'raw': 67985000, 'fmt': '67.98M', 'longFmt':
'67,985,000'},
'costOfRevenue': {'raw': 0, 'fmt': None, 'longFmt': '0'},
'grossProfit': {'raw': 67985000, 'fmt': '67.98M', 'longFmt':
'67,985,000'},
'sellingGeneralAdministrative': {'raw': 37779000,
'fmt': '37.78M'}},
{'endDate': {'raw': 1577750400, 'fmt': '2019-12-31'},
'totalRevenue': {'raw': 79115000, 'fmt': '79.11M', 'longFmt':
'79,115,000'},
'costOfRevenue': {'raw': 0, 'fmt': None, 'longFmt': '0'},
'grossProfit': {'raw': 79115000, 'fmt': '79.11M', 'longFmt':
'79,115,000'},
'sellingGeneralAdministrative': {'raw': 36792000,
'fmt': '36.79M',
'longFmt': '36,792,000'}}]
I want Data in this format:
Data = [{'endDate': {'fmt': '2020-03-31'},
'totalRevenue': {'fmt': '67.98M'},
'costOfRevenue': {'fmt': None}, ...
and so on, i.e. removing 'raw' and 'longFmt'. After that, I want to convert the list of dicts to a DataFrame.
Here is what you can do to convert multiple dictionaries like that into a DataFrame:
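import pandas as pd

# a and b stand for the record dictionaries (contents elided here)
a = {...}
b = {...}
c = [a, b]

# Columns to collect; each key maps to the list of values found in the records
f = {'grossProfit': [], 'incomeBeforeTax': [], 'incomeTaxExpense': []}
for e in c:
    for k in f.keys():
        f[k].append(e[k])

print(pd.DataFrame(f))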
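The question also asks to drop 'raw' and 'longFmt' before building the table. A minimal sketch of that step, assuming every inner value is a dict that may or may not contain 'fmt'; the names data, fmt_only and flat_df are illustrative, and the records are shortened copies of the question's data:

```python
import pandas as pd

# Two shortened records in the same shape as the question's data.
data = [
    {"endDate": {"raw": 1585612800, "fmt": "2020-03-31"},
     "totalRevenue": {"raw": 67985000, "fmt": "67.98M", "longFmt": "67,985,000"},
     "costOfRevenue": {"raw": 0, "fmt": None, "longFmt": "0"}},
    {"endDate": {"raw": 1577750400, "fmt": "2019-12-31"},
     "totalRevenue": {"raw": 79115000, "fmt": "79.11M", "longFmt": "79,115,000"},
     "costOfRevenue": {"raw": 0, "fmt": None, "longFmt": "0"}},
]

# Keep only the 'fmt' entry of every inner dict; .get() tolerates a missing key.
fmt_only = [
    {name: {"fmt": inner.get("fmt")} for name, inner in record.items()}
    for record in data
]

# For a flat table, drop the extra nesting level and let pandas build the columns.
flat_df = pd.DataFrame(
    [{name: inner["fmt"] for name, inner in record.items()} for record in fmt_only]
)
print(flat_df)
```

fmt_only has the shape requested in the question, while flat_df holds one column per field and one row per record.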
pandas doesn't really support "sub-columns" in the sense you seem to be requesting. It does, though, support flattening JSON objects so that {'a': {'b': 'value'}} gives you a column a.b = 'value'. The official method for this is json_normalize, and it would be used like this:
import pandas as pd

income_statement_history = {
    "totalRevenue": {
        "raw": 67985000,
        "fmt": "67.98M",
        "longFmt": "67,985,000"
    },
    "costOfRevenue": {
        "raw": 0,
        "fmt": 'null',
        "longFmt": "0"
    },
    "grossProfit": {
        "raw": 67985000,
        "fmt": "67.98M",
        "longFmt": "67,985,000"
    },
    "totalOperatingExpenses": {
        "raw": 46790000,
        "fmt": "46.79M",
        "longFmt": "46,790,000"
    },
    "operatingIncome": {
        "raw": 21195000,
        "fmt": "21.2M",
        "longFmt": "21,195,000"
    }
}

df = pd.json_normalize(income_statement_history)
Printing df would give you:
>>> df
totalRevenue.raw totalRevenue.fmt totalRevenue.longFmt costOfRevenue.raw costOfRevenue.fmt ... totalOperatingExpenses.fmt totalOperatingExpenses.longFmt operatingIncome.raw operatingIncome.fmt operatingIncome.longFmt
0 67985000 67.98M 67,985,000 0 null ... 46.79M 46,790,000 21195000 21.2M 21,195,000
[1 rows x 15 columns]
You could then access those column values dynamically with:
>>> col = 'totalOperatingExpenses'
>>> subcol = 'longFmt'
>>> df[f'{col}.{subcol}']
0 46,790,000
Name: totalOperatingExpenses.longFmt, dtype: object
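For the list of records in the question, json_normalize can be applied to the whole list, and the 'fmt' columns picked out afterwards. A rough sketch, assuming every flattened column name has the form '<name>.<key>'; the names records, wide and fmt_df are illustrative:

```python
import pandas as pd

# Shortened records in the same shape as the question's data.
records = [
    {"endDate": {"raw": 1585612800, "fmt": "2020-03-31"},
     "grossProfit": {"raw": 67985000, "fmt": "67.98M", "longFmt": "67,985,000"}},
    {"endDate": {"raw": 1577750400, "fmt": "2019-12-31"},
     "grossProfit": {"raw": 79115000, "fmt": "79.11M", "longFmt": "79,115,000"}},
]

# One row per record, with columns such as 'endDate.fmt' and 'grossProfit.raw'.
wide = pd.json_normalize(records)

# Keep only the columns whose name ends in '.fmt', then strip that suffix.
fmt_df = wide.filter(regex=r"\.fmt$")
fmt_df = fmt_df.rename(columns=lambda c: c[: -len(".fmt")])
print(fmt_df)
```

This leaves one plain column per field containing only the formatted strings, which drops 'raw' and 'longFmt' without touching the original dictionaries.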
Deciding between this, a pd.DataFrame initialization as @Ann Zen's answer suggests, or whatever method you've been using depends on your exact need.
Is your goal a visually pleasing layout of columns based on JSON data? Is your goal a clear way of accessing a sub-column given its name and the name of the base column? Most answers I can think of are based on preference only, and the differences are minimal.
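If the goal is access by base column and sub-column name, the closest built-in equivalent is probably a two-level column index (a pandas MultiIndex). This is not part of the approach above, only a hedged sketch of one option, assuming each flattened column name contains exactly one dot; the names records, wide, gross and gross_fmt are illustrative:

```python
import pandas as pd

# Shortened records in the same shape as the question's data.
records = [
    {"endDate": {"raw": 1585612800, "fmt": "2020-03-31"},
     "grossProfit": {"raw": 67985000, "fmt": "67.98M", "longFmt": "67,985,000"}},
    {"endDate": {"raw": 1577750400, "fmt": "2019-12-31"},
     "grossProfit": {"raw": 79115000, "fmt": "79.11M", "longFmt": "79,115,000"}},
]

wide = pd.json_normalize(records)

# Split 'grossProfit.fmt' into ('grossProfit', 'fmt') to get two-level column labels.
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split(".", 1)) for name in wide.columns]
)

gross = wide["grossProfit"]               # DataFrame holding every sub-column of grossProfit
gross_fmt = wide[("grossProfit", "fmt")]  # Series holding a single sub-column
print(wide)
```

Printed this way, the base names appear as a header row above their sub-columns, which is close to the layout the question describes.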