Reading unstructured dictionaries in pandas dataframe
I am trying to create a pandas DataFrame from a collection of dictionaries that I read from a JSON file. The dictionaries look like this:
d1 = {"DisplayName": "Test_drive", "permissions": {"read": True, "read_acp": True, "write": True, "write_acp": True}}
d2= {"DisplayName": "Log delivery","URI": "http://test_drive.com/Logs", "permissions": {"read": False, "read_acp": True, "write": True, "write_acp": False}}
I want to put these into a pandas DataFrame. When I read one of them into a DataFrame like this:
df = pd.DataFrame(d1) **or** df = pd.DataFrame.from_dict(d1)
it produces this:
           DisplayName  permissions
read       Test_drive   True
read_acp   Test_drive   True
write      Test_drive   True
write_acp  Test_drive   True
Or, when I read it like this:
df1 = pd.DataFrame(d1).transpose()
it produces this:
             read        read_acp    write       write_acp
DisplayName  Test_drive  Test_drive  Test_drive  Test_drive
permissions  True        True        True        True
What I want is to read these dictionaries and combine them into a single DataFrame like this:
DisplayName   read   read_acp  write  write_acp  URI
Test_drive    True   True      True   True       NA
Log delivery  False  True      True   False      http://test_drive.com/Logs
Is there a Pythonic way to do this?
Build the DataFrame by stacking the per-dictionary frames, then use pivot to reshape it into the structure you need:
df = pd.concat([pd.DataFrame.from_dict(d1), pd.DataFrame.from_dict(d2)])  # DataFrame.append was removed in pandas 2.0
df.reset_index().pivot(index='DisplayName', columns='index', values='permissions')
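For reference, the stack-then-pivot idea can be written as a small self-contained script. This is only a sketch: it uses `pd.concat` rather than `DataFrame.append` (which was removed in pandas 2.0), and the helper name `permissions_table` is hypothetical, introduced here for illustration.

```python
import pandas as pd

d1 = {"DisplayName": "Test_drive",
      "permissions": {"read": True, "read_acp": True, "write": True, "write_acp": True}}
d2 = {"DisplayName": "Log delivery", "URI": "http://test_drive.com/Logs",
      "permissions": {"read": False, "read_acp": True, "write": True, "write_acp": False}}


def permissions_table(dicts):
    """Hypothetical helper: one row per DisplayName, one column per permission."""
    # Each dict becomes a frame indexed by the permission names;
    # scalar fields such as DisplayName are repeated on every row.
    frames = [pd.DataFrame.from_dict(d) for d in dicts]
    # Stack the frames and move the permission names into an 'index' column.
    long_df = pd.concat(frames).reset_index()
    # Spread the permission names out into columns.
    return long_df.pivot(index="DisplayName", columns="index", values="permissions")


wide = permissions_table([d1, d2])
print(wide)
```

The result has one row per `DisplayName` and the four permission names as columns; the `URI` field is not carried over by this pivot and has to be attached separately.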
To include the URI as well (note that it is repeated once per permission column):
>>> df.reset_index().pivot(index='DisplayName', columns='index', values=['permissions', 'URI'])
              permissions                        URI
index         read   read_acp  write  write_acp  read                        read_acp                    write                       write_acp
DisplayName
Log delivery  False  True      True   False      http://test_drive.com/Logs  http://test_drive.com/Logs  http://test_drive.com/Logs  http://test_drive.com/Logs
Test_drive    True   True      True   True       NaN                         NaN                         NaN                         NaN
import pandas as pd
# Input Data
d1 = {"DisplayName": "Test_drive", "permissions": {"read": True, "read_acp": True, "write": True, "write_acp": True}}
d2= {"DisplayName": "Log delivery","URI": "http://test_drive.com/Logs", "permissions": {"read": False, "read_acp": True, "write": True, "write_acp": False}}
# Convert to DataFrame
dicts = [d1, d2]
df_rows = [pd.DataFrame(d) for d in dicts]
df = pd.concat(df_rows, axis=0).reset_index(drop=False)
# Reshape As Desired
tp1 = df.pivot(index='DisplayName', columns='index', values='permissions')
answer = tp1.merge(df[['DisplayName', 'URI']].drop_duplicates(),
                   how='left',
                   left_index=True,
                   right_on='DisplayName').set_index('DisplayName')
Output:
>>> answer
              read   read_acp  write  write_acp  URI
DisplayName
Log delivery  False  True      True   False      http://test_drive.com/Logs
Test_drive    True   True      True   True       NaN
Thanks to Vishnudev and Max Power for the help. I think the following answer gives me exactly the DataFrame I was trying to get.
d1 = {"DisplayName": "Test_drive", "permissions": {"read": True, "read_acp": True, "write": True, "write_acp": True}}
d2= {"DisplayName": "Log delivery","URI": "http://test_drive.com/Logs", "permissions": {"read": False, "read_acp": True, "write": True, "write_acp": False}}
df = pd.concat([pd.Series(d1), pd.Series(d2)], axis=1).transpose()
df = pd.concat([df.drop(['permissions'], axis=1), df['permissions'].apply(pd.Series)], axis=1)
   DisplayName   URI                         read   read_acp  write  write_acp
0  Test_drive    NaN                         True   True      True   True
1  Log delivery  http://test_drive.com/Logs  False  True      True   False
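As a possible alternative that none of the answers above uses, `pd.json_normalize` can flatten the nested `permissions` dictionary in one call. The sketch below assumes the dictionaries are collected in a list; stripping the `permissions.` prefix from the column names is an illustrative choice made to match the desired layout.

```python
import pandas as pd

d1 = {"DisplayName": "Test_drive",
      "permissions": {"read": True, "read_acp": True, "write": True, "write_acp": True}}
d2 = {"DisplayName": "Log delivery", "URI": "http://test_drive.com/Logs",
      "permissions": {"read": False, "read_acp": True, "write": True, "write_acp": False}}

# Flatten nested keys into dotted column names such as "permissions.read".
# Records that lack a key (d1 has no "URI") get NaN in that column.
flat = pd.json_normalize([d1, d2])

# Drop the "permissions." prefix so the columns match the desired layout.
prefix = "permissions."
flat = flat.rename(columns=lambda c: c[len(prefix):] if c.startswith(prefix) else c)

print(flat)
```

This gives one row per dictionary without any pivoting. If two nested dictionaries shared a key name, it would be safer to keep the dotted prefixes rather than strip them.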