After reviewing similar questions on SO, I have been unable to find a way to turn a DataFrame built from a nested dictionary into the layout I want.
Being new to pandas and moderately new to Python, I have spent the better part of two days trying various potential solutions (json_normalize, dictionary flattening, pd.concat, etc.) without success.
I have a method that creates a DataFrame from the results of an API call:
def make_dataframes(self):
    # removed non-related code
    self._data_frame_counts = pd.DataFrame({
        'Created': (self._data_frame_30days.count()['Created']),
        'Closed': (self._data_frame_30days.count()['Closed']),
        'Owner':
            (self._data_frame_30days['Owner'].value_counts().to_dict()),
        'Resolution':
            (self._data_frame_30days['Resolution'].value_counts().to_dict()),
        'Severity':
            (self._data_frame_30days['Severity'].value_counts().to_dict())
    })
The dictionary it passes to the constructor mixes scalar counts with nested dictionaries from pandas value_counts():
{'Created': 35,
'Closed': 6,
'Owner': {'aName': 30, 'first.last': 3, 'last.first': 2},
'Resolution': {'TruePositive': 5, 'FalsePositive': 1},
'Severity': {2: 31, 3: 4}}
After execution, the DataFrame looks like this:
Created Closed Owner Resolution Severity
aName 35 6 30.0 NaN NaN
first.last 35 6 3.0 NaN NaN
last.first 35 6 2.0 NaN NaN
TruePositive 35 6 NaN 5.0 NaN
FalsePositive 35 6 NaN 1.0 NaN
2 35 6 NaN NaN 31.0
3 35 6 NaN NaN 4.0
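To see where the mix of scalar and nested values comes from, here is a minimal sketch using a small made-up frame in place of self._data_frame_30days (the sample rows are hypothetical; only the column names come from the question). count() gives one non-null count per column, so those entries are scalars, while value_counts().to_dict() gives a nested dict per column.

```python
import pandas as pd

# Hypothetical stand-in for self._data_frame_30days; None marks open tickets.
frame = pd.DataFrame({
    'Created': ['2020-01-01', '2020-01-02', '2020-01-03'],
    'Closed': ['2020-01-04', None, None],
    'Owner': ['aName', 'aName', 'first.last'],
    'Severity': [2, 2, 3],
})

summary = {
    # count() returns the number of non-null values per column (a scalar each).
    'Created': frame.count()['Created'],
    'Closed': frame.count()['Closed'],
    # value_counts().to_dict() maps each distinct value to its frequency.
    'Owner': frame['Owner'].value_counts().to_dict(),
    'Severity': frame['Severity'].value_counts().to_dict(),
}
print(summary)
```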
I want it to look like the following, where each value is aligned with the correct row and column, and there are rows for data points that are missing from the dictionary now but could appear in future runs:
Created Closed Owner Resolution Severity
total 35 6 NaN NaN NaN
aName NaN NaN 30 NaN NaN
first.last NaN NaN 3 NaN NaN
last.first NaN NaN 2 NaN NaN
anotherName NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN 0
2 NaN NaN NaN NaN 31
3 NaN NaN NaN NaN 4
second.Name NaN NaN NaN NaN NaN
third.name NaN NaN NaN NaN NaN
TruePositive NaN NaN NaN 5 NaN
FalsePositive NaN NaN NaN 1 NaN
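One way to get rows for labels that have no data yet (anotherName, second.Name, third.name, severity 1) is DataFrame.reindex against a fixed list of expected labels. This is a minimal sketch, assuming such a master list can be maintained; the expected_labels list is hypothetical, and Created/Closed are wrapped as {'total': ...} so each lands on a single row. Note that reindex fills new rows with NaN (not 0) and drops any label that is not in the list.

```python
import pandas as pd

# Counts from the question's dictionary; Created/Closed pinned to a 'total' row.
df = pd.DataFrame({
    'Created': {'total': 35},
    'Closed': {'total': 6},
    'Owner': {'aName': 30, 'first.last': 3, 'last.first': 2},
    'Resolution': {'TruePositive': 5, 'FalsePositive': 1},
    'Severity': {2: 31, 3: 4},
})

# Hypothetical master list of every label that should always get a row.
expected_labels = [
    'total',
    'aName', 'first.last', 'last.first', 'anotherName', 'second.Name', 'third.name',
    'TruePositive', 'FalsePositive',
    1, 2, 3,
]

# reindex keeps listed rows, adds all-NaN rows for labels with no data yet,
# and drops any existing row whose label is not in the list.
df = df.reindex(expected_labels)
print(df)
```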
Assuming I have a dictionary d:
d = {
'Created': 35,
'Closed': 6,
'Owner': {'aName': 30, 'first.last': 3, 'last.first': 2},
'Resolution': {'TruePositive': 5, 'FalsePositive': 1},
'Severity': {2: 31, 3: 4}
}
I'd build a second dictionary that replaces the values of a few of those keys:
_d = {
'Created': {'total': d['Created']},
'Closed': {'total': d['Closed']},
'Severity': {k: d['Severity'].get(k, 0) for k in range(1, 4)}
}
pd.DataFrame({**d, **_d})
Created Closed Owner Resolution Severity
total 35.0 6.0 NaN NaN NaN
aName NaN NaN 30.0 NaN NaN
first.last NaN NaN 3.0 NaN NaN
last.first NaN NaN 2.0 NaN NaN
TruePositive NaN NaN NaN 5.0 NaN
FalsePositive NaN NaN NaN 1.0 NaN
1 NaN NaN NaN NaN 0.0
2 NaN NaN NaN NaN 31.0
3 NaN NaN NaN NaN 4.0
This is my way of updating a few of your keys; printing _d shows what I did:
print(_d)
{'Created': {'total': 35}, 'Closed': {'total': 6}, 'Severity': {1: 0, 2: 31, 3: 4}}
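The {**d, **_d} merge relies on standard dict unpacking: keys found in both dictionaries take the value from _d (the later one), keys only in d are kept unchanged, and the key order follows d. A small sketch with the question's data (on Python 3.9+, d | _d is equivalent):

```python
d = {
    'Created': 35,
    'Closed': 6,
    'Owner': {'aName': 30, 'first.last': 3, 'last.first': 2},
    'Resolution': {'TruePositive': 5, 'FalsePositive': 1},
    'Severity': {2: 31, 3: 4},
}
_d = {
    'Created': {'total': d['Created']},
    'Closed': {'total': d['Closed']},
    # Zero-fill severity levels 1-3 that are missing from the counts.
    'Severity': {k: d['Severity'].get(k, 0) for k in range(1, 4)},
}

# Later unpacking wins on duplicate keys; Owner and Resolution pass through.
merged = {**d, **_d}
print(merged)
```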
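By default, the pandas.DataFrame constructor can take a dictionary and use its keys as column names. What it does with each value depends on the type of that value. If the value is a dictionary, its keys become index labels, and the index of the whole frame is the union of those keys across all columns. If the value is a scalar, it is broadcast to every row, which is why you get 35 for all rows in the 'Created' column.
The scalar case is what motivated my answer. I changed the scalar value 35 to a dictionary that specifies the index label explicitly: {'total': 35}.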
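A minimal sketch contrasting the two cases on a trimmed copy of the question's Owner counts: the bare scalar is repeated on every row, while the dict-wrapped value occupies only its own 'total' row and the other rows get NaN.

```python
import pandas as pd

owner = {'aName': 30, 'first.last': 3}

# Scalar value: broadcast to every row of the index built from the dict keys.
broadcast = pd.DataFrame({'Created': 35, 'Owner': owner})

# Dict-wrapped value: pinned to the 'total' label; other rows become NaN.
pinned = pd.DataFrame({'Created': {'total': 35}, 'Owner': owner})

print(broadcast)
print(pinned)
```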
I'd recommend changing the original method to something like this:
def make_dataframes(self):
    # removed non-related code
    frame = self._data_frame_30days
    severity_counts = frame['Severity'].value_counts().to_dict()
    self._data_frame_counts = pd.DataFrame({
        # Wrapping the scalar counts in a dict pins them to a 'total' row
        # instead of broadcasting them to every row.
        'Created': {'total': frame.count()['Created']},
        'Closed': {'total': frame.count()['Closed']},
        'Owner': frame['Owner'].value_counts().to_dict(),
        'Resolution': frame['Resolution'].value_counts().to_dict(),
        # Severity levels 1-3 always appear (0 if absent), plus any others seen.
        'Severity': {k: severity_counts.get(k, 0)
                     for k in sorted({*range(1, 4), *severity_counts})}
    })
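To try the method outside the class, here is a sketch of the same logic as a standalone function. The name make_counts and the sample rows are hypothetical; the function takes the 30-day frame as an argument instead of reading self._data_frame_30days, and it assumes severity levels 1 to 3 as above.

```python
import pandas as pd


def make_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical standalone version of make_dataframes()."""
    totals = df.count()  # non-null count per column
    severity_counts = df['Severity'].value_counts().to_dict()
    return pd.DataFrame({
        'Created': {'total': totals['Created']},
        'Closed': {'total': totals['Closed']},
        'Owner': df['Owner'].value_counts().to_dict(),
        'Resolution': df['Resolution'].value_counts().to_dict(),
        # Assumed severity levels 1-3, zero-filled when absent.
        'Severity': {k: severity_counts.get(k, 0)
                     for k in sorted({*range(1, 4), *severity_counts})},
    })


# Made-up sample: None marks tickets that are not yet closed or resolved.
sample = pd.DataFrame({
    'Created': ['2020-01-01', '2020-01-02', '2020-01-03'],
    'Closed': ['2020-01-05', None, None],
    'Owner': ['aName', 'aName', 'first.last'],
    'Resolution': ['TruePositive', None, None],
    'Severity': [2, 2, 3],
})

counts = make_counts(sample)
print(counts)
```

Because every column is now a dict, no value is broadcast, and severity 1 shows up as 0 even though no ticket in the sample has it.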