I'm trying to load a JSON file from nist.gov into a pandas DataFrame so that I end up with flattened records, without nested dicts. I can live with nested lists, as I will stack and merge later. The intent is to end up with a flat file of vulnerabilities by affected product.
import pandas as pd
pd.set_option('display.max_colwidth', 80) # set pandas column width to facilitate viewing
df = pd.read_json('https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.zip', compression='zip') # load json file from nist
The values in df include a nested dict.
df.head(2)
   CVE_data_type  CVE_data_format  CVE_data_version  CVE_data_numberOfCVEs  CVE_data_timestamp  CVE_Items
0  CVE            MITRE            4                 640                    2018-06-05T18:00Z   {'cve': {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', ...
1  CVE            MITRE            4                 640                    2018-06-05T18:00Z   {'cve': {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', ...
When I expand df.CVE_Items into a CVE_Items DataFrame, I get more nested dicts.
CVE_items = df.CVE_Items.apply(pd.Series)
CVE_items.head(2)
cve configurations impact publishedDate lastModifiedDate
0 {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', 'CVE_dat... {'CVE_data_version': '4.0', 'nodes': [{'operator': 'OR', 'cpe': [{'vulnerabl... {'baseMetricV2': {'cvssV2': {'version': '2.0', 'vectorString': '(AV:N/AC:M/A... 2011-12-27T11:55Z 2018-06-04T13:46Z
1 {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', 'CVE_dat... {'CVE_data_version': '4.0', 'nodes': [{'operator': 'OR', 'cpe': [{'vulnerabl... {'baseMetricV3': {'cvssV3': {'version': '3.0', 'vectorString': 'CVSS:3.0/AV:... 2018-04-24T20:29Z 2018-06-04T16:11Z
If I continue to expand the newly formed DataFrames, the plot thickens as I get more nested dicts and/or lists with nested dicts.
cve = CVE_items.cve.apply(pd.Series)
configurations = CVE_items.configurations.apply(pd.Series)
impact = CVE_items.impact.apply(pd.Series)
cve.head(2)
data_type data_format data_version CVE_data_meta affects problemtype references description
0 CVE MITRE 4.0 {'ID': 'CVE-2011-3841', 'ASSIGNER': 'cve@mitre.org'} {'vendor': {'vendor_data': [{'vendor_name': 'wpsymposiumpro', 'product': {'p... {'problemtype_data': [{'description': [{'lang': 'en', 'value': 'CWE-79'}]}]} {'reference_data': [{'url': 'http://secunia.com/advisories/47243', 'name': '... {'description_data': [{'lang': 'en', 'value': 'Cross-site scripting (XSS) vu...
1 CVE MITRE 4.0 {'ID': 'CVE-2013-3947', 'ASSIGNER': 'cve@mitre.org'} {'vendor': {'vendor_data': [{'vendor_name': 'ahnlab', 'product': {'product_d... {'problemtype_data': [{'description': [{'lang': 'en', 'value': 'CWE-119'}, {... {'reference_data': [{'url': 'http://secunia.com/advisories/54465', 'name': '... {'description_data': [{'lang': 'en', 'value': 'Buffer overflow in MedCoreD.s...
Any ideas on how I can flatten this file?
It turns out that pandas provides the functionality I need to expand embedded JSON objects.
import pandas as pd
from pandas import json_normalize # pandas >= 1.0; older releases used: from pandas.io.json import json_normalize
df = pd.read_json('https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-2019.json.zip', compression='zip')
The df dataframe includes an embedded json object in df['CVE_Items'].
df.head()
   CVE_data_type  CVE_data_format  CVE_data_version  CVE_data_numberOfCVEs  CVE_data_timestamp  CVE_Items
0  CVE            MITRE            4                 2510                   2019-04-23T07:00Z   {'cve': {'data_type': 'CVE', 'data_format': 'M...
1  CVE            MITRE            4                 2510                   2019-04-23T07:00Z   {'cve': {'data_type': 'CVE', 'data_format': 'M...
2  CVE            MITRE            4                 2510                   2019-04-23T07:00Z   {'cve': {'data_type': 'CVE', 'data_format': 'M...
3  CVE            MITRE            4                 2510                   2019-04-23T07:00Z   {'cve': {'data_type': 'CVE', 'data_format': 'M...
4  CVE            MITRE            4                 2510                   2019-04-23T07:00Z   {'cve': {'data_type': 'CVE', 'data_format': 'M...
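As a small illustration of what flattening means here: json_normalize turns nested dicts into dot-separated column names, but leaves lists in place as single cells. The sketch below uses a made-up record (its keys are hypothetical, not taken from the NVD feed) and calls the function as pd.json_normalize, which is how pandas 1.0 and later expose it at top level.

```python
import pandas as pd

# Made-up record for illustration only; none of these keys come from the feed.
record = {
    "id": "X-1",
    "meta": {"source": "demo", "score": {"base": 7.1}},  # nested dicts
    "tags": [{"name": "a"}, {"name": "b"}],              # list of dicts
}

flat = pd.json_normalize([record])

# Nested dicts become dotted columns such as 'meta.score.base';
# the 'tags' list is kept as one cell holding the original list.
print(sorted(flat.columns))
print(type(flat.loc[0, "tags"]))
```

This matches what I was after: no nested dicts, while nested lists remain to be stacked or merged separately.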
I used json_normalize to create a new dataframe from the expanded json object.
df_CVE_Items = json_normalize(df['CVE_Items'])
df_CVE_Items.head()
configurations.CVE_data_version configurations.nodes cve.CVE_data_meta.ASSIGNER cve.CVE_data_meta.ID cve.affects.vendor.vendor_data cve.data_format cve.data_type cve.data_version cve.description.description_data cve.problemtype.problemtype_data cve.references.reference_data impact.baseMetricV2.acInsufInfo impact.baseMetricV2.cvssV2.accessComplexity impact.baseMetricV2.cvssV2.accessVector impact.baseMetricV2.cvssV2.authentication impact.baseMetricV2.cvssV2.availabilityImpact impact.baseMetricV2.cvssV2.baseScore impact.baseMetricV2.cvssV2.confidentialityImpact impact.baseMetricV2.cvssV2.integrityImpact impact.baseMetricV2.cvssV2.vectorString impact.baseMetricV2.cvssV2.version impact.baseMetricV2.exploitabilityScore impact.baseMetricV2.impactScore impact.baseMetricV2.obtainAllPrivilege impact.baseMetricV2.obtainOtherPrivilege impact.baseMetricV2.obtainUserPrivilege impact.baseMetricV2.severity impact.baseMetricV2.userInteractionRequired impact.baseMetricV3.cvssV3.attackComplexity impact.baseMetricV3.cvssV3.attackVector impact.baseMetricV3.cvssV3.availabilityImpact impact.baseMetricV3.cvssV3.baseScore impact.baseMetricV3.cvssV3.baseSeverity impact.baseMetricV3.cvssV3.confidentialityImpact impact.baseMetricV3.cvssV3.integrityImpact impact.baseMetricV3.cvssV3.privilegesRequired impact.baseMetricV3.cvssV3.scope impact.baseMetricV3.cvssV3.userInteraction impact.baseMetricV3.cvssV3.vectorString impact.baseMetricV3.cvssV3.version impact.baseMetricV3.exploitabilityScore impact.baseMetricV3.impactScore lastModifiedDate publishedDate
0 4.0 [{'operator': 'OR', 'cpe_match': [{'vulnerable... cve@mitre.org CVE-2019-0001 [{'vendor_name': 'juniper', 'product': {'produ... MITRE CVE 4.0 [{'lang': 'en', 'value': 'Receipt of a malform... [{'description': [{'lang': 'en', 'value': 'CWE... [{'url': 'http://www.securityfocus.com/bid/106... False MEDIUM NETWORK NONE COMPLETE 7.1 NONE NONE AV:N/AC:M/Au:N/C:N/I:N/A:C 2.0 8.6 6.9 False False False HIGH False HIGH NETWORK HIGH 5.9 MEDIUM NONE NONE NONE UNCHANGED NONE CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H 3.0 2.2 3.6 2019-02-14T18:35Z 2019-01-15T21:29Z
1 4.0 [{'operator': 'OR', 'cpe_match': [{'vulnerable... cve@mitre.org CVE-2019-0002 [{'vendor_name': 'juniper', 'product': {'produ... MITRE CVE 4.0 [{'lang': 'en', 'value': 'On EX2300 and EX3400... [{'description': [{'lang': 'en', 'value': 'CWE... [{'url': 'http://www.securityfocus.com/bid/106... False LOW NETWORK NONE PARTIAL 7.5 PARTIAL PARTIAL AV:N/AC:L/Au:N/C:P/I:P/A:P 2.0 10.0 6.4 False False False HIGH False LOW NETWORK HIGH 9.8 CRITICAL HIGH HIGH NONE UNCHANGED NONE CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H 3.0 3.9 5.9 2019-02-08T18:50Z 2019-01-15T21:29Z
2 4.0 [{'operator': 'AND', 'children': [{'operator':... cve@mitre.org CVE-2019-0003 [{'vendor_name': 'juniper', 'product': {'produ... MITRE CVE 4.0 [{'lang': 'en', 'value': 'When a specific BGP ... [{'description': [{'lang': 'en', 'value': 'CWE... [{'url': 'http://www.securityfocus.com/bid/106... False MEDIUM NETWORK NONE PARTIAL 4.3 NONE NONE AV:N/AC:M/Au:N/C:N/I:N/A:P 2.0 8.6 2.9 False False False MEDIUM False HIGH NETWORK HIGH 5.9 MEDIUM NONE NONE NONE UNCHANGED NONE CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H 3.0 2.2 3.6 2019-02-07T15:52Z 2019-01-15T21:29Z
3 4.0 [{'operator': 'AND', 'children': [{'operator':... cve@mitre.org CVE-2019-0004 [] MITRE CVE 4.0 [{'lang': 'en', 'value': 'On Juniper ATP, the ... [{'description': [{'lang': 'en', 'value': 'CWE... [{'url': 'https://kb.juniper.net/JSA10918', 'n... False LOW LOCAL NONE NONE 2.1 PARTIAL NONE AV:L/AC:L/Au:N/C:P/I:N/A:N 2.0 3.9 2.9 False False False LOW False LOW LOCAL NONE 5.5 MEDIUM HIGH NONE LOW UNCHANGED NONE CVSS:3.0/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N 3.0 1.8 3.6 2019-01-29T16:40Z 2019-01-15T21:29Z
4 4.0 [{'operator': 'AND', 'children': [{'operator':... cve@mitre.org CVE-2019-0005 [{'vendor_name': 'juniper', 'product': {'produ... MITRE CVE 4.0 [{'lang': 'en', 'value': 'On EX2300, EX3400, E... [{'description': [{'lang': 'en', 'value': 'CWE... [{'url': 'http://www.securityfocus.com/bid/106... False LOW NETWORK NONE NONE 5.0 NONE PARTIAL AV:N/AC:L/Au:N/C:N/I:P/A:N 2.0 10.0 2.9 False False False MEDIUM False LOW NETWORK NONE 5.3 MEDIUM NONE LOW NONE UNCHANGED NONE CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:N 3.0 3.9 1.4 2019-02-14T18:40Z 2019-01-15T21:29Z
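The normalized frame still holds lists of dicts in columns such as cve.affects.vendor.vendor_data. One possible way to reach one row per CVE and affected product is to walk those lists explicitly. This is only a sketch on made-up records: the keys product_data and product_name are assumptions (the printed output above truncates them), and the IDs, vendor names and product names are invented.

```python
import pandas as pd

# Toy records shaped like the CVE_Items entries shown above.
# ASSUMPTION: 'product_data' and 'product_name' are guessed key names;
# the IDs, vendors and products are invented for illustration.
items = [
    {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0001"},
             "affects": {"vendor": {"vendor_data": [
                 {"vendor_name": "vendor_a",
                  "product": {"product_data": [
                      {"product_name": "widget"},
                      {"product_name": "gadget"}]}}]}}}},
    # A record with an empty vendor list, like CVE-2019-0004 above.
    {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0002"},
             "affects": {"vendor": {"vendor_data": []}}}},
]

# One output row per (CVE, vendor, product) combination.
rows = [
    {"cve_id": item["cve"]["CVE_data_meta"]["ID"],
     "vendor": vendor["vendor_name"],
     "product": product["product_name"]}
    for item in items
    for vendor in item["cve"]["affects"]["vendor"]["vendor_data"]
    for product in vendor["product"]["product_data"]
]

flat = pd.DataFrame(rows, columns=["cve_id", "vendor", "product"])
print(flat)
```

With the real feed, items would presumably be df['CVE_Items'].tolist(). CVEs with an empty vendor list produce no rows in this sketch, so a left merge back onto df_CVE_Items by CVE ID would be needed to keep them.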