Trying to load a json file into a flattened pandas DataFrame

I'm trying to load a JSON file from nist.gov into a pandas DataFrame with no nested dicts, so that each record is flat. I can live with nested lists, as I will stack and merge them later. The goal is a flat file of vulnerabilities by affected product.

import pandas as pd

pd.set_option('display.max_colwidth', 80)  # set pandas column width to facilitate viewing
df = pd.read_json('https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.zip', compression='zip')  # load json file from nist

The values in df include a nested dict.

df.head(2)

  CVE_data_type CVE_data_format  CVE_data_version  CVE_data_numberOfCVEs CVE_data_timestamp                                                                        CVE_Items
0           CVE           MITRE                 4                    640  2018-06-05T18:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', ...
1           CVE           MITRE                 4                    640  2018-06-05T18:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', ...

When I expand df.CVE_Items into a CVE_Items DataFrame, I get more nested dicts.

CVE_items = df.CVE_Items.apply(pd.Series)
CVE_items.head(2)
                                                                               cve                                                                   configurations                                                                           impact      publishedDate   lastModifiedDate
0  {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', 'CVE_dat...  {'CVE_data_version': '4.0', 'nodes': [{'operator': 'OR', 'cpe': [{'vulnerabl...  {'baseMetricV2': {'cvssV2': {'version': '2.0', 'vectorString': '(AV:N/AC:M/A...  2011-12-27T11:55Z  2018-06-04T13:46Z
1  {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', 'CVE_dat...  {'CVE_data_version': '4.0', 'nodes': [{'operator': 'OR', 'cpe': [{'vulnerabl...  {'baseMetricV3': {'cvssV3': {'version': '3.0', 'vectorString': 'CVSS:3.0/AV:...  2018-04-24T20:29Z  2018-06-04T16:11Z

If I continue to expand the newly formed DataFrames, the plot thickens as I get more nested dicts and/or lists with nested dicts.

cve = CVE_items.cve.apply(pd.Series)
configurations = CVE_items.configurations.apply(pd.Series)
impact = CVE_items.impact.apply(pd.Series)

cve.head(2)
  data_type data_format data_version                                         CVE_data_meta                                                                          affects                                                                      problemtype                                                                       references                                                                      description
0       CVE       MITRE          4.0  {'ID': 'CVE-2011-3841', 'ASSIGNER': 'cve@mitre.org'}  {'vendor': {'vendor_data': [{'vendor_name': 'wpsymposiumpro', 'product': {'p...     {'problemtype_data': [{'description': [{'lang': 'en', 'value': 'CWE-79'}]}]}  {'reference_data': [{'url': 'http://secunia.com/advisories/47243', 'name': '...  {'description_data': [{'lang': 'en', 'value': 'Cross-site scripting (XSS) vu...
1       CVE       MITRE          4.0  {'ID': 'CVE-2013-3947', 'ASSIGNER': 'cve@mitre.org'}  {'vendor': {'vendor_data': [{'vendor_name': 'ahnlab', 'product': {'product_d...  {'problemtype_data': [{'description': [{'lang': 'en', 'value': 'CWE-119'}, {...  {'reference_data': [{'url': 'http://secunia.com/advisories/54465', 'name': '...  {'description_data': [{'lang': 'en', 'value': 'Buffer overflow in MedCoreD.s...

Any ideas on how I can flatten this file?

It turns out that pandas already provides the functionality I need to expand embedded JSON objects: json_normalize.

import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize

df = pd.read_json('https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-2019.json.zip', compression='zip')

The df dataframe includes an embedded json object in df['CVE_Items'].

df.head()
  CVE_data_type CVE_data_format  CVE_data_version  CVE_data_numberOfCVEs CVE_data_timestamp                                          CVE_Items
0           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...
1           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...
2           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...
3           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...
4           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...

I used json_normalize to create a new dataframe from the expanded json object.

df_CVE_Items = json_normalize(df['CVE_Items'])

df_CVE_Items.head()
  configurations.CVE_data_version                               configurations.nodes cve.CVE_data_meta.ASSIGNER cve.CVE_data_meta.ID                     cve.affects.vendor.vendor_data cve.data_format cve.data_type cve.data_version                   cve.description.description_data                   cve.problemtype.problemtype_data                      cve.references.reference_data impact.baseMetricV2.acInsufInfo impact.baseMetricV2.cvssV2.accessComplexity impact.baseMetricV2.cvssV2.accessVector impact.baseMetricV2.cvssV2.authentication impact.baseMetricV2.cvssV2.availabilityImpact  impact.baseMetricV2.cvssV2.baseScore impact.baseMetricV2.cvssV2.confidentialityImpact impact.baseMetricV2.cvssV2.integrityImpact impact.baseMetricV2.cvssV2.vectorString impact.baseMetricV2.cvssV2.version  impact.baseMetricV2.exploitabilityScore  impact.baseMetricV2.impactScore impact.baseMetricV2.obtainAllPrivilege impact.baseMetricV2.obtainOtherPrivilege impact.baseMetricV2.obtainUserPrivilege impact.baseMetricV2.severity impact.baseMetricV2.userInteractionRequired impact.baseMetricV3.cvssV3.attackComplexity impact.baseMetricV3.cvssV3.attackVector impact.baseMetricV3.cvssV3.availabilityImpact  impact.baseMetricV3.cvssV3.baseScore impact.baseMetricV3.cvssV3.baseSeverity impact.baseMetricV3.cvssV3.confidentialityImpact impact.baseMetricV3.cvssV3.integrityImpact impact.baseMetricV3.cvssV3.privilegesRequired impact.baseMetricV3.cvssV3.scope impact.baseMetricV3.cvssV3.userInteraction       impact.baseMetricV3.cvssV3.vectorString impact.baseMetricV3.cvssV3.version  impact.baseMetricV3.exploitabilityScore  impact.baseMetricV3.impactScore   lastModifiedDate      publishedDate
0                             4.0  [{'operator': 'OR', 'cpe_match': [{'vulnerable...              cve@mitre.org        CVE-2019-0001  [{'vendor_name': 'juniper', 'product': {'produ...           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'Receipt of a malform...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'http://www.securityfocus.com/bid/106...                           False                                      MEDIUM                                 NETWORK                                      NONE                                      COMPLETE                                   7.1                                             NONE                                       NONE              AV:N/AC:M/Au:N/C:N/I:N/A:C                                2.0                                      8.6                              6.9                                  False                                    False                                   False                         HIGH                                       False                                        HIGH                                 NETWORK                                          HIGH                                   5.9                                  MEDIUM                                             NONE                                       NONE                                          NONE                        UNCHANGED                                       NONE  CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H                                3.0                                      2.2                              3.6  2019-02-14T18:35Z  2019-01-15T21:29Z
1                             4.0  [{'operator': 'OR', 'cpe_match': [{'vulnerable...              cve@mitre.org        CVE-2019-0002  [{'vendor_name': 'juniper', 'product': {'produ...           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'On EX2300 and EX3400...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'http://www.securityfocus.com/bid/106...                           False                                         LOW                                 NETWORK                                      NONE                                       PARTIAL                                   7.5                                          PARTIAL                                    PARTIAL              AV:N/AC:L/Au:N/C:P/I:P/A:P                                2.0                                     10.0                              6.4                                  False                                    False                                   False                         HIGH                                       False                                         LOW                                 NETWORK                                          HIGH                                   9.8                                CRITICAL                                             HIGH                                       HIGH                                          NONE                        UNCHANGED                                       NONE  CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H                                3.0                                      3.9                              5.9  2019-02-08T18:50Z  2019-01-15T21:29Z
2                             4.0  [{'operator': 'AND', 'children': [{'operator':...              cve@mitre.org        CVE-2019-0003  [{'vendor_name': 'juniper', 'product': {'produ...           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'When a specific BGP ...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'http://www.securityfocus.com/bid/106...                           False                                      MEDIUM                                 NETWORK                                      NONE                                       PARTIAL                                   4.3                                             NONE                                       NONE              AV:N/AC:M/Au:N/C:N/I:N/A:P                                2.0                                      8.6                              2.9                                  False                                    False                                   False                       MEDIUM                                       False                                        HIGH                                 NETWORK                                          HIGH                                   5.9                                  MEDIUM                                             NONE                                       NONE                                          NONE                        UNCHANGED                                       NONE  CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H                                3.0                                      2.2                              3.6  2019-02-07T15:52Z  2019-01-15T21:29Z
3                             4.0  [{'operator': 'AND', 'children': [{'operator':...              cve@mitre.org        CVE-2019-0004                                                 []           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'On Juniper ATP, the ...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'https://kb.juniper.net/JSA10918', 'n...                           False                                         LOW                                   LOCAL                                      NONE                                          NONE                                   2.1                                          PARTIAL                                       NONE              AV:L/AC:L/Au:N/C:P/I:N/A:N                                2.0                                      3.9                              2.9                                  False                                    False                                   False                          LOW                                       False                                         LOW                                   LOCAL                                          NONE                                   5.5                                  MEDIUM                                             HIGH                                       NONE                                           LOW                        UNCHANGED                                       NONE  CVSS:3.0/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N                                3.0                                      1.8                              3.6  2019-01-29T16:40Z  2019-01-15T21:29Z
4                             4.0  [{'operator': 'AND', 'children': [{'operator':...              cve@mitre.org        CVE-2019-0005  [{'vendor_name': 'juniper', 'product': {'produ...           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'On EX2300, EX3400, E...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'http://www.securityfocus.com/bid/106...                           False                                         LOW                                 NETWORK                                      NONE                                          NONE                                   5.0                                             NONE                                    PARTIAL              AV:N/AC:L/Au:N/C:N/I:P/A:N                                2.0                                     10.0                              2.9                                  False                                    False                                   False                       MEDIUM                                       False                                         LOW                                 NETWORK                                          NONE                                   5.3                                  MEDIUM                                             NONE                                        LOW                                          NONE                        UNCHANGED                                       NONE  CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:N                                3.0                                      3.9                              1.4  2019-02-14T18:40Z  2019-01-15T21:29Z
