

How to parse and get specific data from a huge JSON file to implement search in Python

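I have a JSON file with a lot of information, and I'm trying to extract specific data: wherever there is a `position`, I need to get the `name` that comes directly after it. I'm also trying to implement search in Python. Here is part of the sample data from the JSON file, ex.json: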


{
  "storables": [
    {
      "columns": [
        {
          "position": 0,
          "header": {
            "id": "",
            "indexVersion": 35643,
            "generationNum": 35643,
            "name": "CAT",
            "author": "",
            "created": 1620247188226,
            "modified": 1668544812673,
            "modifiedBy": "",
            "owner": "",
            "isDeleted": false,
            "isHidden": false,
            "tags": [],
            "isExternal": false,
            "isDeprecated": false
          },
          "complete": true,
          "incompleteDetail": [],
          "isDerived": true,
          "dataType": "VARCHAR",
          "type": "ATTRIBUTE",
          "sageOutputColumnId": "",
          "defaultAggrType": "NONE",
          "ownerName": "",
          "ownerType": "WORKSHEET",
          "entityCategory": "DEFAULT",
          "spotiqPreference": "DEFAULT",
          "isAdditive": false,
          "indexType": "DEFAULT",
          "indexPriority": 1,
          "sources": [
            {
              "tableId": "",
              "tableName": "",
              "columnId": "",
              "columnName": "CATASTROPHE"
            }
          ],
          "synonyms": [],
          "injectedInlineValues": [],
          "precision": -1,
          "scale": 0,
          "isPrimaryKey": false,
          "isAttributionDimension": true,
          "derivationExpr": {
            "exprType": "LOGICAL_COLUMN_REFERENCE",
            "logicalColumn": {
              "header": {
                "id": "",
                "indexVersion": 35499,
                "generationNum": 35499,
                "name": "CATASTROPHE",
                "author": "",
                "created": 1630716505804,
                "modified": 1668211006637,
                "modifiedBy": "",
                "owner": "",
                "isDeleted": false,
                "isHidden": false,
                "schemaStripe": "",
                "databaseStripe": "",
                "tags": [],
                "isExternal": false,
                "isDeprecated": false
              }
            },
            "joinPaths": [
              {
                "joins": [
                  {
                    "sourceTable": "",
                    "destinationTable": "",
                    "content": {
                      "relationships": [
                        {
                          "sourceColumn": "",
                          "destinationColumn": ""
                        }
                      ],
                      "weight": 1
                    },
                    "joinType": "INNER",
                    "type": "USER_DEFINED",
                    "isOneToOneJoin": false,
                    "header": {
                      "id": "",
                      "indexVersion": 35499,
                      "generationNum": 35499,
                      "name": "",
                      "description": "",
                      "author": "",
                      "created": 1650658367043,
                      "modified": 1668211006686,
                      "modifiedBy": "",
                      "owner": "",
                      "isDeleted": false,
                      "isHidden": false,
                      "tags": [],
                      "type": "USER_DEFINED",
                      "isExternal": false,
                      "isDeprecated": false
                    },
                    "complete": true,
                    "incompleteDetail": [],
                    "sourceColumns": [
                      ""
                    ],
                    "targetColumns": [
                      ""
                    ]
                  }
                ]
              }
            ]
          }
        },
        {
          "position": 1,
          "header": {
            "id": "",
            "indexVersion": 35643,
            "generationNum": 35643,
            "name": "Peril",
            "author": "",
            "created": 1620247188226,
            "modified": 1668544812673,
            "modifiedBy": "",
            "owner": "",
            "isDeleted": false,
            "isHidden": false,
            "tags": [],
            "isExternal": false,
            "isDeprecated": false
          },
          "complete": true,
          "incompleteDetail": [],
          "isDerived": true,
          "dataType": "VARCHAR",
          "type": "ATTRIBUTE",
          "sageOutputColumnId": "",
          "defaultAggrType": "NONE",
          "ownerName": "",
          "ownerType": "WORKSHEET",
          "entityCategory": "DEFAULT",
          "spotiqPreference": "DEFAULT",
          "isAdditive": false,
          "indexType": "DEFAULT",
          "indexPriority": 1,
          "sources": [
            {
              "tableId": "",
              "tableName": "",
              "columnId": "",
              "columnName": "TYPE_OF"
            }
          ],
          "synonyms": [],
          "injectedInlineValues": [],
          "precision": -1,
          "scale": 0,
          "isPrimaryKey": false,
          "isAttributionDimension": true,
          "derivationExpr": {
            "exprType": "LOGICAL_COLUMN_REFERENCE",
            "logicalColumn": {
              "header": {
                "id": "",
                "indexVersion": 35499,
                "generationNum": 35499,
                "name": "TYPE_OF",
                "author": "",
                "created": 1630716505804,
                "modified": 1668211006637,
                "modifiedBy": "",
                "owner": "",
                "isDeleted": false,
                "isHidden": false,
                "schemaStripe": "",
                "databaseStripe": "",
                "tags": [],
                "isExternal": false,
                "isDeprecated": false
              }
            },
            "joinPaths": [
              {
                "joins": [
                  {
                    "sourceTable": "",
                    "destinationTable": "",
                    "content": {
                      "relationships": [
                        {
                          "sourceColumn": "",
                          "destinationColumn": ""
                        }
                      ],
                      "weight": 1
                    },
                    "joinType": "INNER",
                    "type": "USER_DEFINED",
                    "isOneToOneJoin": false,
                    "header": {
                      "id": "",
                      "indexVersion": 35499,
                      "generationNum": 35499,
                      "name": "",
                      "description": "Copy of user table relationship",
                      "author": "",
                      "created": 1650658367043,
                      "modified": 1668211006686,
                      "modifiedBy": "",
                      "owner": "",
                      "isDeleted": false,
                      "isHidden": false,
                      "tags": [],
                      "type": "USER_DEFINED",
                      "isExternal": false,
                      "isDeprecated": false
                    },
                    "complete": true,
                    "incompleteDetail": [],
                    "sourceColumns": [
                      ""
                    ],
                    "targetColumns": [
                      ""
                    ]
                  }
                ]
              }
            ]
          }
        },
        {
          "position": 2,
          "header": {
            "id": "",
            "indexVersion": 35643,
            "generationNum": 35643,
            "name": "Job",
            "author": "",
            "created": 1620247188226,
            "modified": 1668544812673,
            "modifiedBy": "",
            "owner": "",
            "isDeleted": false,
            "isHidden": false,
            "tags": [],
            "isExternal": false,
            "isDeprecated": false
          },
          "complete": true,
          "incompleteDetail": [],
          "isDerived": true,
          "dataType": "VARCHAR",
          "type": "ATTRIBUTE",
          "sageOutputColumnId": "",
          "defaultAggrType": "NONE",
          "ownerName": "",
          "ownerType": "WORKSHEET",
          "entityCategory": "DEFAULT",
          "spotiqPreference": "DEFAULT",
          "isAdditive": false,
          "indexType": "DEFAULT",
          "indexPriority": 1,
          "sources": [
            {
              "tableId": "",
              "tableName": "",
              "columnId": "",
              "columnName": ""
            }
          ],
          "synonyms": [],
          "injectedInlineValues": [],
          "precision": -1,
          "scale": 0,
          "isPrimaryKey": false,
          "isAttributionDimension": true,
          "derivationExpr": {
            "exprType": "LOGICAL_COLUMN_REFERENCE",
            "logicalColumn": {
              "header": {
                "id": "",
                "indexVersion": 35499,
                "generationNum": 35499,
                "name": "ROTATION_TRADE",
                "author": "",
                "created": 1630716505804,
                "modified": 1668211006637,
                "modifiedBy": "",
                "owner": "",
                "isDeleted": false,
                "isHidden": false,
                "schemaStripe": "",
                "databaseStripe": "",
                "tags": [],
                "isExternal": false,
                "isDeprecated": false
              }
            },
            "joinPaths": [
              {
                "joins": [
                  {
                    "sourceTable": "",
                    "destinationTable": "",
                    "content": {
                      "relationships": [
                        {
                          "sourceColumn": "",
                          "destinationColumn": ""
                        }
                      ],
                      "weight": 1
                    },
                    "joinType": "INNER",
                    "type": "USER_DEFINED",
                    "isOneToOneJoin": false,
                    "header": {
                      "id": "",
                      "indexVersion": 35499,
                      "generationNum": 35499,
                      "name": "",
                      "description": "Copy of user table relationship",
                      "author": "",
                      "created": 1650658367043,
                      "modified": 1668211006686,
                      "modifiedBy": "",
                      "owner": "",
                      "isDeleted": false,
                      "isHidden": false,
                      "tags": [],
                      "type": "USER_DEFINED",
                      "isExternal": false,
                      "isDeprecated": false
                    },
                    "complete": true,
                    "incompleteDetail": [],
                    "sourceColumns": [
                      ""
                    ],
                    "targetColumns": [
                      ""
                    ]
                  }
                ]
              }
            ]
          }
        },
        {
          "position": 3,
          "header": {
            "id": "",
            "indexVersion": 35643,
            "generationNum": 35643,
            "name": "Job Lenghth",
            "author": "",
            "created": 1620247188226,
            "modified": 1668544812673,
            "modifiedBy": "",
            "owner": "",
            "isDeleted": false,
            "isHidden": false,
            "tags": [],
            "isExternal": false,
            "isDeprecated": false
          },
          "complete": true,
          "incompleteDetail": [],
          "isDerived": true,
          "dataType": "VARCHAR",
          "type": "ATTRIBUTE",



with open('ex.json', 'r') as f:
    for line in f:
        if 'position' in line:
            for line in f: 
                if ' name: ' in line:
                    print(line)

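I tried this Python code, but it doesn't work. I'm not sure how to return only the name that comes directly after `position`. There are multiple instances of `name` in the file, but I only need the one right after `position`...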

import json
with open('ex.json', 'r') as f:
    data = json.load(f)

Now you can access all the JSON items through the `data` variable, just as you would access any dictionary/object in Python.
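In the sample, each column's name sits at `data["storables"][i]["columns"][j]["header"]["name"]`, next to that column's `"position"`. Below is a minimal sketch, assuming that layout holds for the whole file, that lists the names and does a simple search. `load_json`, `column_names` and `search_columns` are hypothetical helper names, and "search" is assumed to mean a case-insensitive substring match on the name.

```python
import json
from typing import Any, Dict, List, Tuple


def load_json(path: str) -> Dict[str, Any]:
    """Load the whole file into memory (fine as long as it fits in RAM)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def column_names(data: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (position, name) for each column, assuming the ex.json layout:
    data["storables"][i]["columns"][j] with "position" and header["name"]."""
    pairs = []
    for storable in data.get("storables", []):
        for column in storable.get("columns", []):
            if "position" in column:
                pairs.append((column["position"], column["header"]["name"]))
    return pairs


def search_columns(data: Dict[str, Any], term: str) -> List[Tuple[int, str]]:
    """Hypothetical search helper: case-insensitive substring match on names."""
    term = term.lower()
    return [(pos, name) for pos, name in column_names(data) if term in name.lower()]


if __name__ == "__main__":
    # A trimmed copy of the sample's structure, keeping only the keys used above.
    sample = {"storables": [{"columns": [
        {"position": 0, "header": {"name": "CAT"}},
        {"position": 1, "header": {"name": "Peril"}},
        {"position": 2, "header": {"name": "Job"}},
    ]}]}
    print(column_names(sample))           # [(0, 'CAT'), (1, 'Peril'), (2, 'Job')]
    print(search_columns(sample, "job"))  # [(2, 'Job')]
```

With the complete file this would be called as `search_columns(load_json("ex.json"), "job")`. Because the lookup goes through `column["header"]["name"]` directly, the nested `logicalColumn` headers (such as `CATASTROPHE`) are never matched.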

Your code could work, but you need to change the logic slightly. Here is a quick sketch of a solution:

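prevWasPosition = False
with open('ex.json', 'r') as f:
    for line in f:
        if '"position":' in line:
            # remember that we are inside a column that has a position
            prevWasPosition = True
            continue

        # the first "name" after "position" is the column's header name;
        # keep the flag set until we reach it, then reset it
        if prevWasPosition and '"name":' in line:
            print(line.strip())
            prevWasPosition = False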

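Note that this solution assumes the JSON file is well formatted, with each key on its own line as in the sample above. If it isn't, you may get unexpected results. A more robust solution is to read the file chunk by chunk and parse it as JSON, but that is beyond the scope of this answer.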
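If the whole file fits in memory, one layout-independent alternative is a recursive walk over the object returned by `json.load`. Note that this is not the chunk-by-chunk parsing mentioned above; it still loads everything at once. The sketch below yields the `header` name of every dict that has a `position` key, at any nesting depth. The function name `names_after_position` is hypothetical, and the sketch assumes that only column objects carry a `position` key.

```python
import json
from typing import Any, Iterator, Optional


def names_after_position(node: Any) -> Iterator[Optional[str]]:
    """Yield header["name"] for every dict that has a "position" key.

    Walks the parsed JSON (dicts and lists) recursively, so it behaves the same
    whether the file is pretty-printed or minified.
    """
    if isinstance(node, dict):
        # A column object: it carries "position" and a "header" dict with "name".
        if "position" in node and isinstance(node.get("header"), dict):
            yield node["header"].get("name")
        # Recurse into every value; nested headers without "position" are skipped.
        for value in node.values():
            yield from names_after_position(value)
    elif isinstance(node, list):
        for item in node:
            yield from names_after_position(item)


if __name__ == "__main__":
    # Minified input: a line-based scanner cannot handle this, the walk can.
    text = ('{"storables":[{"columns":[{"position":0,"header":{"name":"CAT"},'
            '"derivationExpr":{"logicalColumn":{"header":{"name":"CATASTROPHE"}}}}]}]}')
    print(list(names_after_position(json.loads(text))))  # ['CAT']
```

On the full sample, `list(names_after_position(data))` should give the column-level names such as `CAT`, `Peril` and `Job`. The nested `logicalColumn` headers are skipped because those dicts have no `position` key.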


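Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.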

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM