Parsing a complex JSON object with Python: search a specific key/value pair

General question: how can I search for a specific key:value pair in a JSON object using Python?

Details for the specific case: I'm reading ~45,000 JSON objects, each of which looks like this one.
As you can see, inside every JSON object there are several dictionaries that have the same keys (but different values): "facetName", "facetLabel", and "facetValues".
I'm interested in the dictionary that starts with "facetName": "soggettof", which looks like this:

{
  "facetName": "soggettof",
  "facetLabel": "Soggetto",
  "facetValues": [
    [
      "chiesa - storia - documenti",
      "chiesa - storia - documenti",
      "1"
    ],
    [
      "espiazione - mare mediterraneo <bacino> - antichita - congressi - munster - 1999",
      "espiazione - mare mediterraneo <bacino> - antichita - congressi - munster - 1999",
      "1"
    ],
    [
      "lega rossa combattenti - storia",
      "lega rossa combattenti - storia",
      "1"
    ],
    [
      "pavia - storia ecclesiastica - origini-sec. 12.",
      "pavia - storia ecclesiastica - origini-sec. 12.",
      "1"
    ],
    [
      "pavia <diocesi> - storia - origini-sec. 12.",
      "pavia <diocesi> - storia - origini-sec. 12.",
      "1"
    ],
    [
      "persia - sviluppo economico - 1850-1900 - fonti diplomatiche inglesi",
      "persia - sviluppo economico - 1850-1900 - fonti diplomatiche inglesi",
      "1"
    ]
  ]
}

Please note that not all the JSON objects have that dictionary.

How can I grab the values of the facetValues list, but only from the dictionary that I'm interested in?

I found your question a little confusing, partly because the data shown in it was not really the JSON object you need to extract the information from, but just an example of the sub-object you want to extract it from. Fortunately, you included a link to the outermost container JSON object (even though the data in its corresponding sub-object was different). Here's the data from that link:

json_obj = {"numFound":1,"start":0,"rows":3,"briefRecords":[{"progressivoId":0,"codiceIdentificativo":"IT\\ICCU\\LO1\\0120590","autorePrincipale":"Savoia, Carlo","titolo":"Per la inaugurazione dell'Asilo infantile Strozzi nei locali della caserma Filippini già convento della Vittoria / parole di mons. Carlo Savoia","pubblicazione":"Mantova : Tip. Eredi Segna, 1870","livello":"Monografia","tipo":"Testo a stampa","numeri":[],"note":[],"nomi":[],"luogoNormalizzato":[],"localizzazioni":[],"citazioni":[]}],"facetRecords":[{"facetName":"level","facetLabel":"Livello bibliografico","facetValues":[["Monografia","m","1"]]},{"facetName":"tiporec","facetLabel":"Tipo di documento","facetValues":[["Testo a stampa","a","1"]]},{"facetName":"nomef","facetLabel":"Autore","facetValues":[["savoia, carlo","savoia, carlo","1"]]},{"facetName":"soggettof","facetLabel":"Soggetto","facetValues":[["mantova - asili infantili","mantova - asili infantili","1"]]},{"facetName":"luogof","facetLabel":"Luogo di pubblicazione","facetValues":[["mantova","mantova","1"]]},{"facetName":"lingua","facetLabel":"Lingua","facetValues":[["italiano","ita","1"]]},{"facetName":"paese","facetLabel":"Paese","facetValues":[["italia","it","1"]]}]}

It's important to have this outermost container, because it is what you have to drill down through to reach the portion you want. Once you have the actual data, it's often helpful to reformat it to make its structure clear. You can do this by hand, or have the computer do it with print(json.dumps(json_obj, indent=2)), although the result can sometimes contain a little too much whitespace (which can be counterproductive).

That being the case here, below is a more succinct version I came up with by hand that still lets me see the overall layout of the data:
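For reference, here is a minimal sketch of letting json.dumps do the reformatting. It assumes the data arrives as JSON text and uses a hypothetical, trimmed sample instead of the full record; the variable names are illustrative only.

```python
import json

# Hypothetical sample: one facet record trimmed from the data above.
json_text = (
    '{"facetRecords": [{"facetName": "soggettof", "facetLabel": "Soggetto", '
    '"facetValues": [["mantova - asili infantili", "mantova - asili infantili", "1"]]}]}'
)
json_obj = json.loads(json_text)

# indent=2 puts each nested element on its own line, two spaces per level;
# ensure_ascii=False keeps characters such as "à" readable instead of escaping them.
pretty = json.dumps(json_obj, indent=2, ensure_ascii=False)
print(pretty)

# A terser view: just the top-level keys and the names of the facet records.
top_level_keys = list(json_obj)
facet_names = [record["facetName"] for record in json_obj["facetRecords"]]
print(top_level_keys)
print(facet_names)
```

With indent=2, every element of the small three-item facetValues lists lands on its own line, which is the extra whitespace mentioned above; printing only the keys and facet names is one way to get an overview without it.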

json_obj = {"numFound" : 1,
             "start" : 0,
             "rows" : 3,
             "briefRecords" : [
                {"progressivoId" : 0,
                 "codiceIdentificativo" : "IT\\ICCU\\LO1\\0120590",
                 "autorePrincipale" : "Savoia, Carlo",
                 "titolo" : "Per la inaugurazione dell'Asilo infantile Strozzi nei locali della caserma Filippini già convento della Vittoria / parole di mons. Carlo Savoia",
                 "pubblicazione" : "Mantova : Tip. Eredi Segna, 1870",
                 "livello" : "Monografia",
                 "tipo" : "Testo a stampa",
                 "numeri" : [],
                 "note" : [],
                 "nomi" : [],
                 "luogoNormalizzato" : [],
                 "localizzazioni" : [],
                 "citazioni" : []
                }
             ],
             "facetRecords" : [
                {"facetName" : "level" ,
                 "facetLabel" : "Livello bibliografico" ,
                 "facetValues" : [["Monografia" , "m" , "1"]]},
                {"facetName" : "tiporec" ,
                 "facetLabel" : "Tipo di documento" ,
                 "facetValues" : [["Testo a stampa" , "a" , "1"]]},
                {"facetName" : "nomef" ,
                 "facetLabel" : "Autore" ,
                 "facetValues" : [["savoia, carlo" , "savoia, carlo" , "1"]]},
                {"facetName" : "soggettof" ,
                 "facetLabel" : "Soggetto" ,
                 "facetValues" : [["mantova - asili infantili" , "mantova - asili infantili" , "1"]]},
                {"facetName" : "luogof" ,
                 "facetLabel" : "Luogo di pubblicazione" ,
                 "facetValues" : [["mantova" , "mantova" , "1"]]},
                {"facetName" : "lingua" ,
                 "facetLabel" : "Lingua" ,
                 "facetValues" : [["italiano" , "ita" , "1"]]},
                {"facetName" : "paese" ,
                 "facetLabel" : "Paese" ,
                 "facetValues" : [["italia" , "it" , "1"]]}
             ]
            }

Once you have something like this, it's usually fairly easy to determine what code is needed. In this case it's:

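# Name of the facet whose values we want.
target_facet_name = "soggettof"

# facetRecords is a list of dicts, so check each record's facetName in turn.
for record in json_obj["facetRecords"]:
    if record["facetName"] == target_facet_name:
        # Each entry in facetValues is itself a three-element list of strings.
        for value in record["facetValues"]:
            print(value)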

Since facetRecords is a list, a linear search through it, as shown, is required to find the one(s) wanted.
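The loop above depends on knowing the path json_obj["facetRecords"]. For the general question (finding a key/value pair wherever it appears), one option is a recursive walk over the parsed data. This is a minimal sketch, assuming the objects have already been parsed with json.load() or json.loads(); iter_matching_dicts and facet_values are hypothetical helper names, not part of any library, and the sample objects are trimmed, made-up stand-ins.

```python
from typing import Any, Iterator


def iter_matching_dicts(node: Any, key: str, value: Any) -> Iterator[dict]:
    """Yield every dict nested anywhere in `node` that maps `key` to `value`.

    `node` is assumed to be the result of json.load()/json.loads():
    dicts, lists, and scalars (str, int, float, bool, None).
    """
    if isinstance(node, dict):
        if key in node and node[key] == value:
            yield node
        # Keep descending: a matching dict may contain further matches.
        for child in node.values():
            yield from iter_matching_dicts(child, key, value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_matching_dicts(item, key, value)
    # Scalars cannot contain dicts, so there is nothing to do for them.


def facet_values(json_obj: Any, facet_name: str) -> list:
    """Collect facetValues from every record whose facetName is `facet_name`.

    Returns an empty list when the object has no such facet.
    """
    values = []
    for record in iter_matching_dicts(json_obj, "facetName", facet_name):
        values.extend(record.get("facetValues", []))
    return values


# Usage with two hypothetical, trimmed objects; the second lacks "soggettof".
objects = [
    {"facetRecords": [
        {"facetName": "soggettof", "facetLabel": "Soggetto",
         "facetValues": [["mantova - asili infantili", "mantova - asili infantili", "1"]]},
        {"facetName": "lingua", "facetLabel": "Lingua",
         "facetValues": [["italiano", "ita", "1"]]},
    ]},
    {"facetRecords": [
        {"facetName": "lingua", "facetLabel": "Lingua",
         "facetValues": [["italiano", "ita", "1"]]},
    ]},
]

for obj in objects:
    print(facet_values(obj, "soggettof"))
```

The first object prints its single "soggettof" entry and the second prints an empty list, so objects without that facet need no special handling. The trade-off is that the walk visits every node, which is more work per object than going straight to facetRecords; if you need several facets from each object, building a dict once, e.g. {r["facetName"]: r for r in obj.get("facetRecords", [])}, turns each later lookup into a plain dictionary access.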
