
Recover all lines for an attribute from a JSON database

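To simplify my problem: I have a database stored as JSON, and I read every JSON line to load the information into another database. That looked easy at first, but the problem is that my JSON is not written correctly.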

So I wrote code to recover all my JSON lines, but it does not work for every line, for example the ones with a "bio".

Let me show you:

{"name": "Nazamiu0304 Rau0304majiu0304", "personal_name": "Nazamiu0304 Rau0304majiu0304", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T18:00:41.270799"}, "key": "/authors/OL1001461A", "type": {"key": "/type/author"}, "revision": 2}

As you can see, there is name, personal_name, ...

Sometimes there is additional information:

{"bio": {"type": "/type/text", "value": "> "Eversley, William Pinder, B.C.L. Queen's Coll., Oxon, M.A., a member of the South-eastern circuit, reporter for Law Times in Queen's Bench division, a student of the Inner Temple 14 April, 1874 (then aged 23), called to the bar 25 April, 1877 (eldest son of William Eversley, Esq., of London); born u2060, 1851. rn> rn> 7, King's Bench Walk, Temple, E.C." rn> ...[in Foster's _Men at the Bar_][1]rnrnrn  rnrn[1]: https://en.wikisource.org/wiki/Men-at-the-Bar/Eversley,_William_Pinder "Men at the Bar""}, "name": "William Pinder Eversley", "created": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "death_date": "1918", "photos": [6897255, 6897254], "last_modified": {"type": "/type/datetime", "value": "2018-07-31T15:39:07.982159"}, "latest_revision": 6, "key": "/authors/OL1003081A", "birth_date": "1851", "personal_name": "William Pinder Eversley", "type": {"key": "/type/author"}, "revision": 6}


{"name": "Valerie Meyer", "personal_name": "Valerie Meyer", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T18:22:33.63997"}, "key": "/authors/OL1004062A", "type": {"key": "/type/author"}, "revision": 2}

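You can see that I have a lot of problems with the "bio" element: it is not written correctly at all, so the quotes are not interpreted properly, and neither is ">". So I got this code that changes the structure of bio so that I can use it.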
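To make the quoting problem concrete, here is a minimal sketch using a shortened, made-up record (not one of the real lines). The unescaped inner quotes end the string early, so `json.loads` rejects the text, while the same text with the quotes escaped as `\"` parses. The name `try_parse` is only illustrative.

```python
import json

# Shortened, hypothetical record: the inner quotes are not escaped.
broken = '{"bio": {"value": "> "Eversley" at the bar"}, "name": "W"}'
# The same record with the inner quotes escaped as \" (valid JSON).
escaped = '{"bio": {"value": "> \\"Eversley\\" at the bar"}, "name": "W"}'


def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


broken_result = try_parse(broken)    # None: the inner quote ends the string early
escaped_result = try_parse(escaped)  # dict holding the full bio text

# json.dumps writes the escaped form, which is what a well-formed dump contains.
round_trip = json.dumps({"value": '> "Eversley" at the bar'})
```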

Here is my code to change the structure of bio:

import re
import json
import pprint


bio_regex = re.compile(
    r"""
("bio":\s*{)   # bio field start
(.*?)          # content
(},)           # bio field end
(?=\s*(?:"\w+"|}))  # followed by another one or the json end
""",
    flags=re.VERBOSE | re.DOTALL)

value_regex = re.compile(
    r"""
("value":\s*")   # value field start
(.*?)            # content
("\s*\Z)         # value field end + end of string
""",
    flags=re.VERBOSE | re.DOTALL)


def normalize_value(mo):
    start, content, end = mo.group(1, 2, 3)
    content = content.replace('"', '\\"')
    return start + content + end


def normalize_bio(mo):
    start, content, end = mo.group(1, 2, 3)
    content = value_regex.sub(normalize_value, content)
    return start + content + end

messy_json = """
{ 
  "bio":{ 
    "type":"/type/text",
    "value":"> "Eversley, William Pinder, B.C.L. Queen's Coll., Oxon, M.A., a member of the South-eastern circuit, reporter for Law Times in Queen's Bench division, a student of the Inner Temple 14 April, 1874 (then aged 23), called to the bar 25 April, 1877 (eldest son of William Eversley, Esq., of London); born u2060, 1851. rn> rn> 7, King's Bench Walk, Temple, E.C." rn> ...[in Foster's Men at the Bar][1]rnrnrn rnrn[1]: https://en.wikisource.org/wiki/Men-at-the-Bar/Eversley,_William_Pinder "Men at the Bar""
  },
  "name":"William Pinder Eversley",
  "created":{ 
    "type":"/type/datetime",
    "value":"2008-04-01T03:28:50.625462"
  },
  "death_date":"1918",
  "photos":[ 
    6897255,
    6897254
  ],
  "last_modified":{ 
    "type":"/type/datetime",
    "value":"2018-07-31T15:39:07.982159"
  },
  "latest_revision":6,
  "key":"/authors/OL1003081A",
  "birth_date":"1851",
  "personal_name":"William Pinder Eversley",
  "type":{ 
    "key":"/type/author"
  },
  "revision":6
}"""


result = bio_regex.sub(normalize_bio, messy_json)
obj = json.loads(result)
pprint.pprint(obj)  # pretty-print the parsed record

The result is:


{'bio': {'type': '/type/text',
         'value': '> "Eversley, William Pinder, B.C.L. Queen\'s Coll., Oxon, M.A., a member of the '
                  "South-eastern circuit, reporter for Law Times in Queen's Bench division, a student of "
                  'the Inner Temple 14 April, 1874 (then aged 23), called to the bar 25 April, 1877 (eldest '
                  "son of William Eversley, Esq., of London); born u2060, 1851. rn> rn> 7, King's Bench "
                  'Walk, Temple, E.C." rn> ...[in Foster\'s Men at the Bar][1]rnrnrn rnrn[1]: '
                  'https://en.wikisource.org/wiki/Men-at-the-Bar/Eversley,_William_Pinder "Men at the Bar"'},
 'birth_date': '1851',
 'created': {'type': '/type/datetime', 'value': '2008-04-01T03:28:50.625462'},
 'death_date': '1918',
 'key': '/authors/OL1003081A',
 'last_modified': {'type': '/type/datetime', 'value': '2018-07-31T15:39:07.982159'},
 'latest_revision': 6,
 'name': 'William Pinder Eversley',
 'personal_name': 'William Pinder Eversley',
 'photos': [6897255, 6897254],
 'revision': 6,
 'type': {'key': '/type/author'}}

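The problem is that this script works when I paste one whole line into my code, but I want to recover my 1,000,000 bio lines with a proper structure, and I cannot do them one by one by hand. I tried many ways of writing a loop to process them one at a time, but I always get errors. I need to know how to recover them with a loop. I need to upgrade my code so that it handles every bio line in the database, not just one line at a time.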
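One possible way to loop over every line is sketched below. It assumes the file holds one record per line, reuses the two patterns from the question in compact form, and only applies the bio repair when plain `json.loads` fails, so lines whose quotes are already escaped are not escaped a second time. The names `parse_line` and `iter_records` are hypothetical, and the short records in the comments are made up.

```python
import json
import re

# Same patterns as in the question, written on one line each.
BIO_RE = re.compile(r'("bio":\s*\{)(.*?)(\},)(?=\s*(?:"\w+"|\}))', re.DOTALL)
VALUE_RE = re.compile(r'("value":\s*")(.*?)("\s*\Z)', re.DOTALL)


def _escape_value(mo):
    # Escape every quote found inside the bio text.
    start, content, end = mo.group(1, 2, 3)
    return start + content.replace('"', '\\"') + end


def _fix_bio(mo):
    # Repair only the "value" part of the bio object.
    start, content, end = mo.group(1, 2, 3)
    return start + VALUE_RE.sub(_escape_value, content) + end


def parse_line(line):
    """Parse one record; repair the bio field only if plain parsing fails."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return json.loads(BIO_RE.sub(_fix_bio, line))


def iter_records(lines):
    """Yield (line_number, record); record is None if the repair fails too."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            yield number, parse_line(line)
        except json.JSONDecodeError:
            yield number, None


# Usage sketch (file name taken from the question):
# with open("openlibraryjson.json", encoding="utf-8") as fh:
#     for number, record in iter_records(fh):
#         if record is None:
#             print("could not repair line", number)
```

Because a file object yields its lines lazily, this reads one line at a time and should not need the whole dump in memory. Lines that still fail after the repair are reported as `None` instead of stopping the loop.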

Thanks in advance, and thank you for hearing me out!

For example: I have a file openlibraryjson.json

with these lines:

{"name": "Ismail Ibrahim Dr.", "title": "Dr.", "personal_name": "Ismail Ibrahim", "last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "key": "/authors/OL100304A", "type": {"key": "/type/author"}, "revision": 1}
{"bio": {"type": "/type/text", "value": "> "Eversley, William Pinder, BCL Queen's Coll., Oxon, MA, a member of the South-eastern circuit, reporter for Law Times in Queen's Bench division, a student of the Inner Temple 14 April, 1874 (then aged 23), called to the bar 25 April, 1877 (eldest son of William Eversley, Esq., of London); born u2060, 1851. rn> rn> 7, King's Bench Walk, Temple, EC" rn> ...[in Foster's Men at the Bar ][1]rnrnrn rnrn[1]: https://en.wikisource.org/wiki/Men-at-the-Bar/Eversley,_William_Pinder "Men at the Bar""}, "name": "William Pinder Eversley", "created": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "death_date": "1918", "photos": [6897255, 6897254], "last_modified": {"type": "/type/datetime", "value": "2018-07-31T15:39:07.982159"}, "latest_revision": 6, "key": "/authors/OL1003081A", "birth_date": "1851", "personal_name": "William Pinder Eversley", "type": {"key": "/type/author"}, "revision": 6}
{"name": "Valerie Meyer", "personal_name": "Valerie Meyer", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T18:22:33.63997"}, "key": "/authors/OL1004062A", "type": {"key": "/type/author"}, "revision": 2}
{"bio": {"type": "/type/text", "value": "[Deutsch] Deutscher Orientalist und Theologe.rn[English] German orientalist and biblical scholar."}, "name": "August Dillmann", "links": [{"url": " http://de.wikipedia.org/wiki/August_Dillmann ", "type": {"key": "/type/link"}, "title": "Wikipedia (Deutsch)"}, {"url": " http://en.wikipedia.org/wiki/August_Dillmann ", "type": {"key": "/type/link"}, "title": "Wikipedia (English)"}], "personal_name": "August Dillmann", "death_date": "4 July 1894.", "alternate_names": ["Christian Friedrich August Dillmann", "Ch. FA Dillmann", "Friedrich August Dillmann", "FA Dillmann", "Augustus Dillmann", "August Dillmann", "A. Dillmann"], "created": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "photos": [6676274], "last_modified": {"type": "/type/datetime", "value": "2017-03-31T12:45:57.925108"}, "latest_revision": 8, "key": "/authors/OL1179559A", "birth_date": "25 April 1823", "revision": 8, "type": {"key": "/type/author"}, "remote_ids": {"viaf": "45046685", "wikidata": "Q75216"}}
{"last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "type": {"key": "/type/author"}, "name": "Physikertagung (1966 Munich, Germany)", "key": "/authors/OL1179696A", "revision": 1}

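I only want to take the bio lines and pass them to the function. I open my file and process name, personal_name, ... with a loop. That works, but not for bio, because bio is not written correctly, so for now my script skips bio. Now I no longer want to skip it: I want to handle bio the same way as name, personal_name, ...

Like this: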

with open('openlibrary(3).json') as file:
    for i in range(101):
        line = file.readline()
        if "bio" in line:
            line.replace("\\'", "'")
            continue
        content_json = json.loads(line)
        if not "personal_name" in line:
            #print('NULL')
            ligne.append("NULL")
            continue
        try:
            #print(content_json['name'])
            ligne.append(content_json['personal_name'])
        except IndexError:
            print('NULL')
        if not "personal_name" in line:
            # print('NULL')
            personal_nom.append("NULL")
            continue
        try:
            # print(content_json['name'])
            personal_nom.append(content_json['personal_name'])
        except IndexError:
            print('NULL')

I have only put part of the code here, to show what I do for name, personal_name, ...
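Once a line has been parsed into a dict, bio could be read the same way as the other fields. The sketch below is one option, not a definitive fix. It uses `dict.get` with a "NULL" default instead of the `"personal_name" in line` test (a missing dict key raises `KeyError`, not `IndexError`, so the `except IndexError` branches above would not catch it). The function name `extract_fields` is hypothetical, and accepting a bio stored as a plain string is an assumption; the sample lines only show the object form.

```python
def extract_fields(record):
    """Return (name, personal_name, bio_text), using "NULL" for missing fields."""
    bio = record.get("bio")
    if isinstance(bio, dict):
        # Form seen in the sample data: {"type": "/type/text", "value": "..."}
        bio_text = bio.get("value", "NULL")
    elif isinstance(bio, str):
        # Assumption: some records might store bio as a plain string.
        bio_text = bio
    else:
        bio_text = "NULL"
    return (
        record.get("name", "NULL"),
        record.get("personal_name", "NULL"),
        bio_text,
    )


# Small made-up records shaped like the lines in the question.
without_bio = {"name": "Valerie Meyer", "personal_name": "Valerie Meyer"}
with_bio = {
    "bio": {"type": "/type/text", "value": "German orientalist."},
    "name": "August Dillmann",
}

row_without_bio = extract_fields(without_bio)
row_with_bio = extract_fields(with_bio)
```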

Thanks again for listening and for your answers!

