
Extracting information between two curly brackets in a file with Python

I have a file that was written in a JSON structure but is not correctly formatted. The content looks similar to this:

[{"key0":"value0" , "key1":"value1", "key2":"value2"}, {"key0":"value3", "key1":"value4", "key2:"value5"}, {"key0":"value6", "key1":"value7", "key2:"value8"}]

Unlike many questions asked here before, the contents are all on a single line, so when I try to read the file line by line with readline(), I get the whole thing at once.

I am trying to extract only the information between the curly brackets { }, including the brackets themselves, and print it. I am able to open the file, but I can't find a way to read from a { to the matching }, then continue to the next { and }, and so on. I don't really care about the square brackets, just the curly brackets. Also, the values differ in length, so I can't simply read a fixed number of characters after each opening bracket.

Any guidance would be greatly appreciated.

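import re

# Sample content; in practice, read the whole file with f.read()
fileContent = "[{'key0':'value0' , 'key1':'value1', 'key2':'value2'}, {'key0':'value3', 'key1':'value4', 'key2':'value5'}, {'key0':'value6', 'key1':'value7', 'key2':'value8'}]"

# Non-greedy match from each { to the nearest }, braces included
pattern_with_braces = r'\{.*?\}'
# Lookbehind/lookahead variant that leaves the braces out
pattern_without_braces = r'(?<=\{).*?(?=\})'

parts_with_braces = re.findall(pattern_with_braces, fileContent)
parts = re.findall(pattern_without_braces, fileContent)
print(parts_with_braces)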
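Since the question reads from a file rather than a string, here is a minimal sketch of the same idea applied to a file's full contents. The helper name braced_chunks and the temporary data.txt are made up for illustration; the sample text is a shortened version of the malformed data from the question.

```python
import re
import tempfile
from pathlib import Path

# Non-greedy: each match stops at the first closing brace after an opening one
BRACED = re.compile(r'\{.*?\}')

def braced_chunks(path):
    """Return every {...} group found in the file, braces included."""
    # Read the whole file at once instead of line by line
    text = Path(path).read_text(encoding="utf-8")
    return BRACED.findall(text)

# Demo: a temporary file stands in for the real input file
sample = '[{"key0":"value0" , "key1":"value1"}, {"key0":"value3", "key2:"value5"}]'
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "data.txt"
    demo_path.write_text(sample, encoding="utf-8")
    chunks = braced_chunks(demo_path)

for chunk in chunks:
    print(chunk)
```

Because the text is never parsed as JSON, the missing quote does not matter here. Two caveats: the pattern does not handle nested braces, and "." does not match a newline by default, so objects that span several lines would need the re.DOTALL flag.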

I suggest using the re module to repair the malformed keys in each line, then parsing the result with json.loads into a list of dictionaries:

import re
import json

with open("data.txt") as f:
    for line in f:
        # Add the missing closing quote to keys such as "key2:
        modified = re.sub(r'([{\s])"(\w+):', r'\1"\2":', line)
        # The repaired line is valid JSON: a list of dictionaries
        records = json.loads(modified)
        print(records)

In your example, running the code above would result in something like:

[{'key0': 'value0', 'key1': 'value1', 'key2': 'value2'}, {'key0': 'value3', 'key1': 'value4', 'key2': 'value5'}, {'key0': 'value6', 'key1': 'value7', 'key2': 'value8'}]

You can then access the keys and values of each dictionary in the list.

Here, "data.txt" contains the malformed content from the question:

[{"key0":"value0" , "key1":"value1", "key2":"value2"}, {"key0":"value3", "key1":"value4", "key2:"value5"}, {"key0":"value6", "key1":"value7", "key2:"value8"}]
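To get back to what the question asks for (printing each group with its curly brackets), the repaired records can be serialized one at a time. This is only a sketch: the function name repair_and_parse is hypothetical, and the pattern is adapted from the one above.

```python
import json
import re

# Matches a key that lost its closing quote, e.g. "key2:
BROKEN_KEY = re.compile(r'([{\s])"(\w+):')

def repair_and_parse(text):
    """Restore missing closing quotes on keys, then parse the text as JSON."""
    fixed = BROKEN_KEY.sub(r'\1"\2":', text)
    return json.loads(fixed)

raw = ('[{"key0":"value0" , "key1":"value1", "key2":"value2"}, '
       '{"key0":"value3", "key1":"value4", "key2:"value5"}]')

records = repair_and_parse(raw)
for record in records:
    # json.dumps writes each dictionary back out with its braces
    print(json.dumps(record))
```

One limitation to keep in mind: the pattern cannot tell keys from values, so a value that follows whitespace and begins with word characters and a colon (a URL such as "http://...", for instance) would be altered as well. It is safe only when the values look like the ones in the example.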

Try the json.loads function from Python's json module, which deserializes a str, bytes, or bytearray instance containing a JSON document into a Python object. (json.load is the variant that reads from a file object.) Note that json.loads only accepts valid JSON, so every key needs both of its quotes.

To decode your JSON string:

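import json

# Every key must be fully quoted, otherwise json.loads raises json.JSONDecodeError
str_to_load = '[{"key0":"value0" , "key1":"value1", "key2":"value2"}, {"key0":"value3", "key1":"value4", "key2":"value5"}, {"key0":"value6", "key1":"value7", "key2":"value8"}]'
data = json.loads(str_to_load)

# data is a list of dictionaries; index 2 is the third one
print(data[2]['key2'])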

output: value8
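If the file may or may not be valid, one option is to attempt the parse and report where it fails. A minimal sketch, with a hypothetical helper name try_parse and shortened sample strings:

```python
import json

def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character index where parsing stopped
        print(f"Not valid JSON at character {err.pos}: {err.msg}")
        return None

valid = '[{"key0":"value6", "key2":"value8"}]'
broken = '[{"key0":"value6", "key2:"value8"}]'  # closing quote missing on key2

parsed_valid = try_parse(valid)
parsed_broken = try_parse(broken)
```

Here parsed_valid is a list with one dictionary, while parsed_broken is None because of the missing quote, which signals that the text needs repairing before it can be loaded.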

