
Python JSON change single quotes to double quotes leave in-string quotes alone

We have the following dataframe:

import pandas as pd
import numpy as np
import json
from json import JSONDecodeError

json_as_str_list = [
    "[{'key1': 312, 'name': 'Simple name'}]",
    "[{'key1': 981, 'name': 'Name n' quote'}]",
    np.nan
]
d = {'json_as_str': json_as_str_list}
df = pd.DataFrame(data=d)


    json_as_str
0   [{'key1': 312, 'name': 'Simple name'}]
1   [{'key1': 981, 'name': 'Name n' quote'}]
2   NaN

After the import, the json_as_str column is a list of strings, but I want it to be a list of JSON objects. I've written a function which should return a list of JSON objects given a string, or an empty list given np.nan:

def convert_to_JSON_helper(json_str):
    if isinstance(json_str, str):
        json_str = json_str.replace("'", '"')
        try:
            return json.loads(json_str)
        except JSONDecodeError:
            print(json_str)
            return []
    else:
        return []

The current implementation doesn't handle in-string single quotes (as in the second row of the dataframe). How should I modify the function so that it works as expected?

The current output I get while using df['json_as_str'].apply(convert_to_JSON_helper) :

0    [{'key1': 312, 'name': 'Simple name'}]
1                                        []
2                                        []
Name: json_as_str, dtype: object

The output I'd like to get:

0    [{'key1': 312, 'name': 'Simple name'}]
1  [{'key1': 981, 'name': 'Name n' quote'}]
2                                        []
Name: json_as_str, dtype: object

The problem is not the function but the string. You typed a backslash (\) to escape the single quote, but it had no effect: a single \ in a string literal escapes the following character (here the quote), so the quote ends up in the string and the backslash does not. Demo:

>>> a = " a 'b' 'c\'d' "
>>> a
" a 'b' 'c'd' "

The backslash has simply been consumed by the string literal.
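To make the escaping behaviour concrete, here is a small sketch comparing three ways of writing the same characters. The variable names are only illustrative; the point is that a doubled backslash or a raw string keeps the backslash, while a lone \' does not.

```python
# Three spellings of a string that contains a single quote.
plain = "c\'d"      # \' is an escape sequence: the backslash is dropped
doubled = "c\\'d"   # \\ is an escaped backslash: one backslash is kept
raw = r"c\'d"       # raw string: the backslash is kept as typed

print(plain)    # c'd
print(doubled)  # c\'d
print(raw)      # c\'d

# The first form is identical to a string with no backslash at all,
# and the last two forms are identical to each other.
print(plain == "c'd", doubled == raw)  # True True
```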

In any case, you should not try to convert quotes in a general way. Because of all the possible corner cases, you would have to build a dedicated (and complex) parser. My advice is to store a correct JSON string in your dataframe in the first place.
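If you control how the strings are produced, one way to get a correct JSON string is to serialize the Python object with json.dumps instead of relying on its str() form. This is only a sketch under that assumption; the records variable is a made-up stand-in for whatever object the strings were originally created from.

```python
import json

# Hypothetical source object; the name contains a single quote.
records = [{'key1': 981, 'name': "Name n' quote"}]

# json.dumps always emits double-quoted keys and strings, so an
# in-string single quote needs no special handling.
json_str = json.dumps(records)
print(json_str)  # [{"key1": 981, "name": "Name n' quote"}]

# The string round-trips back to the original object.
print(json.loads(json_str) == records)  # True
```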

Here is how to convert a string (with single quotes) to a dict.

import ast

data = ast.literal_eval("{'a' : 12, 'c' : 'd'}")
print(data)
print(type(data))

output

{'a': 12, 'c': 'd'}
<class 'dict'>
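Applied to the strings from the question, the same idea might look like the sketch below. The helper name parse_literal_or_empty is hypothetical, and a plain list with float('nan') stands in for the dataframe column so the snippet runs without pandas. Note that ast.literal_eval cannot repair the row with the unbalanced quote; it only parses strings that are valid Python literals, so that row still falls back to an empty list.

```python
import ast

def parse_literal_or_empty(value):
    """Hypothetical helper: parse a Python-literal string, else return []."""
    if not isinstance(value, str):
        return []  # e.g. NaN for a missing value
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []  # not a valid Python literal

rows = [
    "[{'key1': 312, 'name': 'Simple name'}]",
    "[{'key1': 981, 'name': 'Name n' quote'}]",    # unbalanced quote
    "[{'key1': 981, 'name': 'Name n\\' quote'}]",  # backslash really stored
    float('nan'),
]

parsed = [parse_literal_or_empty(row) for row in rows]
for item in parsed:
    print(item)
```

With the dataframe from the question, the equivalent call would presumably be df['json_as_str'].apply(parse_literal_or_empty).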


 