We have the following dataframe:
import pandas as pd
import numpy as np
import json
from json import JSONDecodeError
json_as_str_list = [
"[{'key1': 312, 'name': 'Simple name'}]",
"[{'key1': 981, 'name': 'Name n\' quote'}]",
np.nan
]
d = {'json_as_str': json_as_str_list}
df = pd.DataFrame(data=d)
json_as_str
0 [{'key1': 312, 'name': 'Simple name'}]
1 [{'key1': 981, 'name': 'Name n' quote'}]
2 NaN
After the import, the json_as_str column is a list of strings, but I want it to be a list of JSON objects. I've written a function which should return a list of JSON objects given a string, or an empty list given np.nan:
def convert_to_JSON_helper(json_str):
    if isinstance(json_str, str):
        json_str = json_str.replace("'", '"')
        try:
            return json.loads(json_str)
        except JSONDecodeError:
            print(json_str)
            return []
    else:
        return []
The current implementation doesn't handle single quotes inside a string value (as in the second row of the dataframe). How should I modify the function so that it works as expected?
The current output I get while using df['json_as_str'].apply(convert_to_JSON_helper):
0 [{'key1': 312, 'name': 'Simple name'}]
1 []
2 []
Name: json_as_str, dtype: object
The output I'd like to get:
0 [{'key1': 312, 'name': 'Simple name'}]
1 [{'key1': 981, 'name': 'Name n' quote'}]
2 []
Name: json_as_str, dtype: object
The problem is not the function but the string. You typed a \ to escape the single quote, but it had no effect: inside a double-quoted string literal, \' is simply an escape sequence for a single quote, so only the quote ends up in the string and the backslash is dropped. Demo:
>>> a = " a 'b' 'c\'d' "
>>> a
" a 'b' 'c'd' "
The backslash has simply been consumed by the string literal.
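To make this concrete, here is a small sketch showing that the two spellings produce the same string, and that doubling the backslash is what actually keeps it in the data. The variable names are only illustrative, and using ast.literal_eval to read the result back is an assumption about how you would parse it.

```python
import ast

# In a normal string literal, \' is just an escaped quote: the backslash is dropped.
with_backslash = "[{'key1': 981, 'name': 'Name n\' quote'}]"
without_backslash = "[{'key1': 981, 'name': 'Name n' quote'}]"
print(with_backslash == without_backslash)  # True

# Doubling the backslash keeps one real backslash in the string contents.
kept = "[{'key1': 981, 'name': 'Name n\\' quote'}]"
print(kept)

# With the backslash present, the contents are a valid Python literal.
parsed = ast.literal_eval(kept)
print(parsed[0]['name'])
```

A raw string (r"...") would keep the backslash as well; either way the escaping has to be present in the data itself, not only in the source code.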
Anyway, you should not try to convert quotes in a general way. Because of all the possible corner cases, you would have to build a dedicated (and complex) parser. So my advice is to store a correct JSON string in your dataframe in the first place.
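If you control how the strings are produced, one way to get correct JSON is to let json.dumps do the quoting. The following is only a sketch under that assumption: the records list and the load_or_empty helper are hypothetical names, and None stands in for the missing value.

```python
import json

# Hypothetical source data, before it is turned into strings.
records = [
    [{'key1': 312, 'name': 'Simple name'}],
    [{'key1': 981, 'name': "Name n' quote"}],
    None,  # missing value
]

# json.dumps emits double-quoted JSON, so an inner single quote needs no escaping.
json_strings = [json.dumps(r) if r is not None else None for r in records]

def load_or_empty(value):
    """Return the parsed JSON for a string, or [] for a missing value."""
    return json.loads(value) if isinstance(value, str) else []

restored = [load_or_empty(s) for s in json_strings]
print(json_strings[1])
print(restored[1][0]['name'])
```

With strings produced this way, no quote replacement is needed before calling json.loads.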
Here is how to convert a string that uses single quotes (') to a dict:
import ast
data = ast.literal_eval("{'a' : 12, 'c' : 'd'}")
print(data)
print(type(data))
Output:
{'a': 12, 'c': 'd'}
<class 'dict'>
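Applied to the column from the question, the same idea could look like the sketch below. The helper name parse_literal_or_empty is hypothetical, and returning [] for both missing values and unparsable strings is an assumption based on the desired output in the question. Note that ast.literal_eval cannot repair a string whose inner quote is unescaped; it raises SyntaxError, which the helper catches.

```python
import ast

def parse_literal_or_empty(value):
    """Hypothetical helper: parse a Python-literal string, else return []."""
    if not isinstance(value, str):
        return []  # e.g. NaN for a missing cell
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []  # malformed literal, such as an unescaped inner quote

rows = [
    "[{'key1': 312, 'name': 'Simple name'}]",
    "[{'key1': 981, 'name': 'Name n\\' quote'}]",  # backslash kept in the data
    "[{'key1': 981, 'name': 'Name n' quote'}]",    # malformed: quote not escaped
    float('nan'),
]
parsed = [parse_literal_or_empty(r) for r in rows]
for item in parsed:
    print(item)
```

On the dataframe this would be used as df['json_as_str'].apply(parse_literal_or_empty).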