
How to remove special characters in JSON data in Python

I am reading a set of data from a JSON file. The content of the JSON file looks like this:

"Address":"4820 ALCOA AVE�            ",
"City":"VERNON�   "

As you can see, each value contains a special character followed by trailing whitespace. When I read this JSON data, it comes through like this:

'address': '4820 ALCOA AVE�             '
'city': 'VERNON�   '

I can remove the whitespace easily, but I am not sure how to remove the � character. I do not have direct access to the JSON file, so I cannot edit it, and even if I had access, it would take me a couple of hours to edit the file by hand. Is there any way in Python to remove these special characters? Please help. Thanks.

It looks like something upstream did not handle character encoding properly and ended up producing replacement characters (U+FFFD). You may need to keep an eye out in case it also mangled more important parts of the text (e.g. accented characters, non-English letters, emoji).
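As a rough illustration of how such characters can arise, the sketch below decodes a hypothetical byte string as UTF-8 with `errors="replace"`. The sample bytes and the helper name `has_replacement_char` are assumptions made for the example, not something taken from the question's actual data; the helper is one simple way to spot values that were affected.

```python
# Hypothetical input: text followed by byte 0xA0, which is not valid UTF-8 on its own.
raw_bytes = b"VERNON\xa0"

# Decoding with errors="replace" swaps the undecodable byte for U+FFFD.
decoded = raw_bytes.decode("utf-8", errors="replace")


def has_replacement_char(text):
    """Return True if the text contains the Unicode replacement character."""
    return "\ufffd" in text


print(repr(decoded))
print(has_replacement_char(decoded))
```

Once the original byte has been replaced, it cannot be recovered from the decoded string, which is why removing the character is usually the only option downstream.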

For the immediate problem, you can load the JSON data with the utf-8 encoding, then strip the character '\ufffd':

    value = value.strip().strip('\ufffd')

If the replacement characters also appear in the middle of a value (and you want to delete them), you might want to use replace() instead:

    value = value.replace('\ufffd', '').strip()
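To apply the same cleanup to every string in a loaded JSON document, something along these lines may work. This is a minimal sketch: `clean_value` and `REPLACEMENT_CHAR` are hypothetical names, and the inline `raw` string stands in for the real file contents.

```python
import json

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the Unicode replacement character


def clean_value(value):
    """Recursively remove U+FFFD and surrounding whitespace from string values."""
    if isinstance(value, str):
        return value.replace(REPLACEMENT_CHAR, "").strip()
    if isinstance(value, list):
        return [clean_value(item) for item in value]
    if isinstance(value, dict):
        return {key: clean_value(item) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Stand-in for the file contents; with a real file, json.load() on a handle
# opened with encoding="utf-8" would take the place of json.loads().
raw = '{"Address": "4820 ALCOA AVE\ufffd            ", "City": "VERNON\ufffd   "}'
record = clean_value(json.loads(raw))
print(record)  # {'Address': '4820 ALCOA AVE', 'City': 'VERNON'}
```

Only the values are cleaned here; keys are left as they are.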

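You can also use a regular expression. Note that this pattern removes every character outside the printable ASCII range, so it would also drop accented or non-English letters: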

    import re
    address = re.sub(r"[^\x20-\x7E]", "", "4820 ALCOA AVE� ")
    print(address)
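The difference between an ASCII-only pattern and one that targets just U+FFFD can be seen in a small comparison. The sample string is made up for illustration, with `\u00c9` (É) standing in for a legitimate non-ASCII letter.

```python
import re

# Hypothetical value containing a legitimate accented letter plus U+FFFD.
sample = "4820 CAF\u00c9 AVE\ufffd   "

# ASCII-only pattern: removes U+FFFD but also the accented letter.
ascii_only = re.sub(r"[^\x20-\x7E]", "", sample).strip()

# Targeted pattern: removes only the replacement character.
targeted = re.sub("\ufffd", "", sample).strip()

print(ascii_only)  # 4820 CAF AVE
print(targeted)    # 4820 CAFÉ AVE
```

If the data is known to be plain ASCII, the broader pattern is fine; otherwise the targeted one is the safer choice.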
