JSON file: how to remove unwanted characters
I scraped some data into a JSON file, but the saved data contains some unwanted characters. For example:
"quote_text": "\“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\”", "author": "Albert Einstein", "tags": [ "change", "deep-thoughts", "thinking", "world" "quote_text": "\“我们创造的世界是我们思考的过程。如果不改变我们的想法,它就无法改变。\”", "author": "Albert Einstein", "tags": [ "change" ,“深思”,“思考”,“世界”
So how can I remove these \“ type characters from the file in Python?
Replace method:
If you have only 1 or 2 characters to remove, I suggest using the string .replace() method.
An example on the quote_text key:
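# str.replace() returns a new string, so assign the result back
your_dict['quote_text'] = your_dict['quote_text'].replace('\u201c', '').replace('\u201d', '')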
Regex:
If you need to remove many different characters, I suggest looking into regular expressions (Python's re module).
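A minimal sketch of the regex approach, assuming the unwanted characters are the two curly quotes \u201c and \u201d. The sample string and the name pattern are illustrative and not taken from the question:

```python
import re

# Hypothetical sample value; the real one would come from the loaded JSON.
quote_text = "\u201cThe world as we have created it is a process of our thinking.\u201d"

# A character class matches any single character listed between the brackets.
pattern = re.compile("[\u201c\u201d]")

# sub() replaces every match with the empty string.
cleaned = pattern.sub("", quote_text)
print(cleaned)
```

More characters can be added inside the brackets without changing the rest of the code.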
More:
If you wish to apply your function to every value of a dictionary, you can use a dict comprehension:
d2 = {k: f(v) for k, v in d1.items()}
where d1 is your original dictionary and f is your function.
In our example it would be:
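# only string values have .replace(); lists such as tags are left unchanged
d2 = {k: v.replace('\u201c', '') if isinstance(v, str) else v for k, v in d1.items()}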
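Values in the scraped data can also be lists (such as tags) or nested dictionaries, so a recursive helper may be more robust. The following is only a sketch: the function name strip_chars, its unwanted parameter, and the sample record are assumptions made for illustration.

```python
import json

def strip_chars(value, unwanted="\u201c\u201d"):
    """Recursively remove the unwanted characters from every string in value."""
    if isinstance(value, str):
        return "".join(ch for ch in value if ch not in unwanted)
    if isinstance(value, list):
        return [strip_chars(item, unwanted) for item in value]
    if isinstance(value, dict):
        return {key: strip_chars(item, unwanted) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged

# Hypothetical record shaped like the one in the question.
raw = '{"quote_text": "\\u201cIt cannot be changed.\\u201d", "author": "Albert Einstein", "tags": ["change", "world"]}'
d1 = json.loads(raw)
d2 = strip_chars(d1)
print(d2["quote_text"])
```

For a real file, the same helper could be applied to the result of json.load() and the cleaned data written back with json.dump().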
If you want to remove multiple characters, you can use a list to indicate which characters to remove:
text = '{ "work": "\u201cfun\u201c", "foo": ["bar", "baz"] }'
# each entry must be a single character, so the quote is written as '\u201c', not 'u201c'
remove_chars = ['\u201c', 'b', 'f']
new_text = ''.join([ch for ch in text if ch not in remove_chars])
To replace unwanted characters, make a dictionary to hold the substitutions, then make the changes:
subs = {
'\u201c': "'",
'z': 't'
}
text = '{ "work": "\u201cfun\u201c", "foo": ["bar", "baz"] }'
letter_list = [(subs[ch] if ch in subs else ch) for ch in text]
new_text = ''.join(letter_list)
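The same single-character substitutions can also be done with the built-in str.maketrans() and str.translate(). This is an alternative sketch that reuses the subs dictionary and sample text from above, not a change to the approach itself:

```python
# Same substitution table as above: each key is one character to replace.
subs = {
    '\u201c': "'",
    'z': 't',
}

# str.maketrans() turns the dictionary into a translation table.
table = str.maketrans(subs)

text = '{ "work": "\u201cfun\u201c", "foo": ["bar", "baz"] }'

# translate() applies every substitution in a single pass over the string.
new_text = text.translate(table)
print(new_text)
```

Mapping a character to None in the dictionary deletes it instead of replacing it.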
Let's assume the dictionary is d. As I can see, there are different Unicode characters, such as \“ and \”.
If you want to remove all non-ASCII characters at once, you can do something like this:
One-liner:
d['quote_text'].encode("ascii", "ignore").decode('utf-8')
Explanation in detail:
The line below drops every character that cannot be encoded as ASCII and returns the value as bytes.
remov_unicode_char = d['quote_text'].encode("ascii", "ignore")
Now, to convert it back into a string, decode it.
convert_str = remov_unicode_char.decode("utf-8")
Now you can check the result by printing it.
print(convert_str)
Output:
The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.
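One thing to keep in mind is that encoding to ASCII with "ignore" drops every non-ASCII character, not just the quotes. A small sketch with a made-up string (not from the question) shows the difference from a targeted removal:

```python
# Hypothetical string with curly quotes and a legitimate accented letter.
s = "\u201cCaf\u00e9 society\u201d"

# encode/ignore drops every non-ASCII character, including the accented letter.
ascii_only = s.encode("ascii", "ignore").decode("ascii")

# Removing only the two quote characters keeps the rest of the text intact.
quotes_removed = s.replace("\u201c", "").replace("\u201d", "")

print(ascii_only)
print(quotes_removed)
```

If the scraped text may contain accented names or other non-English text, the targeted removal is probably the safer choice.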