JSON file: how to remove unwanted characters

I scraped some data into a JSON file, but the saved data contains some unwanted characters, for example:

"quote_text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d", "author": "Albert Einstein", "tags": [ "change", "deep-thoughts", "thinking", "world"

How can I remove these \u201c-type characters from the file in Python?

Replace method:

If you only have one or two characters to remove, I suggest using the string .replace() method.

For example, on the quote_text key (.replace() returns a new string, so assign the result back):

your_dict['quote_text'] = your_dict['quote_text'].replace('\u201c', '').replace('\u201d', '')

Regex:

If you need to remove many different characters, I suggest looking into regular expressions (Python's re module).
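As a minimal sketch of the regex approach, a character class can list every character to strip and re.sub can remove them in one pass. The helper name strip_curly_quotes and the exact set of quote characters are assumptions chosen for illustration; adjust the class to whatever your data contains.

```python
import re

# Curly quotes to strip: U+2018/U+2019 (single) and U+201C/U+201D (double).
CURLY_QUOTES = re.compile('[\u2018\u2019\u201c\u201d]')

def strip_curly_quotes(text):
    """Return text with curly quotation marks removed (hypothetical helper)."""
    return CURLY_QUOTES.sub('', text)

quote = '\u201cThe world as we have created it is a process of our thinking.\u201d'
cleaned = strip_curly_quotes(quote)
print(cleaned)
```

Compiling the pattern once is convenient when the same cleanup runs over many scraped items.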

More:

If you want to apply your function to every value in the dictionary, you can use a dict comprehension:

d2 = {k: f(v) for k, v in d1.items()}

where d1 is your original dictionary and f is your function.

In our example it would be:

# Only string values have .replace(); lists such as "tags" are left unchanged.
d2 = {k: v.replace('\u201c', '') if isinstance(v, str) else v for k, v in d1.items()}
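The comprehension above only looks at top-level values, while the scraped data in the question also holds strings inside a list ("tags"). One possible way to cover nested values is a small recursive helper. This is a sketch: the function name clean, its unwanted parameter, and the sample data are assumptions for illustration.

```python
import json

def clean(value, unwanted='\u201c\u201d'):
    """Recursively strip the unwanted characters from every string in value."""
    if isinstance(value, str):
        return ''.join(ch for ch in value if ch not in unwanted)
    if isinstance(value, list):
        return [clean(item, unwanted) for item in value]
    if isinstance(value, dict):
        return {key: clean(item, unwanted) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged

# Sample input shaped like the scraped file (made up for this example).
raw = '[{"quote_text": "\\u201cHello\\u201d", "author": "A", "tags": ["x", "\\u201cy\\u201d"]}]'
data = json.loads(raw)
cleaned = clean(data)
print(json.dumps(cleaned, ensure_ascii=False))
```

For a real file you would read it with json.load and write the cleaned result back with json.dump instead of using an inline string.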

If you want to remove multiple characters, you can use a list of the characters to remove:

text = '{ "work": "\u201cfun\u201c", "foo": ["bar", "baz"] }'
# Each entry must be a single character, so write '\u201c' with the backslash.
remove_chars = ['\u201c', 'b', 'f']
new_text = ''.join([ch for ch in text if ch not in remove_chars])

To replace unwanted characters instead, build a dictionary of substitutions and then apply it:

subs = {
  '\u201c': "'",
  'z': 't'
}
text = '{ "work": "\u201cfun\u201c", "foo": ["bar", "baz"] }'
letter_list = [subs.get(ch, ch) for ch in text]
new_text = ''.join(letter_list)
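The same per-character substitution can also be written with the built-in str.maketrans and str.translate, which handle removal and replacement in one table. This is an alternative sketch, not the method used above; the mapping shown is only an example.

```python
# Map each unwanted character to its replacement; None means "delete".
table = str.maketrans({
    '\u201c': "'",   # left curly quote -> straight apostrophe
    '\u201d': None,  # right curly quote -> removed
    'z': 't',
})

text = '{ "work": "\u201cfun\u201d", "foo": ["bar", "baz"] }'
new_text = text.translate(table)
print(new_text)
```

Keys in the mapping must be single characters, which is the same constraint the list and dictionary versions rely on.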

Let's call the dictionary d. As far as I can see, it contains several non-ASCII characters, such as \u201c and \u201d. If you want to remove all non-ASCII characters at once, you can do something like this:

One-liner:

d['quote_text'].encode("ascii", "ignore").decode('utf-8')

Explanation in detail:

The line below removes all non-ASCII characters and returns the result as bytes:

remov_unicode_char = d['quote_text'].encode("ascii", "ignore")

Now, to convert it back into a string, decode it:

convert_str =  remov_unicode_char.decode("utf-8")

Now you can check the result by printing it:

print(convert_str)

Output:

The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.
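One caveat worth illustrating: encoding to ASCII with "ignore" drops every non-ASCII character, not just the curly quotes, so accented letters in names or quotes disappear too. The sample string below is made up to show the difference from a targeted replace.

```python
text = '\u201cCaf\u00e9 na\u00efve\u201d'

# encode/ignore drops every non-ASCII character, including the accented letters.
ascii_only = text.encode('ascii', 'ignore').decode('ascii')

# Removing only the curly quotes keeps the accented letters.
quotes_only = text.replace('\u201c', '').replace('\u201d', '')

print(ascii_only)
print(quotes_only)
```

If the scraped text may contain such characters, the targeted approaches are likely the safer choice.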
