
Using scraped JSON data containing Unicode in Python

I scraped some JSON data into a file called 'wotd-page-one.json' using Scrapy. The JSON data contains some Spanish words, and the accented letters were written out as Unicode escape sequences (for example, \u00f3 for ó). I'd like to load this data and make it usable from a Python script in the same directory. I am trying to load it into a list so I can work with each JSON key and value individually. However, I am having a hard time making this happen, since I don't have much experience with Unicode and JSON. Could anyone help me find a way to make this data accessible via a Python list? Ideally, I'd like something like data[2] == "DEF", data[3] == "string with any unicode characters converted to latin-1", data[4] == "SENTENCE", and data[5] == "string with any unicode characters converted to latin-1".

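Python file:

data = []
with open('wotd-page-one.json', encoding='utf-8') as f:
    # Each line is stored as a raw string; nothing is parsed as JSON,
    # so escapes such as \u00f3 remain literal text.
    for line in f:
        line = line.replace('\n', '')
        data.append(line)
print(data)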


JSON file:

[
{"TRANSLATION": "I don't like how that guy's whistling; it gives me the creeps.", "WORD": "silbar", "DEF": "to whistle", "SENTENCE": "No me gusta c\u00f3mo silba ese se\u00f1or; me da escalofr\u00edos."},
{"TRANSLATION": "\"Is somebody there?\" asked the boy in a startled voice.", "WORD": "sobresaltado", "DEF": "startled", "SENTENCE": "\"\u00bfHay alguien aqu\u00ed?\" pregunt\u00f3 el ni\u00f1o con voz sobresaltada."},
{"TRANSLATION": "Carla made a face at me when I asked her if she was scared.", "WORD": "la mueca", "DEF": "face", "SENTENCE": "Carla me hizo una mueca cuando le pregunt\u00e9 si ten\u00eda miedo."},
{"TRANSLATION": "The teacher tapped the board with the chalk.", "WORD": "golpetear", "DEF": "to tap", "SENTENCE": "El maestro golpete\u00f3 el pizarr\u00f3n con la tiza."}
]
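The \u00f3-style sequences in this file are standard JSON escapes, not a different encoding. Python's own json.dumps produces them by default (ensure_ascii=True), and Scrapy's JSON feed export behaves similarly unless its FEED_EXPORT_ENCODING setting is changed (check the Scrapy documentation for your version). A short sketch with a hypothetical record:

```python
import json

# Hypothetical record, not taken verbatim from the scraped file.
record = {"WORD": "silbar", "SENTENCE": "No me gusta cómo silba ese señor"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)
# ensure_ascii=False keeps the accented characters as they are.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same dictionary, so the escaped form loses no information; it only looks different on disk.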

Output:
['[',
 '{"TRANSLATION": "I don\'t like how that guy\'s whistling; it gives me the creeps.", "WORD": "silbar", "DEF": "to whistle", "SENTENCE": "No me gusta c\\u00f3mo silba ese se\\u00f1or; me da escalofr\\u00edos."},',
 '{"TRANSLATION": "\\"Is somebody there?\\" asked the boy in a startled voice.", "WORD": "sobresaltado", "DEF": "startled", "SENTENCE": "\\"\\u00bfHay alguien aqu\\u00ed?\\" pregunt\\u00f3 el ni\\u00f1o con voz sobresaltada."},',
 '{"TRANSLATION": "Carla made a face at me when I asked her if she was scared.", "WORD": "la mueca", "DEF": "face", "SENTENCE": "Carla me hizo una mueca cuando le pregunt\\u00e9 si ten\\u00eda miedo."},',
 '{"TRANSLATION": "The teacher tapped the board with the chalk.", "WORD": "golpetear", "DEF": "to tap", "SENTENCE": "El maestro golpete\\u00f3 el pizarr\\u00f3n con la tiza."}',
 ']']
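The doubled backslashes in that output are just Python's repr of a literal backslash: reading the file as plain text keeps each \u00f3 as six ordinary characters, and only a JSON parser turns it into ó. A minimal sketch, using a hypothetical record built from fragments of the file:

```python
import json

# Hypothetical record assembled from fragments of the file; the r-prefix
# keeps "\u00f3" as six literal characters, as they sit on disk.
raw_line = r'{"WORD": "sobresaltado", "SENTENCE": "pregunt\u00f3 el ni\u00f1o"}'

print(len(r"\u00f3"))  # 6: a backslash, "u" and four hex digits

# Reading text does not decode JSON escapes; json.loads does.
record = json.loads(raw_line)
print(record["SENTENCE"])  # preguntó el niño
```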

Since the file holds a single JSON document, you can load it in one operation with json.load. It will be turned into a Python structure, in this case a list of dictionaries, and escapes such as \u00f3 are decoded into the actual characters along the way. For example:

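import json

# Parse the whole file at once; the result is a list of dicts.
with open('wotd-page-one.json', encoding='utf-8') as f:
    data = json.load(f)

# Print the Spanish sentence from each entry.
for d in data:
    print(d['SENTENCE'])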

Output:

No me gusta cómo silba ese señor; me da escalofríos.
"¿Hay alguien aquí?" preguntó el niño con voz sobresaltada.
Carla me hizo una mueca cuando le pregunté si tenía miedo.
El maestro golpeteó el pizarrón con la tiza.
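To get the flat layout the question asks for, one option (a sketch, not the only approach) is to walk each dictionary's items in order and append key and value alternately. Python 3 strings are already Unicode, so no conversion is needed to work with them; encoding to Latin-1 only matters if something downstream needs bytes. The sample string below is a hypothetical stand-in for the file, trimmed to three keys so that the indices line up with the ones in the question:

```python
import json

# Hypothetical stand-in for wotd-page-one.json, trimmed to three keys.
raw = r'[{"WORD": "silbar", "DEF": "to whistle", "SENTENCE": "No me gusta c\u00f3mo silba ese se\u00f1or; me da escalofr\u00edos."}]'

records = json.loads(raw)  # list of dicts; the \uXXXX escapes are decoded

# Flatten every record into [key, value, key, value, ...].
flat = []
for record in records:
    for key, value in record.items():
        flat.extend([key, value])

print(flat[2], "->", flat[3])  # DEF -> to whistle
print(flat[4], "->", flat[5])

# Only if you really need Latin-1 bytes (e.g. for a legacy consumer):
latin1_bytes = flat[5].encode("latin-1")
print(latin1_bytes)
```

With the real file, each record also starts with a TRANSLATION pair, so every index shifts by two; keeping the list of dictionaries and using keys like d['DEF'] is usually easier than tracking positions.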

If you read the file line by line, the first line you get is just "[". Trying to parse that line on its own as JSON raises an exception, because "[" by itself is not valid JSON. Reading line by line splits the document into fragments that are each handled without the rest of the file, so you shouldn't do this. Instead, just use json.load like so:

import json

with open("wotd-page-one.json", encoding="utf-8") as f:
    data = json.load(f)
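The failure described above is easy to reproduce. A minimal sketch with a hypothetical three-line miniature of the file: the first line on its own raises json.JSONDecodeError (a subclass of ValueError), while the full text parses into a list of dicts.

```python
import json

# Hypothetical miniature of the scraped file: the array spans several lines.
text = '[\n{"WORD": "silbar", "DEF": "to whistle"}\n]\n'

first_line = text.splitlines()[0]  # just "["
try:
    json.loads(first_line)
    first_line_ok = True
except json.JSONDecodeError as exc:
    first_line_ok = False
    print("First line alone is not valid JSON:", exc)

# Parsing the whole document at once succeeds.
data = json.loads(text)
print(data)
```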
