
Parsing a JSON file with Hebrew and English in Python 3.5

Part of my JSON file looks like this:

[
  {
    "id": 472,
    "name": "אבו גוש",
    "engName": "ABU GHOSH"
  },
  {
    "id": 473,
    "name": "אבו סנאן",
    "engName": "ABU SINAN"
  },
  {
    "id": 1342,
    "name": "אבו קורינאת (יישוב)",
    "engName": "ABU QUREINAT"
  },
]

etc..

And the relevant part of my code looks like this:

with open('israelCities.json') as data_file:
    jsonData = json.loads(data_file.read().encode('utf8'))
    print(jsonData)

It failed on the second line (jsonData = ....). I'm new to Python and didn't find any similar question about it; any help will be appreciated.

Thanks !!

EDIT

These two worked perfectly for me:

 import json
 import urllib.request
 url='https://raw.githubusercontent.com/royts/israel-cities/master/israel-cities.json'
 data = urllib.request.urlopen(url).read().decode('utf-8')
 json.loads(data)

And This One :

import json
import requests

r = requests.get('https://raw.githubusercontent.com/royts/israel-cities/master/israel-cities.json')
with open('israelCities.json', 'w') as f:
    json.dump(r.json(), f)


with open('israelCities.json') as f:
    json_data = json.load(f)

Thank you !!

This, from your code: json.loads(data_file.read().encode('utf8')) reads the text from the file and then encodes it to UTF-8 bytes. In Python 3.5, json.loads accepts only str, so passing it bytes fails.

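Try this instead: json.loads(data_file.read(), encoding='utf8'), which means: read this, which is written as utf8. (On Python 3 the encoding argument is ignored, and it was removed in Python 3.9, so plain json.loads(data_file.read()) behaves the same there.)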

Of course, the file should be saved as utf-8 or it won't work.
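
A minimal sketch of the encode/decode distinction, using a made-up one-entry sample rather than the real file. The variable names are illustrative only: encode turns text (str) into bytes, decode goes the other way, and the parser is given the text.

```python
import json

# Hypothetical one-entry sample standing in for the file's contents.
text = '[{"id": 472, "name": "אבו גוש", "engName": "ABU GHOSH"}]'

raw = text.encode('utf-8')      # str -> bytes (what the failing code did)
decoded = raw.decode('utf-8')   # bytes -> str (what a downloaded payload needs)

cities = json.loads(decoded)    # parse the text, not the bytes
print(type(raw).__name__, type(decoded).__name__, cities[0]['name'])
```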


EDIT:

With simplified usage, as @mhawke suggested, and using OP's original file, this works on Python 2:

>>> httpresponse = urllib.urlopen('https://raw.githubusercontent.com/royts/israel-cities/master/israel-cities.json')
>>> json.load(httpresponse)

EDIT 2:

If you are using Python 3, try this instead:

>>> import json
>>> import urllib.request
>>> url='https://raw.githubusercontent.com/royts/israel-cities/master/israel-cities.json'
>>> data = urllib.request.urlopen(url).read().decode('utf-8')
>>> json.loads(data)

You only need to tell loads what the encoding is, not convert the data to an encoding yourself.

so,

import json

with open('israelCities.json') as data_file:
    jsonData = json.loads(data_file.read(), encoding='utf-8')
    print(jsonData)

will yield

[{u'engName': u'ABU GHOSH', u'id': 472, u'name': u'\u05d0\u05d1\u05d5 \u05d2\u05d5\u05e9'}, {u'engName': u'ABU SINAN', u'id': 473, u'name': u'\u05d0\u05d1\u05d5 \u05e1\u05e0\u05d0\u05df'}, {u'engName': u'ABU QUREINAT', u'id': 1342, u'name': u'\u05d0\u05d1\u05d5 \u05e7\u05d5\u05e8\u05d9\u05e0\u05d0\u05ea (\u05d9\u05d9\u05e9\u05d5\u05d1)'}]

but only if you have saved israelCities.json with UTF-8 encoding first!
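
One way to be sure the file really is UTF-8 is to write it that way yourself. The following is a sketch under assumptions: the sample record and the scratch path are made up for illustration, and Python 3 is assumed (open() takes an encoding argument there).

```python
import json
import os
import tempfile

sample = [{"id": 472, "name": "אבו גוש", "engName": "ABU GHOSH"}]

# Hypothetical scratch path so the real israelCities.json is not touched.
path = os.path.join(tempfile.mkdtemp(), 'cities_sample.json')

# Write real Hebrew characters (not \uXXXX escapes), explicitly as UTF-8.
with open(path, 'w', encoding='utf-8') as f:
    json.dump(sample, f, ensure_ascii=False)

# Read it back with the same explicit encoding.
with open(path, encoding='utf-8') as f:
    loaded = json.load(f)
```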

You don't need to call read() on the file. Use json.load() instead:

import json

with open('israelCities.json') as data_file:
    jsonData = json.load(data_file)

If the file is UTF-8 encoded (and the one in the git repo israel-cities is) you don't need to specify the encoding to json.load().
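
One caveat on Python 3: open() without an encoding argument uses a platform-dependent default, which is not always UTF-8 (on some Windows setups, for example). A possible sketch that passes the encoding to open() instead; load_cities and the demo file are hypothetical names, and 'utf-8-sig' is chosen here because it also accepts a file that starts with a byte order mark.

```python
import json
import os
import tempfile

def load_cities(path):
    """Load a JSON file as UTF-8, ignoring a leading BOM if there is one."""
    with open(path, encoding='utf-8-sig') as f:
        return json.load(f)

# Demo on a scratch file that starts with a BOM, as some editors write it.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.json')
with open(demo_path, 'wb') as f:
    f.write(b'\xef\xbb\xbf' + '[{"name": "אבו סנאן"}]'.encode('utf-8'))

cities = load_cities(demo_path)
print(cities[0]['name'])
```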


Update

From comments in other answers it seems that you might be downloading the file from GitHub and saving it. If you clone the repo you should have no problem with the file, since it is already UTF-8 encoded. If you are unsure, you can download the file using the requests library and explicitly save it as JSON:

import json
import requests

r = requests.get('https://raw.githubusercontent.com/royts/israel-cities/master/israel-cities.json')
with open('israelCities.json', 'w') as f:
    json.dump(r.json(), f)

Now you should definitely have a file that can be loaded with:

with open('israelCities.json') as f:
    json_data = json.load(f)
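
A likely reason the re-saved file loads without trouble: json.dump defaults to ensure_ascii=True, so every Hebrew character is written as a \uXXXX escape and the file on disk is plain ASCII, which reads the same under almost any default encoding. A small sketch with a made-up record:

```python
import json

record = {"name": "אבו גוש"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
literal = json.dumps(record, ensure_ascii=False)   # keeps the Hebrew characters

print(escaped)
print(literal)
```

Both strings parse back to the same data; they differ only in how the text is stored.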

