
Error in Python parse function

I am really new to Python and get an error when I run my code.

I have an Amazon data set that is formatted as a JSON file (see below for the format).

{
  "reviewerID": "A2SUAM1J3GNN3B",
  "asin": "0000013714",
  "reviewerName": "J. McDonald",
  "helpful": [2, 3],
  "reviewText": "I bought this for my husband who plays the piano.  He is 
having a wonderful time playing these old hymns.  The music  is at times 
hard to read because we think the book was published for singing from more 
than playing from.  Great purchase though!",
  "overall": 5.0,
  "summary": "Heavenly Highway Hymns",
  "unixReviewTime": 1252800000,
  "reviewTime": "09 13, 2009"
}

The code I am using is supplied by the data providers. It converts the JSON file above into a 'strict JSON' file (according to the providers, the original file is not strict JSON).

The command offered by them is as follows:

import json
import gzip

def parse(path):
  g = gzip.open(path, 'r')
  for l in g:
    yield json.dumps(eval(l))

f = open("output.strict", 'w')
for l in parse("reviews_Video_Games.json.gz"):
  f.write(l + '\n')

I have only changed the path, replacing it with the location of the JSON file in quotation marks (e.g., "C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz").

For example, the code that I ran looks like this:

import json
import gzip

def parse(C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz):
  g = gzip.open(C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz, 'r')
  for l in g:
    yield json.dumps(eval(l))

f = open("output.strict", 'w')
for l in parse("reviews_Video_Games.json.gz"):
  f.write(l + '\n')

However, I get the following error:

C:\Users\daisy\AppData\Local\Programs\Python\Python36-32>python C:\Users\daisy\AppData\Local\Programs\Python\strict_json.py
  File "C:\Users\daisy\AppData\Local\Programs\Python\strict_json.py", line 4
def parse("C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"):
                                                                                ^
SyntaxError: invalid syntax

Do you have any idea what is wrong with the syntax?

Again, the original code was supplied by the data providers, so I am fairly sure it is correct. I think I did something wrong when I replaced 'path' with my file location.

Thank you.

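You can't define a function like that. The parentheses after def parse must contain a parameter name (such as file_path), not the path itself. The actual path is passed in when you call the function: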

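def parse(file_path):
  # file_path is only a parameter name; the caller supplies the real path
  g = gzip.open(file_path, 'r')
  for l in g:
    yield json.dumps(eval(l))

# parse() is a generator, so loop over it to actually read the file
for l in parse(r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"):
  print(l)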

Though you could set a default value like so:

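def parse(file_path=r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"):
  # the default value is used whenever parse() is called without an argument
  g = gzip.open(file_path, 'r')
  for l in g:
    yield json.dumps(eval(l))

# parse() is a generator, so loop over it to actually read the file
for l in parse():
  print(l)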
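Putting the pieces together, here is a minimal end-to-end sketch of the corrected script. Several things in it are assumptions for illustration and not part of the data providers' code: the helper name convert, the use of ast.literal_eval in place of eval (it only accepts Python literals, so it is a safer swap when the input is untrusted), and the small single-quoted demo record written to a temporary folder so the example can run without the real data set.

```python
import ast
import gzip
import json
import os
import tempfile


def parse(file_path):
    """Yield each line of a gzipped file of dict literals as a strict JSON string."""
    # 'rt' opens the gzip stream in text mode, so each line is a str
    with gzip.open(file_path, 'rt', encoding='utf-8') as g:
        for line in g:
            # ast.literal_eval is swapped in for eval here (an assumption,
            # not the providers' code); it only evaluates Python literals
            yield json.dumps(ast.literal_eval(line))


def convert(in_path, out_path):
    """Write one strict JSON record per line to out_path and return the count."""
    count = 0
    with open(out_path, 'w', encoding='utf-8') as f:
        for record in parse(in_path):
            f.write(record + '\n')
            count += 1
    return count


# Hypothetical demo input: one short record modelled on the fields in the question
demo_dir = tempfile.mkdtemp()
demo_in = os.path.join(demo_dir, "sample.json.gz")
demo_out = os.path.join(demo_dir, "output.strict")
with gzip.open(demo_in, 'wt', encoding='utf-8') as g:
    g.write("{'asin': '0000013714', 'overall': 5.0, 'helpful': [2, 3]}\n")

written = convert(demo_in, demo_out)

# Each output line should now load with the standard json module
with open(demo_out, encoding='utf-8') as f:
    first = json.loads(f.readline())
print(written, first)
```

For the real data, the call would be convert(r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz", "output.strict"), with the path passed as an argument and never written into the def line.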

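Update for the encoding issue: in a normal string literal, \U starts a Unicode escape, so a Windows path containing \Users fails to parse. Either double every backslash or use a raw string (prefix r):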

>>> "C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
>>> "C:\\Users\\daisy\\Research\\study\\Amazon\\reviews_Video_Games.json.gz"
'C:\\Users\\daisy\\Research\\study\\Amazon\\reviews_Video_Games.json.gz'
>>> r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"
'C:\\Users\\daisy\\Research\\study\\Amazon\\reviews_Video_Games.json.gz'
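As a small sketch of the options in the update above, the following compares different ways of writing the same Windows path. The forward-slash spelling and the use of pathlib.PureWindowsPath for the comparison are additions for illustration and do not come from the original answer.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, so \U is not treated as an escape
raw = r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"

# Normal string with every backslash doubled
escaped = "C:\\Users\\daisy\\Research\\study\\Amazon\\reviews_Video_Games.json.gz"

# Forward slashes (an assumption added here): Windows file APIs generally accept them
forward = "C:/Users/daisy/Research/study/Amazon/reviews_Video_Games.json.gz"

# The raw and escaped spellings produce the identical string
same_string = (raw == escaped)

# PureWindowsPath treats / and \ as the same separator when comparing paths
same_path = (PureWindowsPath(raw) == PureWindowsPath(forward))

file_name = PureWindowsPath(raw).name
print(same_string, same_path, file_name)
```

Any of the three spellings can be passed to parse(); the raw string is usually the least error-prone when copying a path from Windows Explorer.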


 