
Pathlib read_text as a string literal

I am trying to generate some json data from txt files.

The txt files are generated from books via OCR, which makes them both precious (I can't arbitrarily change characters I don't like, since they could be important) and unreliable (the OCR could have gone wrong, or the author could have used symbols that break my code).

As of now, I have this:

output_folder = Path(output_folder)

value = json.loads('{"nome": "' + file_name[:len(file_name)-4] + '", "testu": "' + (Path(filename).read_text()) + '"}')
path = output_folder / (file_name[:len(file_name)-4] + "_opare.json")
with path.open(mode="w+") as working_file:
    working_file.write("[" + str(value) + "]")
    working_file.close()

This raises json.decoder.JSONDecodeError: Invalid control character, which I understand is caused by my book starting (yes) with a ' (a quote).

I've read about string literals, which seem relevant to my case, but I didn't understand how I could use them.

What can I do?

Thanks

Why would you make a json just to parse it again? You can just create a dictionary:

value = {
    "nome": file_name[:len(file_name)-4],
    "testu": Path(filename).read_text(),
}
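To illustrate why the dictionary avoids the escaping problem, here is a minimal sketch. The sample text is made up to stand in for OCR output (it starts with a quote and contains a newline); the point is only that json.dumps does the escaping itself, so the text survives a round trip unchanged.

```python
import json

# Hypothetical sample standing in for OCR text:
# it starts with a quote and contains a raw newline.
text = "'Twas a \"dark\" night.\nChapter 1"

# Build the data as a plain dict; no manual string concatenation.
value = {"nome": "sample", "testu": text}

# json.dumps escapes double quotes and control characters for us.
encoded = json.dumps(value)

# Parsing the encoded string gives back exactly the original text.
decoded = json.loads(encoded)
```

Since the escaping is handled by the json module, the text never has to be inspected or altered by hand.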

Reading between the lines, the JSONDecodeError doesn't actually come from this code, does it? It comes from the code that's reading your file later.

You can't write a dict to a JSON file using str(value). Python's dict-to-string conversion uses single quotes, which are not legal in JSON. You need to convert it back to JSON:
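The difference between the two conversions can be seen in a small sketch (the dict contents here are placeholders, not the asker's data):

```python
import json

value = {"nome": "sample", "testu": "some text"}

as_repr = str(value)         # Python repr: keys and strings in single quotes
as_json = json.dumps(value)  # JSON text: keys and strings in double quotes

# The repr is not valid JSON, so parsing it fails.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False
```

Here repr_is_valid_json ends up False, while json.loads(as_json) returns the original dict.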

with path.open(mode="w+") as working_file:
    json.dump([value], working_file)
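Putting the pieces together, one possible end-to-end version might look like the sketch below. The function name txt_to_json is hypothetical, and the UTF-8 encoding and ensure_ascii=False are assumptions about the files rather than anything stated in the question. Path.stem replaces the manual [:-4] slice to drop the extension.

```python
import json
from pathlib import Path


def txt_to_json(src, output_folder):
    """Write the text of `src` into <stem>_opare.json inside `output_folder`.

    Hypothetical helper; assumes the text files are UTF-8 encoded.
    """
    src = Path(src)
    output_folder = Path(output_folder)

    # Plain dict: the json module handles quotes, newlines and other escapes.
    value = {
        "nome": src.stem,  # file name without its extension
        "testu": src.read_text(encoding="utf-8"),
    }

    path = output_folder / (src.stem + "_opare.json")
    # The with block closes the file; no explicit close() is needed.
    with path.open(mode="w", encoding="utf-8") as working_file:
        # ensure_ascii=False keeps non-ASCII characters readable in the output.
        json.dump([value], working_file, ensure_ascii=False)
    return path
```

Calling txt_to_json("book.txt", "out") would then produce out/book_opare.json, provided the out folder already exists.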


 