简体   繁体   中英

What are proxies to efficiently estimate whether a file contains JSON?

Not sure if this is the kind of question that SO allows, but better sorry than safe:

I have a script that should check whether a file is JSON, and replace it with a JSON version if it is not. The script is supposed to be fast. If I go with the brute-force solution:

try:
    json.loads(file)
except ValueError:
    replace(file)

... it is an order of magnitude too slow. Now in my specific situation, it's okay to have some false positives, so I decided to go with an imperfect solution for the sake of speed. For example:

if file.firstletter in ["{", "["]:
    replace(file)

(fwiw this is pseudocode. My question is language agnostic)

This will work in most cases, since most files that aren't JSON tend to start without brackets.

Another example I can think of is whether the file ends with a bracket.

What are other ways to efficiently estimate whether a file contains JSON?

You could look at the file extension (eg *.json ).

Other than that checking the starting/ending characters ( { , [ ) is a good heuristic to quickly rule out non-JSON files. But it can lead to false positives, so as the second step you should still try parsing the file as JSON to really know if it's JSON.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM