python for loop when using file instead of dictionary

Question

I am using my own file instead of a Python dictionary, but when I am applying a for loop on that file I am receiving this error:

TypeError: string indices must be integers, not str

My code is given below where "sai.json" is the file that contains the dictionary.

import json
from naiveBayesClassifier import tokenizer
from naiveBayesClassifier.trainer import Trainer
from naiveBayesClassifier.classifier import Classifier

nTrainer = Trainer(tokenizer)

ofile = open("sai.json","r")

dataset=ofile.read()
print dataset

for n in dataset:
    nTrainer.train(n['text'], n['category'])

nClassifier = Classifier(nTrainer.data, tokenizer)

unknownInstance = "Even if I eat too much, is not it possible to lose some weight"

classification = nClassifier.classify(unknownInstance)
print classification

Answer 1

You are importing the json module, but you aren't using it!

You can use json.load to load JSON data from an open file into a Python dict, or you can read the file into a string and then use json.loads to load the data into a dict.

E.g.,

ofile = open("sai.json","r")
data = json.load(ofile)
ofile.close()

Or even better

with open("sai.json", "r") as ifile:
    data = json.load(ifile)

Or, using json.loads :

with open("sai.json", "r") as ifile:
    dataset = ifile.read()
data = json.loads(dataset)

And then you can access the contents of data with data['text'] and data['category'], assuming the dictionary has those keys.
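
As a quick check that the two approaches give the same result, here is a minimal sketch. It uses an in-memory io.StringIO object in place of a real file, and the one-record JSON text is made up for illustration (it is not the contents of sai.json).

```python
import io
import json

# Illustrative JSON text; an assumption, not the real sai.json contents.
raw = u'{"text": "hello everyone", "category": "NO"}'

# json.load parses from a file-like object (StringIO stands in for an open file).
from_file = json.load(io.StringIO(raw))

# json.loads parses from a string that has already been read.
from_string = json.loads(raw)

same = from_file == from_string   # both calls build an equal dict
text = from_file['text']
category = from_file['category']
```

Either way you end up with an ordinary Python object, so key lookups like data['text'] work as usual.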

You're getting an error because dataset is a string so

for n in dataset:
    nTrainer.train(n['text'], n['category'])

loops over that string character by character, binding n to each one-character string in turn. Strings can only be indexed by integers, not by other strings, so n['text'] raises the TypeError. There's also little point in indexing into a one-character string: if s is a one-character string, then s[0] has the same contents as s.
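
To see the failure in isolation, here is a small sketch that needs no file or classifier. The string literal is a made-up stand-in for what ofile.read() returns.

```python
# A stand-in for the raw file contents: still just a str, not parsed JSON.
dataset = '[{"text": "hi", "category": "NO"}]'

# Iterating over a str yields one-character strings.
first_chars = [n for n in dataset][:3]

# Indexing a str with a str key raises TypeError, as in the question.
try:
    dataset[0]['text']
    error = None
except TypeError as exc:
    error = exc

# Indexing a one-character string with 0 gives back an equal string.
s = dataset[0]
same_as_itself = (s[0] == s)
```

The exact wording of the TypeError message varies between Python versions, but the exception type is the same.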

Here's the data that you put in the comment. I was assuming that your data was a list wrapped in a dict, but it's OK for the top level of a JSON document to be a plain list.
FWIW, I used print json.dumps(dataset, indent=4) to format it. Note that there is no comma following the last } in the file: a trailing comma there is OK in Python, but it's an error in JSON.
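
The trailing-comma point can be checked with a short sketch. Both JSON strings are illustrative one-record samples rather than the full file.

```python
import json

# Valid JSON: no comma after the last element.
valid = '[{"category": "NO", "text": "hello everyone"}]'

# Same data with a trailing comma: fine as a Python list literal, invalid JSON.
trailing = '[{"category": "NO", "text": "hello everyone"},]'

records = json.loads(valid)

try:
    json.loads(trailing)
    rejected = False
except ValueError:
    # Decoding errors from the json module are ValueError (or a subclass of it).
    rejected = True
```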

sai.json

[
    {
        "category": "NO", 
        "text": "hello everyone"
    }, 
    {
        "category": "YES", 
        "text": "dont use words like jerk"
    }, 
    {
        "category": "NO", 
        "text": "what the hell."
    }, 
    {
        "category": "yes", 
        "text": "you jerk"
    }
]

And now if we read it in with json.load your code should work correctly. But here's a simple demo that just prints the contents:

with open("sai.json", "r") as f:
    dataset = json.load(f)

for n in dataset:
    print "text: '%s', category: '%s'" % (n['text'], n['category'])

output

text: 'hello everyone', category: 'NO'
text: 'dont use words like jerk', category: 'YES'
text: 'what the hell.', category: 'NO'
text: 'you jerk', category: 'yes'
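
The same loop shape then works for training. In the sketch below, RecordingTrainer is a hypothetical stand-in that only records what it is given, so the example runs without naiveBayesClassifier installed; it is not the library's Trainer. The JSON string is a two-record excerpt of the data above.

```python
import json

class RecordingTrainer(object):
    """Hypothetical stand-in for the question's Trainer; it only records calls."""

    def __init__(self):
        self.seen = []

    def train(self, text, category):
        # Store the pair instead of doing any real training.
        self.seen.append((text, category))

# Two records in the same shape as sai.json (excerpt, for illustration).
raw = ('[{"category": "NO", "text": "hello everyone"},'
       ' {"category": "yes", "text": "you jerk"}]')
dataset = json.loads(raw)   # a list of dicts, not a str

trainer = RecordingTrainer()
for n in dataset:
    # n is now a dict, so string keys are valid.
    trainer.train(n['text'], n['category'])
```

With the real library you would pass nTrainer in place of the stand-in and keep the loop body unchanged.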
