I don't know why the length of string is '0'

Question

My code is below. Since I haven't received any comments yet, I'm adding the full code.

import os

# Concatenate the Brown corpus files into one training file
filenames2 = ['BROWN1_L1.txt', 'BROWN1_M1.txt', 'BROWN1_N1.txt', 'BROWN1_P1.txt', 'BROWN1_R1.txt']
with open("C:/Python27/L1_R1_TRAINING.txt", 'w') as outfile:
    for fname in filenames2:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

# 'r' already gives universal newlines in Python 3 ('rU' is deprecated)
b = open("C:/Python27/L1_R1_TRAINING.txt", 'r')

# Collect every file under the Reuters test directory
filenames3 = []
for path, dirs, files in os.walk("C:/Python27/Reutertest"):
    for file in files:
        file = os.path.join(path, file)
        filenames3.append(file)

# Concatenate the Reuters files into one file
with open("C:/Python27/REUTER.txt", 'w') as outfile:
    for fname in filenames3:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
c = open("C:/Python27/REUTER.txt", 'r')

import math

def Cross_Entropy(x, y):
    # x and y are open file objects
    filecontents1 = x.read()
    filecontents2 = y.read()
    sentence1 = filecontents1.upper()
    sentence2 = filecontents2.upper()
    count_A1 = sentence1.count('A')
    count_B1 = sentence1.count('B')
    count_C1 = sentence1.count('C')
    count_all1 = len(sentence1)
    prob_A1 = count_A1 / count_all1  # the ZeroDivisionError is raised here
    prob_B1 = count_B1 / count_all1
    prob_C1 = count_C1 / count_all1
    count_A2 = sentence2.count('A')
    count_B2 = sentence2.count('B')
    count_C2 = sentence2.count('C')
    count_all2 = len(sentence2)
    prob_A2 = count_A2 / count_all2
    prob_B2 = count_B2 / count_all2
    prob_C2 = count_C2 / count_all2
    return -(prob_A1 * math.log(prob_A2, 2) + prob_B1 * math.log(prob_B2, 2) + prob_C1 * math.log(prob_C2, 2))

Cross_Entropy(b, c)

Now I get the error "prob_A1 = count_A1 / count_all1 ZeroDivisionError: division by zero". What's wrong with my code? Is my syntax wrong?

Answer 1

I'm not quite sure what is behind your failure to read your strings from the files, but your cross-entropy can be computed much more succinctly:

import math

def crossEntropy(s1,s2):
    s1 = s1.upper()
    s2 = s2.upper()
    # relative frequencies of 'A', 'B', 'C' in each string
    probsOne = (s1.count(c)/float(len(s1)) for c in 'ABC')
    probsTwo = (s2.count(c)/float(len(s2)) for c in 'ABC')
    return -sum(p*math.log(q,2) for p,q in zip(probsOne,probsTwo))

For example,

>>> crossEntropy('abbcabcba','abbabaccc')
1.584962500721156

If this is what you want to compute, you can now concentrate on assembling the strings to pass to crossEntropy. I would recommend getting rid of the read-write-read logic (unless you need the two files that you are trying to create) and reading the files in the two directories directly into two lists, joining each list into one big string that is stripped of all whitespace and then passed to crossEntropy.
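A minimal sketch of that suggestion, assuming Python 3 and that each directory holds plain-text files. The helper name read_directory_text and its encoding settings are hypothetical and not part of the original code. It walks a directory, reads every file, and joins the contents into one string with all whitespace removed:

```python
import os

def read_directory_text(root):
    """Return the text of every file under root as one string, whitespace removed."""
    pieces = []
    for path, dirs, files in os.walk(root):
        dirs.sort()  # visit subdirectories in a predictable order
        for name in sorted(files):
            full = os.path.join(path, name)
            # the encoding and errors settings are assumptions; adjust for your corpus
            with open(full, encoding="utf-8", errors="replace") as infile:
                # str.split() with no arguments splits on any run of whitespace
                pieces.append("".join(infile.read().split()))
    return "".join(pieces)
```

The result could then be passed on, for example crossEntropy(read_directory_text("C:/Python27/Reutertest"), other_text). If a directory turns out to be empty, the returned string is empty too, and crossEntropy would still raise ZeroDivisionError, so it may be worth checking the length first.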

Another possible approach: if all you want are the counts of 'A', 'B', and 'C' in the two directories, create two dictionaries, one for each directory, both keyed by 'A', 'B', and 'C'. Iterate through the files in each directory, reading each file in turn and updating the counts of those three characters without saving the resulting string. Then write a version of crossEntropy that expects two dictionaries.

Something like:

def crossEntropy(d1,d2):
    countOne = sum(d1[c] for c in 'ABC')
    countTwo = sum(d2[c] for c in 'ABC')
    probsOne = (d1[c]/float(countOne) for c in 'ABC')
    probsTwo = (d2[c]/float(countTwo) for c in 'ABC')
    return -sum(p*math.log(q,2) for p,q in zip(probsOne,probsTwo))

For example,

>>> d1 = {'A':3,'B':5,'C':2}
>>> d2 = {'A':2,'B':5,'C':3}
>>> crossEntropy(d1,d2)
1.54397154729945
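To build such dictionaries from a directory, something along these lines might work. This is only a sketch assuming Python 3 and plain-text files; the helper name count_letters and the encoding settings are hypothetical. It reads each file line by line and keeps only the running counts, never the whole text:

```python
import os

def count_letters(root, letters="ABC"):
    """Count case-insensitive occurrences of each letter in all files under root."""
    counts = {c: 0 for c in letters}
    for path, dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(path, name)
            # the encoding and errors settings are assumptions; adjust for your corpus
            with open(full, encoding="utf-8", errors="replace") as infile:
                for line in infile:
                    upper = line.upper()
                    for c in letters:
                        counts[c] += upper.count(c)
    return counts
```

The two results can then be passed to the dictionary version of crossEntropy. Note that if all counts in a dictionary are zero the division still fails with ZeroDivisionError, and a zero count for any letter in the second dictionary makes math.log raise ValueError.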
