
Can't figure out how to run this Python script

Can someone explain to me (I'm not experienced in programming) how to use this script correctly (link: https://github.com/dumbmatter/find-repeated-words)? Basically, it should take a text file as input and output an HTML file in which words that are used repeatedly close together are highlighted. But when I run it (I installed Pyzo), I get the message: "SyntaxError: invalid syntax". I have no idea what Python is talking about; I can only assume the problem concerns the input file.

CODE:

#!/usr/bin/env python

import sys from string import punctuation from operator import itemgetter

# Check command line inputs if len(sys.argv) == 1:
    print 'Pass the input text file as the first argument.'
    sys.exit() elif len(sys.argv) == 2:
    infile = sys.argv[1]
    outfile = '%s.html' % (infile.split('.')[0],) else:
    infile = sys.argv[1]
    outfile = sys.argv[2]

print infile, outfile

N = 10 words = {} # Dict of word frequencies pos = {} # Dict of word positions scores = [] # List of word repeatedness scores articles = ['the', 'a', 'of', 'and', 'in', 'et', 'al'] # Common articles to ignore

# Build lists

words_gen = (word.strip(punctuation).lower() for line in open(infile)
                                             for word in line.split())

i = 0 for word in words_gen:
    words[word] = words.get(word, 0) + 1

    # Build a list of word positions
    if words[word] == 1:
        pos[word] = [i]
    else:
        pos[word].append(i)

    i += 1

# Calculate scores

words_gen = (word.strip(punctuation).lower() for line in open(infile)
                                             for word in line.split())

i = 0 for word in words_gen:
    scores.append(0)
#    scores[i] = -1 + sum([pow(2, -abs(d-i)) for d in pos[word]]) # The -1 accounts for the 2^0 for self words
    if word not in articles and len(word) > 2:
        for d in pos[word]:
            if d != i and abs(d-i) < 50:
                scores[i] += 1.0/abs(d-i)
    i += 1

scores = [score*1.0/max(scores) for score in scores] # Scale from 0 to 1

# Write colored output

f = open(outfile, 'w'); i = 0 for line in open(infile):
    for word in line.split():
        f.write('<span style="background: rgb(%i, 255, 255)">%s</span> ' % ((1-scores[i])*255, word))
        i += 1
    f.write('<br /><br />') f.close()

print 'Output saved to %s' % (outfile,)

Python is very sensitive to the formatting of the code: you cannot break, join, or indent lines in places where Python does not expect it. Take the first lines, for example:

import sys from string import punctuation from operator import itemgetter

should be split into 3 lines:

import sys 
from string import punctuation 
from operator import itemgetter
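
One way to check this without running the whole script is to ask Python to compile each version. The following is a minimal sketch; the helper name compiles and the "<pasted>" label are made up for illustration and are not part of the original script.

```python
# The line as it appears in the pasted code, and the same statements split up
broken = "import sys from string import punctuation from operator import itemgetter"
fixed = (
    "import sys\n"
    "from string import punctuation\n"
    "from operator import itemgetter\n"
)


def compiles(source):
    """Return True if Python can parse the source, False on a SyntaxError."""
    try:
        compile(source, "<pasted>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(broken))
print(compiles(fixed))
```

compile() only parses the text and does not execute it, so this kind of check is safe to run on a snippet before trying the full script.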

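There are more errors like this in the code you pasted: for example, "sys.exit() elif len(sys.argv) == 2:" and "i = 0 for word in words_gen:" each join two statements on one line. I downloaded the original code from the link, and it works fine.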
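
For reference, here is a sketch of what the pasted logic looks like once the line breaks are restored. It is not the repository's file verbatim: it is a reorganised port, under the assumption that you are running Python 3, where the original "print infile, outfile" statement would itself be a syntax error. The function names score_words, write_html and main are made up for this sketch, and the guard against an all-zero score list is an addition that the pasted code does not have.

```python
import sys
from string import punctuation

# Common words to ignore (same list as in the pasted script)
ARTICLES = ['the', 'a', 'of', 'and', 'in', 'et', 'al']


def score_words(lines, window=50):
    """Return one repeatedness score (scaled to 0..1) per whitespace-separated word."""
    words = [w.strip(punctuation).lower() for line in lines for w in line.split()]

    # Record every position at which each word occurs
    pos = {}
    for i, word in enumerate(words):
        pos.setdefault(word, []).append(i)

    # A word scores higher the closer its other occurrences are
    scores = []
    for i, word in enumerate(words):
        score = 0.0
        if word not in ARTICLES and len(word) > 2:
            for d in pos[word]:
                if d != i and abs(d - i) < window:
                    score += 1.0 / abs(d - i)
        scores.append(score)

    # Added guard: the pasted script divides by max(scores) unconditionally
    top = max(scores, default=0)
    return [s / top for s in scores] if top else scores


def write_html(lines, scores, outfile):
    """Write each word in a span whose background gets darker with its score."""
    i = 0
    with open(outfile, 'w') as f:
        for line in lines:
            for word in line.split():
                f.write('<span style="background: rgb(%i, 255, 255)">%s</span> '
                        % ((1 - scores[i]) * 255, word))
                i += 1
            f.write('<br /><br />')


def main(argv):
    """Mirror the command-line handling of the pasted script."""
    if len(argv) == 1:
        print('Pass the input text file as the first argument.')
        return
    infile = argv[1]
    outfile = argv[2] if len(argv) > 2 else '%s.html' % (infile.split('.')[0],)
    with open(infile) as fh:
        lines = fh.readlines()
    write_html(lines, score_words(lines), outfile)
    print('Output saved to %s' % (outfile,))
```

To use it as a script, add a call to main(sys.argv) at the bottom of the file and pass the text file as the first argument on the command line. Note that the input file is an argument to the script, so simply pressing "run" in an editor without arguments will only print the usage message.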


 