
Replacing duplicated words in Python 3

I want to take a piece of text which looks like this:

Engineering will save the world from inefficiency. Inefficiency is a blight on the world and its humanity.

and return:

Engineering will save the world from inefficiency..is a blight on the . and its humanity.

That is, I want to remove duplicated words and replace them with "." This is how I started my code:

lines= ["Engineering will save the world from inefficiency.",
        "Inefficiency is a blight on the world and its humanity."]

def solve(lines):    
    clean_paragraph = []    
    for line in lines:    
        if line not in str(lines):
            clean_paragraph.append(line)
        print (clean_paragraph)    
        if word == word in line in clean_paragraph:
            word = "."              
     return clean_paragraph

My logic was to create a list with all of the words in the strings and add each one to a new list, and then, if the word was already in the list, to replace it with ".". My code returns []. Any suggestions would be greatly appreciated!

PROBLEM:

if word == word in line in clean_paragraph:

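I'm not sure what you expect of this, but it does not test what you intend. Python chains comparison operators, and both == and in count as comparisons, so the line is evaluated as if it were written:

if (word == word) and (word in line) and (line in clean_paragraph):

The first test is trivially true, the second is a substring test, and the third asks whether the entire line is an element of clean_paragraph. Because clean_paragraph stays empty in this program, the third test fails and the condition can never be true. As posted, word is also never assigned before this line, so Python would raise a NameError on reaching it.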
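A quick way to see how Python groups such an expression is to evaluate it on small values. The sketch below uses made-up values (not taken from the question) and compares the bare chain with two explicit spellings. It relies on the documented rule that chained comparisons such as a == b in c are joined with and.

```python
# Illustrative values only; they are not from the original question.
word = "world"
line = "save the world"
clean_paragraph = ["save the world"]

# The bare chain, as written in the question.
chained = word == word in line in clean_paragraph

# What Python actually evaluates: each adjacent pair, joined with "and".
explicit = (word == word) and (word in line) and (line in clean_paragraph)

# A nested reading, for contrast: a Boolean is looked up in the list.
nested = word == ((word in line) in clean_paragraph)

print(chained, explicit, nested)  # True True False
```

The chain and the and-joined form agree, while the nested form gives a different answer, so the two readings are not interchangeable.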

REPAIR:

Write the loops you're trying to encode:

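for clean_line in clean_paragraph:
    # split() yields words; iterating over a str directly yields characters
    for word in clean_line.split():
        ...  # compare word against the words seen so far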

At this point, you'll have to figure out what you want for variable names. You've tried to make a couple of variables stand for two different things at once (line and word; I fixed the first one).

You'll also have to learn to properly manipulate loops and their indices; part of the problem is that you've written more code at once than you can handle -- yet. Back up, write one loop at a time, and print the results, so you know what you're getting into. For instance, start with

for line in lines:
    if line not in str(lines):
        print("line", line, "is new: append")
        clean_paragraph.append(line)
    else:
        print("line", line, "is already in *lines*")

I think you'll spot another problem here -- one even earlier than the one I found. Fix this, then add only one or two lines at a time, building up your program (and programming knowledge) gradually. When something doesn't work, you know it's almost certainly the new lines.
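As a hedged illustration of that print-as-you-go approach, the sketch below inspects str(lines) directly before using it in a condition. The names as_text and found are mine, not from the question.

```python
lines = ["Engineering will save the world from inefficiency.",
         "Inefficiency is a blight on the world and its humanity."]

# str() of a list is its printed form: brackets, quotes, and all.
as_text = str(lines)
print(as_text)

# Each line's text appears inside that string, so the substring test succeeds.
found = [line in as_text for line in lines]
print(found)  # [True, True]
```

Since every line is a substring of the list's own string form, the test line not in str(lines) is never true for this input and nothing is appended. Keeping a separate collection of items already seen is one alternative.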

Here is one way to do this. It replaces all duplicate words with a dot.

lines_test = (["Engineering will save the world from inefficiency.",
               "Inefficiency is a blight on the world and its humanity."])


def solve(lines):
    clean_paragraph = ""
    str_lines = " ".join(lines)
    words_lines = str_lines.replace('.', ' .').split()
    for word in words_lines:
        if word != "." and word.lower() in clean_paragraph.lower():
            word = " ."
        elif word != ".":
            word = " " + word
        clean_paragraph += word
    return clean_paragraph


print(solve(lines_test))

Output:

Engineering will save the world from inefficiency. . is . blight on . . and its humanity.

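It is important to convert words to a consistent case (lower or upper) before comparing them. Note that in on a string is a substring test, not a whole-word test, which is why "a" is replaced in the output above: it occurs inside "save".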
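If the goal is to match whole words rather than substrings, one option is to keep the lowercased words seen so far in a set. This is a minimal sketch under my own assumptions: the function name replace_repeated_words, the marker parameter, and the use of \w+ as the definition of a word are illustrative and do not come from the answers here.

```python
import re


def replace_repeated_words(text, marker="."):
    """Replace every repeated word (case-insensitive) with marker."""
    seen = set()  # lowercased words encountered so far

    def swap(match):
        key = match.group().lower()
        if key in seen:
            return marker       # repeated word: substitute the marker
        seen.add(key)
        return match.group()    # first occurrence: keep as written

    # \w+ treats runs of letters, digits, and underscores as words;
    # punctuation and spacing are left untouched.
    return re.sub(r"\w+", swap, text)


text = ("Engineering will save the world from inefficiency. "
        "Inefficiency is a blight on the world and its humanity.")
print(replace_repeated_words(text))
# Engineering will save the world from inefficiency. . is a blight on . . and its humanity.
```

With this approach "a" is kept, because set membership compares whole words. Words containing apostrophes would be split by \w+, so the pattern may need adjusting for other text.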

Another way of doing this:

lines_test = 'Engineering will save the world from inefficiency. Inefficiency is a blight on the world and its humanity.'

text_array = lines_test.split(" ")
formatted_text = ''
for word in text_array:
    # Lowercase both sides so the comparison is case-insensitive.
    # This is a substring test, so short words can match inside longer ones.
    if word.lower() not in formatted_text.lower():
        formatted_text += ' ' + word
    else:
        formatted_text += ' .'

print(formatted_text)

Output:

Engineering will save the world from inefficiency. . is . blight on . . and its humanity.
