
How can I make Python process text sentence by sentence instead of word by word?

I have a list of strings, and I want Python to treat each one as a whole sentence when building a list of tuples. For example:

string = [("I am a good boy"), ("I am a good girl")]
tuple = [("I am a good boy", -1), ("I am a good girl", -1)]

But instead it produces:

tuple = [("I", -1), ("am", -1), ("a", -1), ("good", -1), ("boy", -1).....]

What went wrong and how do I resolve it?

import re

def cleanedthings(trainset):
    cleanedtrain = []
    specialch = "!@#$%^&*-=_+:;\".,/?`~][}{|)("
    for line in trainset:
        for word in line.split():
            lowword = word.lower()
            for ch in specialch:
                if ch in lowword:
                    lowword = lowword.replace(ch,"")
            if len(lowword) >= 3:
                cleanedtrain.append(lowword)
    return cleanedtrain

poslinesTrain = [('I just wanted to drop you a note to let you know how happy I am with my cabinet'), ('The end result is a truly amazing transformation!'), ('Who can I thank for this?'), ('For without his artistry and craftmanship this transformation would not have been possible.')]

neglinesTrain = [('I have no family and no friends, very little food, no viable job and very poor future prospects.'), ('I have therefore decided that there is no further point in continuing my life.'), ('It is my intention to drive to a secluded area, near my home, feed the car exhaust into the car, take some sleeping pills and use the remaining gas in the car to end my life.')]

poslinesTest = [('Another excellent resource from Teacher\'s Clubhouse!'), ('This cake tastes awesome! It\'s almost like I\'m in heaven already oh God!'), ('Don\'t worry too much, I\'ll always be here for you when you need me. We will be playing games or watching movies together everytime to get your mind off things!'), ('Hey, this is just a simple note for you to tell you that you\'re such a great friend to be around. You\'re always being the listening ear to us, and giving us good advices. Thanks!')]

neglinesTest = [('Mum, I could write you for days, but I know nothing would actually make a difference to you.'), ('You are much too ignorant and self-concerned to even attempt to listen or understand. Everyone knows that.'), ('If I were, your BITCHY comments that I\'m assuming were your attempt to help, wouldn\'t have.'), ('If I have stayed another minute I would have painted the walls and stained the carpets with my blood, so you could clean it up... I wish I were never born.')]

clpostrain = cleanedthings(poslinesTrain)
clnegtrain = cleanedthings(neglinesTrain)

clpostest = cleanedthings(poslinesTest)
clnegtest = cleanedthings(neglinesTest)


trainset= [(x,1) for x in clpostrain] + [(x,-1) for x in clnegtrain]
testset= [(x,1) for x in clpostest] + [(x,-1) for x in clnegtest]

print(testset)

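You append each cleaned word to the result list on its own, so the sentence boundaries are lost and you end up with one entry per word instead of one per sentence. Collect the cleaned words of each sentence in a temporary list, then join them and append the whole sentence to the result: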

def cleanedthings(trainset):
    cleanedtrain = []
    specialch = "!@#$%^&*-=_+:;\".,/?`~][}{|)("
    for line in trainset:
        # collects the cleaned words of the current sentence
        sentence = []
        for word in line.split():
            lowword = word.lower()
            for ch in specialch:
                if ch in lowword:
                    lowword = lowword.replace(ch,"")
            if len(lowword) >= 3:
                sentence.append(lowword)
        # once all words are checked, rebuild the sentence by joining the
        # words with spaces and append it to the list of cleaned sentences
        cleanedtrain.append(' '.join(sentence))
    return cleanedtrain
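
To check the fix, the cleaned sentences can be paired with their labels the same way the question does. The following is only a minimal sketch, assuming Python 3: `clean_sentences` and `SPECIAL_CHARS` are hypothetical names for a compact rewrite of `cleanedthings`, and the two input sentences come from the question's own example.

```python
# Same characters as `specialch` in the code above.
SPECIAL_CHARS = "!@#$%^&*-=_+:;\".,/?`~][}{|)("

def clean_sentences(lines):
    """Return one cleaned string per input sentence (hypothetical helper)."""
    cleaned = []
    for line in lines:
        words = []
        for word in line.split():
            # lowercase the word and drop every special character from it
            low = "".join(ch for ch in word.lower() if ch not in SPECIAL_CHARS)
            # keep only words of three or more characters, as in the original
            if len(low) >= 3:
                words.append(low)
        # one entry per sentence, not one entry per word
        cleaned.append(" ".join(words))
    return cleaned

neg = ["I am a good boy", "I am a good girl"]
labelled = [(sentence, -1) for sentence in clean_sentences(neg)]
print(labelled)  # [('good boy', -1), ('good girl', -1)]
```

Each tuple now holds a whole sentence. The `len(...) >= 3` filter still removes short words such as "I", "am" and "a", so the result is "good boy" rather than the full original sentence; drop that check if the complete sentence is wanted.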
