
How to add weights from a lexicon to tweets as a vector?

I have a dataset of tweets and a lexicon of words. Each word in the lexicon has 100 columns of weights.

For every word in a tweet that appears in the lexicon, I want to take that word's weights (100 columns) and add them to the tweet's row in the dataset as 100 columns.

Note: if several words in a tweet appear in the lexicon, their weights should be summed.

First, I initialize 100 columns and add them to the dataset next to the tweets:

import pandas as pd

train = pd.read_csv(r"Dataset.csv")
train.shape
#(5000,1)
train.head(3)
# Tweet
# joy, fear
# anger, joy
# sadness  

lexicon = pd.read_csv(r"lexicon with PFA.csv")
lexicon.shape
#(10000,101)
lexicon.head(2)
#word  w1  w2  w3 .... w100
#joy   0.5 0.1 0  .... 0.2
#fear  0.2 0   0.3 ... 0.1

# Assign columns, all values initially 0 (how can all of them be initialized automatically?)
train["w1"] = 0
train["w2"] = 0
train["w3"] = 0
train["w4"] = 0
# ... (w5 through w99 are assigned the same way)
train["w100"] = 0

train.shape
#(5000,101)
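
One possible way to create all the weight columns at once, instead of writing 100 assignments by hand, is sketched below. The toy train frame and the names weight_cols and zeros are assumptions made for illustration; only the w1 ... w100 naming comes from the question.

```python
import pandas as pd

# Toy stand-in for Dataset.csv (assumption: a single "Tweet" column).
train = pd.DataFrame({"Tweet": ["joy, fear", "anger, joy", "sadness"]})

# Build the column names w1 ... w100 programmatically.
weight_cols = [f"w{k}" for k in range(1, 101)]

# Create one all-zero block with the same row index and attach it in a single step.
zeros = pd.DataFrame(0.0, index=train.index, columns=weight_cols)
train = pd.concat([train, zeros], axis=1)
```

Attaching the columns in a single concat avoids inserting them one by one, which pandas may warn about for large numbers of columns.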

def calcExtraFeatureW1(query):
    lexicon_score_w1 = 0

    # For each word in the tweet (words may be separated by commas and/or spaces)
    for i in query.replace(",", " ").split():
        try:
            # Look up the word's weight; if the word is in the lexicon, add its weight to the score
            sc1 = lexicon[lexicon["word"] == i]["w1"].values[0]  # this works for one column; I want it for all of them
            lexicon_score_w1 += sc1
        except IndexError:
            # The word may not be in the lexicon; just skip it
            pass

    return lexicon_score_w1



Desired output:

#Tweet      w1    w2    w3   ... w100
#joy,fear  0.7   0.1    0.3  ..  0.3

#note: in this case, the weights of joy and fear are summed

The function above only takes the value of one column and adds it to the dataset, but I want the same process for all columns together.
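
A minimal sketch of one way to sum all weight columns at once is shown below. It is not the only approach. It assumes the lexicon has a "word" column followed by the weight columns, and that words in a tweet are separated by commas and/or spaces. The toy data (three weight columns instead of 100) and the names weights, tweet_vector, features and result are hypothetical.

```python
import re
import pandas as pd

# Toy stand-ins for Dataset.csv and the lexicon file (3 weight columns instead of 100).
train = pd.DataFrame({"Tweet": ["joy, fear", "anger, joy", "sadness"]})
lexicon = pd.DataFrame(
    {"word": ["joy", "fear"], "w1": [0.5, 0.2], "w2": [0.1, 0.0], "w3": [0.0, 0.3]}
)

# Index the lexicon by word so that one lookup returns the whole weight vector.
weights = lexicon.set_index("word")
weight_cols = list(weights.columns)

def tweet_vector(tweet):
    """Sum the weight vectors of all lexicon words found in one tweet."""
    tokens = [t for t in re.split(r"[,\s]+", tweet) if t]
    known = [t for t in tokens if t in weights.index]
    if not known:
        # No lexicon word in this tweet: return an all-zero vector.
        return pd.Series(0.0, index=weight_cols)
    # .loc with a list keeps duplicates, so a repeated word is counted each time.
    return weights.loc[known].sum()

# apply() returns one Series per tweet, which pandas expands into one column per weight.
features = train["Tweet"].apply(tweet_vector)
result = pd.concat([train, features], axis=1)
print(result)
```

With the real data, the same code should work unchanged once train and lexicon are loaded with pd.read_csv, because weight_cols is taken from whatever columns the lexicon has besides "word".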

I want to check if one word in tweets appears in the dictionar

You can check whether an item is in a dictionary using the in keyword:

lexicon["word"] in train.keys()
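
As a hedged illustration of membership testing with pandas data: the in keyword applied to a Series checks its index labels rather than its values, so one common pattern is to collect the lexicon words into a Python set first. The toy data and the names lexicon_words, tokens and found are assumptions for this sketch.

```python
import pandas as pd

# Toy lexicon (assumption: a "word" column plus weight columns).
lexicon = pd.DataFrame({"word": ["joy", "fear"], "w1": [0.5, 0.2]})

# A set gives fast membership tests on the words themselves.
lexicon_words = set(lexicon["word"])

tweet = "joy, sadness"
tokens = [t.strip() for t in tweet.split(",")]

# Keep only the tokens that appear in the lexicon.
found = [t for t in tokens if t in lexicon_words]
print(found)
```

For a whole column of single words, Series.isin(lexicon_words) returns the same check as a boolean mask.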
