
Assigning a value to specific words in a dataframe in Python

Hi, I have a dataframe consisting of 7989 rows × 1 column. Each row describes the consequences of a different maritime piracy attack.

I want to assign a value to each row depending on whether it contains a word from one of the lists below. The value assigned depends on which list the word belongs to.

The 6 lists:

five = ['kill','execute','dead']
four = ['kidnap','hostag','taken','abduct']
three = ['injur','wound','assault']
two = ['captur','hijack']
one = ['stolen','damage','threaten','robber','destroy']
zero = ['alarm','no','none']

I have tried to do it like this:

df['five']=df.apply(lambda x: '5' if x == 'five' else '-')

where df is my dataframe.

Can anyone help?

You can create a dictionary for each list that maps its words to the list's number, merge all the dictionaries together, and then set new columns with numpy.where:

import numpy as np
import pandas as pd

df = pd.DataFrame({'outcom':[['kill','dead'],['abduct','aaaa'],['hostag']]})

# the six word lists from the question
five = ['kill','execute','dead']
four = ['kidnap','hostag','taken','abduct']
three = ['injur','wound','assault']
two = ['captur','hijack']
one = ['stolen','damage','threaten','robber','destroy']
zero = ['alarm','no','none']

# one dictionary per list, mapping each word to the list's value
d5 = dict.fromkeys(five, '5')
d4 = dict.fromkeys(four, '4')
d3 = dict.fromkeys(three, '3')
d2 = dict.fromkeys(two, '2')
d1 = dict.fromkeys(one, '1')
d0 = dict.fromkeys(zero, '0')

# merge all dictionaries into a single word -> value lookup
d = {**d5, **d4, **d3, **d2, **d1, **d0}
print (d)

# one new column per word: the value if the word is in the row's list, else '-'
for k, v in d.items():
    df[k] = np.where(df['outcom'].apply(lambda x: k in x), v, '-')

print (df)
           outcom kill execute dead kidnap hostag taken abduct
0    [kill, dead]    5       -    5      -      -     -      -
1  [abduct, aaaa]    -       -    -      -      -     -      4
2        [hostag]    -       -    -      -      4     -      -
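
If a single value per row is wanted rather than one column per word, the same word-to-value lookup can be reduced to the highest value found in each row. The following is only a sketch under assumptions: the names score_by_word, row_score and the 'score' column are made up for illustration, and -1 is an arbitrary marker for rows that contain no listed word.

```python
import pandas as pd

# Sample data in the same shape as above, plus one row with no listed word
df = pd.DataFrame({'outcom': [['kill', 'dead'], ['abduct', 'aaaa'], ['hostag'], ['aaaa']]})

zero = ['alarm', 'no', 'none']
one = ['stolen', 'damage', 'threaten', 'robber', 'destroy']
two = ['captur', 'hijack']
three = ['injur', 'wound', 'assault']
four = ['kidnap', 'hostag', 'taken', 'abduct']
five = ['kill', 'execute', 'dead']

# Map every word to the integer value of its list (hypothetical helper name)
score_by_word = {}
for score, words in enumerate([zero, one, two, three, four, five]):
    score_by_word.update(dict.fromkeys(words, score))

def row_score(tokens):
    """Return the highest value among the row's listed words, or -1 if none match."""
    scores = [score_by_word[t] for t in tokens if t in score_by_word]
    return max(scores, default=-1)

df['score'] = df['outcom'].apply(row_score)
print(df)
```

Using integers instead of the strings '5', '4', ... makes it possible to compare the values with max.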

Edited

You can use the loc indexer (documentation) like so:

import pandas as pd

five = ["I", "like"]
df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])
     Words
0        I
1     like
2    bacon
3       in
4      the
5  morning

df["New"] = df["Words"].copy()
df.loc[df["New"] == "I", "New"] = 5

     Words      New
0        I        5
1     like     like
2    bacon    bacon
3       in       in
4      the      the
5  morning  morning

You can then use a for loop to repeat this assignment for every word in every list.
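
One way such a loop might look is sketched below. It is an illustration rather than the answerer's exact code: the levels dictionary is a shortened, hypothetical version of the question's lists, Series.isin replaces the single == comparison so that a whole list is handled at once, and -1 is an assumed marker for words that appear in no list.

```python
import pandas as pd

# Hypothetical, shortened version of the word lists from the question
levels = {
    0: ["alarm", "no", "none"],
    1: ["stolen", "damage"],
    4: ["kidnap", "hostag"],
    5: ["kill", "dead"],
}

df = pd.DataFrame(["kill", "hostag", "bacon", "none", "stolen"], columns=["Words"])

# Start with a marker for "not in any list" (assumed convention)
df["New"] = -1

# For each list, select the matching rows with a boolean mask and assign its value
for value, words in levels.items():
    df.loc[df["Words"].isin(words), "New"] = value

print(df)
```

Each pass through the loop only touches the rows whose word is in the current list, so rows that match nothing keep the marker.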

Thank you all for the help. I think I found a way to make it work:

list_of_words = zero + one + two + three + four + five

# Keep only the words of each row that appear in one of the lists
outcome_refined = df_Stop2['outcome'].apply(lambda x: [item for item in x if item in list_of_words])

outcome_numbered = []  # Create an empty list

def max_val(values):  # Ensures that we only get the largest possible value
    maximum_value = 0
    for i in values:
        if i > maximum_value:
            maximum_value = i
    return [maximum_value]

# Loop through the filtered word list of each row
for words in outcome_refined:
    tmp = []  # Create a temporary empty list
    for word in words:
        if word in zero:
            word = 0
        elif word in one:
            word = 1
        elif word in two:
            word = 2
        elif word in three:
            word = 3
        elif word in four:
            word = 4
        elif word in five:
            word = 5
        tmp.append(word)
    tmp = max_val(tmp)
    outcome_numbered.append(tmp)

# Note: the rows are read from df_Stop2 but the result is stored in df_Stop,
# so both are assumed to have the same number of rows
df_Stop['outcome_numbered'] = outcome_numbered.copy()

df_Stop

Finally working
