Assigning values to specific words in a DataFrame in Python

Question

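Hi, I have a dataframe with 7989 rows × 1 column. Each row describes the outcome of a different maritime pirate attack.

I want to assign a value to each row depending on whether it contains a word from one of the lists below. The assigned value depends on which list the word belongs to.

The 6 lists: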

five =['kill','execute','dead']
four =['kidnap','hostag','taken','abduct']
three =['injur','wound','assault']
two =['captur','hijack']
one =['stolen','damage','threaten','robber','destroy']
zero =['alarm','no','none']

I tried this:

df['five']=df.apply(lambda x: '5' if x == 'five' else '-')

df is my dataframe.

Can anyone help?

Answer 1

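You can create a dictionary for each list that maps its words to the numeric value, merge all the dictionaries together, and then set the new columns with numpy.where:

import numpy as np
import pandas as pd

df = pd.DataFrame({'outcom':[['kill','dead'],['abduct','aaaa'],['hostag']]})

# add the other lists the same way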
five = ['kill','execute','dead']
four = ['kidnap','hostag','taken','abduct']   
three =['injur','wound','assault']
two =['captur','hijack']
one =['stolen','damage','threaten','robber','destroy']
zero =['alarm','no','none']    

# add the other dicts the same way
d5 = dict.fromkeys(five, '5')
d4 = dict.fromkeys(four, '4')
d3 = dict.fromkeys(three, '3')
d2 = dict.fromkeys(two, '2')
d1 = dict.fromkeys(one, '1')
d0 = dict.fromkeys(zero, '0')

d = {**d5, **d4, **d3, **d2, **d1, **d0}
print (d)

for k, v in d.items():
    df[k] = np.where(df['outcom'].apply(lambda x: k in x), v, '-')

print (df)
           outcom kill execute dead kidnap hostag taken abduct
0    [kill, dead]    5       -    5      -      -     -      -
1  [abduct, aaaa]    -       -    -      -      -     -      4
2        [hostag]    -       -    -      -      4     -      -
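
The six dict.fromkeys calls above can also be collapsed into a single comprehension. This is a minimal sketch, assuming the lists are kept in order from zero to five so that the list index equals the score; the names levels and word_to_score are illustrative and not part of the original answer.

```python
# Word lists from the question, ordered so that the index equals the score.
zero = ['alarm', 'no', 'none']
one = ['stolen', 'damage', 'threaten', 'robber', 'destroy']
two = ['captur', 'hijack']
three = ['injur', 'wound', 'assault']
four = ['kidnap', 'hostag', 'taken', 'abduct']
five = ['kill', 'execute', 'dead']

levels = [zero, one, two, three, four, five]  # hypothetical helper name

# Map every word to its score as a string, matching the '5', '4', ... values above.
word_to_score = {word: str(score)
                 for score, words in enumerate(levels)
                 for word in words}

print(word_to_score['kill'])    # 5
print(word_to_score['hostag'])  # 4
```

The resulting dictionary can be used in place of d in the loop above.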

Answer 2

Edited

You can use the loc function (documentation) like this:

import pandas as pd

five = ["I", "like"]
df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])
     Words
0        I
1     like
2    bacon
3       in
4      the
5  morning

df["New"] = df["Words"].copy()
df.loc[df["New"] == "I", "New"] = 5

     Words      New
0        I        5
1     like     like
2    bacon    bacon
3       in       in
4      the      the
5  morning  morning

You can then use a for loop to repeat this for every word.
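
One way such a loop could look, as a sketch only: it assumes each row holds a single word and that the word lists are collected in a dict keyed by score. The scores dict, the sample rows, and the shortened word lists are assumptions made for illustration. Using Series.isin inside loc handles a whole list in one step, so the loop runs once per list instead of once per word.

```python
import pandas as pd

df = pd.DataFrame(["kill", "hostag", "bacon", "alarm"], columns=["Words"])

# Hypothetical mapping from score (kept as a string) to word list,
# shortened for the example.
scores = {
    "5": ["kill", "execute", "dead"],
    "4": ["kidnap", "hostag", "taken", "abduct"],
    "0": ["alarm", "no", "none"],
}

# Start with a placeholder, then overwrite the rows whose word is in each list.
df["New"] = "-"
for score, words in scores.items():
    df.loc[df["Words"].isin(words), "New"] = score

print(df)
```

Rows whose word is in none of the lists keep the "-" placeholder.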

Answer 3

Thanks for the help, everyone. I think I found a way to make it work:

list_of_words = zero + one + two + three + four + five

# Keep only the words that appear in one of the six lists
outcome_refined = df_Stop2['outcome'].apply(
    lambda x: [item for item in x if item in list_of_words])

outcome_numbered = []  # Create an empty list

def max_val(values):
    # Ensures that we only get the largest possible value
    maximum_value = 0
    for i in values:
        if i > maximum_value:
            maximum_value = i
    return [maximum_value]

# Make sure that you loop through each of the lists
for words in outcome_refined:
    tmp = []  # Create a temporary empty list
    for word in words:
        if word in zero:
            word = 0
        elif word in one:
            word = 1
        elif word in two:
            word = 2
        elif word in three:
            word = 3
        elif word in four:
            word = 4
        elif word in five:
            word = 5
        tmp.append(word)
    tmp = max_val(tmp)
    outcome_numbered.append(tmp)

df_Stop['outcome_numbered'] = outcome_numbered.copy()

df_Stop

It finally works.
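
The same idea can probably be written more compactly with a dictionary lookup and the built-in max. This is a sketch, assuming each row of the outcome column is a list of already-stemmed tokens as in the code above; the sample DataFrame and the names word_to_score and score_row are made up for illustration. Rows with no matching word get 0, mirroring maximum_value = 0 above, but the result is a plain integer rather than a one-element list.

```python
import pandas as pd

zero = ['alarm', 'no', 'none']
one = ['stolen', 'damage', 'threaten', 'robber', 'destroy']
two = ['captur', 'hijack']
three = ['injur', 'wound', 'assault']
four = ['kidnap', 'hostag', 'taken', 'abduct']
five = ['kill', 'execute', 'dead']

# Map each word to the index of its list (0 for zero, ..., 5 for five).
word_to_score = {word: score
                 for score, words in enumerate([zero, one, two, three, four, five])
                 for word in words}

def score_row(tokens):
    """Return the highest score among the tokens, or 0 if none match."""
    return max((word_to_score[t] for t in tokens if t in word_to_score),
               default=0)

# Hypothetical sample data standing in for the real outcome column.
df = pd.DataFrame({'outcome': [['crew', 'kill', 'hostag'],
                               ['ship', 'hijack'],
                               ['nothing']]})
df['outcome_numbered'] = df['outcome'].apply(score_row)

print(df['outcome_numbered'].tolist())  # [5, 2, 0]
```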

3 answers

Answer 1
1 2020-08-22 15:24:04

Answer 2
0 2020-08-22 14:56:09

Answer 3
0 2020-08-22 16:27:21
