
Assigning a value to specific words in a dataframe in Python

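Hi, I have a dataframe consisting of 7989 rows × 1 column. Each row is the outcome of a different maritime pirate attack.

I want to assign a value to each row based on whether it contains a word from one of the lists below. The assigned value depends on which list the word belongs to.

The 6 lists: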

five =['kill','execute','dead']
four =['kidnap','hostag','taken','abduct']
three =['injur','wound','assault']
two =['captur','hijack']
one =['stolen','damage','threaten','robber','destroy']
zero =['alarm','no','none']

I tried doing this:

df['five']=df.apply(lambda x: '5' if x == 'five' else '-')

df is my dataframe.

Can anyone help?

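You can create a dictionary with the numeric value for each list, merge all the dictionaries together, and then set the new columns with numpy.where: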

import numpy as np
import pandas as pd

df = pd.DataFrame({'outcom':[['kill','dead'],['abduct','aaaa'],['hostag']]})

# one list per severity level
five = ['kill','execute','dead']
four = ['kidnap','hostag','taken','abduct']
three = ['injur','wound','assault']
two = ['captur','hijack']
one = ['stolen','damage','threaten','robber','destroy']
zero = ['alarm','no','none']

# one dict per list, mapping every word to its value
d5 = dict.fromkeys(five, '5')
d4 = dict.fromkeys(four, '4')
d3 = dict.fromkeys(three, '3')
d2 = dict.fromkeys(two, '2')
d1 = dict.fromkeys(one, '1')
d0 = dict.fromkeys(zero, '0')

# merge all dicts into a single word -> value lookup
d = {**d5, **d4, **d3, **d2, **d1, **d0}
print(d)

# one new column per word: the value if the row contains the word, else '-'
for k, v in d.items():
    df[k] = np.where(df['outcom'].apply(lambda x: k in x), v, '-')

print(df)  # the output below shows only the columns for the words in five and four
           outcom kill execute dead kidnap hostag taken abduct
0    [kill, dead]    5       -    5      -      -     -      -
1  [abduct, aaaa]    -       -    -      -      -     -      4
2        [hostag]    -       -    -      -      4     -      -
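
If a single value per row is wanted instead of one column per word, the same merged dictionary can be reused. The following is only a minimal sketch under some assumptions: the scores are stored as integers so that max() compares them numerically, `highest_score` and the `score` column are hypothetical names, the fourth sample row is added for illustration, and only two of the six lists are included to keep it short.

```python
import pandas as pd

df = pd.DataFrame({'outcom': [['kill', 'dead'], ['abduct', 'aaaa'], ['hostag'], ['aaaa']]})

five = ['kill', 'execute', 'dead']
four = ['kidnap', 'hostag', 'taken', 'abduct']

# Integer scores (an assumption; the answer above uses strings) merged into one lookup
d = {**dict.fromkeys(five, 5), **dict.fromkeys(four, 4)}

def highest_score(words):
    # Hypothetical helper: largest score among the words of one row,
    # or '-' (the placeholder used above) when no word is in the lookup
    scores = [d[w] for w in words if w in d]
    return max(scores) if scores else '-'

df['score'] = df['outcom'].apply(highest_score)
print(df['score'].tolist())  # [5, 4, 4, '-']
```

The remaining lists can be merged into `d` in the same way as above.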

Edited

You can use the loc function (documentation) like this:

import pandas as pd

five = ["I", "like"]
df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])
     Words
0        I
1     like
2    bacon
3       in
4      the
5  morning

df["New"] = df["Words"].copy()
df.loc[df["New"] == "I", "New"] = 5

     Words      New
0        I        5
1     like     like
2    bacon    bacon
3       in       in
4      the      the
5  morning  morning

You can then use a for loop to repeat this for the other words.
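
The loop mentioned above could look roughly like the sketch below. It reuses the sample data from this answer; casting the copied column to object dtype is an assumption made here so that the integer 5 can be stored next to the remaining strings regardless of the pandas version.

```python
import pandas as pd

five = ["I", "like"]
df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])

# Object dtype (an assumption) lets the column hold both strings and integers
df["New"] = df["Words"].astype(object)

# Repeat the loc assignment once for every word in the list
for word in five:
    df.loc[df["Words"] == word, "New"] = 5

print(df["New"].tolist())  # [5, 5, 'bacon', 'in', 'the', 'morning']
```

The loop can also be replaced by a single assignment with `df["Words"].isin(five)` as the row mask.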

Thanks everyone for the help, I think I found a way to make it work:

list_of_words = zero + one + two + three + four + five

# Keep only the words that appear in one of the six lists
outcome_refined = df_Stop2['outcome'].apply(
    lambda x: [item for item in x if item in list_of_words])

outcome_numbered = []  # Create an empty list

def max_val(values):  # Ensures that we only get the largest possible value
    maximum_value = 0
    for i in values:
        if i > maximum_value:
            maximum_value = i
    return [maximum_value]

# Make sure that you loop through each of the lists
for words in outcome_refined:
    tmp = []  # Create a temporary empty list
    for word in words:
        if word in zero:
            word = 0
        elif word in one:
            word = 1
        elif word in two:
            word = 2
        elif word in three:
            word = 3
        elif word in four:
            word = 4
        elif word in five:
            word = 5
        tmp.append(word)
    tmp = max_val(tmp)
    outcome_numbered.append(tmp)

df_Stop['outcome_numbered'] = outcome_numbered.copy()

df_Stop

It finally works.
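
For comparison, the same idea can be written without explicit loops. This is only a sketch under stated assumptions: `df_demo` and its rows are made-up sample data, `score_of` is a hypothetical name, and the result is stored as a plain integer per row rather than a one-element list. Rows without any keyword get 0, as in the loop version.

```python
import pandas as pd

zero = ['alarm', 'no', 'none']
one = ['stolen', 'damage', 'threaten', 'robber', 'destroy']
two = ['captur', 'hijack']
three = ['injur', 'wound', 'assault']
four = ['kidnap', 'hostag', 'taken', 'abduct']
five = ['kill', 'execute', 'dead']

# Map every keyword to the position of its list (0 for zero, ..., 5 for five)
levels = [zero, one, two, three, four, five]
score_of = {word: score for score, words in enumerate(levels) for word in words}

# Made-up sample data: each row is a list of words
df_demo = pd.DataFrame({'outcome': [['crew', 'taken', 'hostag'],
                                    ['ship', 'hijack', 'stolen'],
                                    ['no', 'alarm'],
                                    ['unknown']]})

# One row per word, keeping the original row index
exploded = df_demo['outcome'].explode()
# Look up each word; words that are not keywords become NaN
scores = exploded.map(score_of)
# Largest score per original row; rows without any keyword fall back to 0
df_demo['outcome_numbered'] = scores.groupby(level=0).max().fillna(0).astype(int)

print(df_demo['outcome_numbered'].tolist())  # [4, 2, 0, 0]
```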

