How to get most common words with a specific value in a dataframe Python
Assigning a value to specific words in a dataframe in Python
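Hi, I have a dataframe consisting of 7989 rows × 1 column. Each row describes the outcome of a different maritime piracy attack.
I want to assign each row a value depending on whether it contains a word from one of the lists below. The value assigned depends on which list the word belongs to.
The 6 lists: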
five =['kill','execute','dead']
four =['kidnap','hostag','taken','abduct']
three =['injur','wound','assault']
two =['captur','hijack']
one =['stolen','damage','threaten','robber','destroy']
zero =['alarm','no','none']
I tried this:
df['five']=df.apply(lambda x: '5' if x == 'five' else '-')
where df is my dataframe.
Can anyone help?
You can create a dictionary for each list with its numeric value, merge all the dictionaries together, and then set the new columns with numpy.where:
import numpy as np
import pandas as pd

df = pd.DataFrame({'outcom':[['kill','dead'],['abduct','aaaa'],['hostag']]})
# add the other lists the same way
five = ['kill','execute','dead']
four = ['kidnap','hostag','taken','abduct']
three =['injur','wound','assault']
two =['captur','hijack']
one =['stolen','damage','threaten','robber','destroy']
zero =['alarm','no','none']
# add the other dicts the same way
d5 = dict.fromkeys(five, '5')
d4 = dict.fromkeys(four, '4')
d3 = dict.fromkeys(three, '3')
d2 = dict.fromkeys(two, '2')
d1 = dict.fromkeys(one, '1')
d0 = dict.fromkeys(zero, '0')
d = {**d5, **d4, **d3, **d2, **d1, **d0}
print (d)
for k, v in d.items():
    df[k] = np.where(df['outcom'].apply(lambda x: k in x), v, '-')
print (df)
           outcom kill execute dead kidnap hostag taken abduct
0    [kill, dead]    5       -    5      -      -     -      -
1  [abduct, aaaa]    -       -    -      -      -     -      4
2        [hostag]    -       -    -      -      4     -      -
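The answer above produces one column per word. If a single value per row is wanted instead, one possible variation is to take the highest value among the words found in each row. This is only a sketch under assumptions: the fourth sample row, the integer values, and the names scores, highest_value and score are made up for illustration and do not come from the answer.

```python
import pandas as pd

# Sample data modelled on the answer above; the last row is a hypothetical no-match case.
df = pd.DataFrame({'outcom': [['kill', 'dead'], ['abduct', 'aaaa'], ['hostag'], ['aaaa']]})

five = ['kill', 'execute', 'dead']
four = ['kidnap', 'hostag', 'taken', 'abduct']
three = ['injur', 'wound', 'assault']
two = ['captur', 'hijack']
one = ['stolen', 'damage', 'threaten', 'robber', 'destroy']
zero = ['alarm', 'no', 'none']

# Map every word to the integer value of the list it belongs to.
scores = {}
for value, words in enumerate([zero, one, two, three, four, five]):
    scores.update(dict.fromkeys(words, value))

def highest_value(words):
    """Return the largest value among the listed words in a row, or None if none match."""
    found = [scores[w] for w in words if w in scores]
    return max(found, default=None)

df['score'] = df['outcom'].apply(highest_value)
print(df)
```

Rows that contain no listed word end up with a missing value here; whether that should be 0 or something else depends on how such rows are meant to be treated.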
Edited
You can use the loc function (documentation) like this:
import pandas as pd
five = ["I", "like"]
df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])
     Words
0        I
1     like
2    bacon
3       in
4      the
5  morning
df["New"] = df["Words"].copy()
df.loc[df["New"] == "I", "New"] = 5
     Words      New
0        I        5
1     like     like
2    bacon    bacon
3       in       in
4      the      the
5  morning  morning
Then you can use a for loop to help you.
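The loop mentioned here might look roughly like the following. It is a sketch, not the answerer's code: the second list four = ["bacon"] is a hypothetical addition so that the loop has more than one list to walk through, and isin is used so that a whole list is matched in one step instead of one word at a time.

```python
import pandas as pd

five = ["I", "like"]
four = ["bacon"]  # hypothetical second list, added only for illustration

df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])
df["New"] = df["Words"].copy()

# For each (value, word list) pair, overwrite the rows whose word is in the list.
# The mask is built from "Words", so earlier replacements in "New" cannot interfere.
for value, words in [(5, five), (4, four)]:
    df.loc[df["Words"].isin(words), "New"] = value

print(df)
```

Words that appear in none of the lists keep their original text in the New column.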
Thanks everyone for the help, I think I found a way to make it work:
list_of_words = zero + one + two + three + four + five
# Keep only the words that appear in one of the lists
outcome_refined = df_Stop2['outcome'].apply(lambda x: [item for item in x if item in list_of_words])
outcome_numbered = []  # Create an empty list

def max_val(values):  # Ensures that we only get the largest possible value
    maximum_value = 0
    for i in values:
        if i > maximum_value:
            maximum_value = i
    return [maximum_value]

# Make sure that you loop through each of the lists
for words in outcome_refined:
    tmp = []  # Create a temporary empty list
    for word in words:
        if word in zero:
            word = 0
        elif word in one:
            word = 1
        elif word in two:
            word = 2
        elif word in three:
            word = 3
        elif word in four:
            word = 4
        elif word in five:
            word = 5
        tmp.append(word)
    tmp = max_val(tmp)
    outcome_numbered.append(tmp)

df_Stop['outcome_numbered'] = outcome_numbered.copy()
df_Stop
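For what it's worth, the same per-row maximum can probably be computed without explicit loops. The following is a minimal sketch, not the poster's method: the sample dataframe and the name scores are assumptions made for illustration, and unlike the loop version it stores a plain integer per row rather than a one-element list.

```python
import pandas as pd

five = ['kill', 'execute', 'dead']
four = ['kidnap', 'hostag', 'taken', 'abduct']
three = ['injur', 'wound', 'assault']
two = ['captur', 'hijack']
one = ['stolen', 'damage', 'threaten', 'robber', 'destroy']
zero = ['alarm', 'no', 'none']

# Hypothetical sample; the real data holds one list of words per row.
df = pd.DataFrame({'outcome': [['kill', 'hijack'], ['alarm'], ['aaaa'], []]})

# Build a single word -> value lookup from the six lists.
scores = {}
for value, words in enumerate([zero, one, two, three, four, five]):
    scores.update(dict.fromkeys(words, value))

df['outcome_numbered'] = (
    df['outcome']
    .explode()          # one row per word, original index repeated
    .map(scores)        # word -> value, NaN for words in no list
    .groupby(level=0)   # regroup by original row
    .max()              # largest value per row
    .fillna(0)          # rows without any listed word get 0, as in the loop version
    .astype(int)
)
print(df)
```

Whether this is worth it over the loop depends mostly on taste and on the size of the data; the result should be the same apart from the list wrapping.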