Creating new columns to flag whether a column contains a word
I have a dataset with multiple columns. I need to check whether one column contains certain words.
Sample dataset:
Text Date
Donald is a common name in the US 02/12/2020
Donald Trump is the president of the United States 05/21/2017
I have never been in the US 11/02/2016
I need the following output:
Text Date Donald Donald Trump
Donald is a common name in the US 02/12/2020 1 0
Donald Trump is the president of the United States 05/21/2017 1 1
I have never been in the US 11/02/2016 0 0
I tried the following:
df_donald=df_low[df_low['Text'].str.contains("donald")]
df_donald['Donald']=1
and
df_donald_trump=df_low[df_low['Text'].str.contains(r'(?=.*donald)(?=.*trump)')]
df_donald_trump['Donald Trump']=1
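and then joining the results back to the original dataset, but I would prefer to do everything within the same dataset.
How can I do that?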
Add the parameter case=False so the match ignores case, and convert True, False to 1, 0 with Series.view:
m1 = df_low['Text'].str.contains("donald", case=False)
m2 = df_low['Text'].str.contains(r'(?=.*donald)(?=.*trump)', case=False)
df_low['Donald'] = m1.view('i1')
df_low['Donald Trump'] = m2.view('i1')
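Alternative with astype (Series.view is deprecated as of pandas 2.2, so this form is the safer choice on recent versions):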
df_low['Donald'] = m1.astype('int')
df_low['Donald Trump'] = m2.astype('int')
print (df_low)
Text Date Donald \
0 Donald is a common name in the US 02/12/2020 1
1 Donald Trump is the president of the United St... 05/21/2017 1
2 I have never been in the US 11/02/2016 0
Donald Trump
0 0
1 1
2 0
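One detail the snippets above do not cover is missing text. As a minimal sketch, assuming the Text column may contain missing values (the sample data does not), passing na=False keeps the mask strictly boolean so the conversion to integers does not fail. The series name `text` and its contents are made up for illustration.

```python
import pandas as pd

# Hypothetical series with a missing entry (not part of the original data).
text = pd.Series(["Donald Duck", None, "nothing here"])

# na=False treats missing values as "no match", so the mask has no NaN in it.
mask = text.str.contains("donald", case=False, na=False)

# A clean boolean mask converts to 1/0 without errors.
flags = mask.astype(int)
print(flags.tolist())  # [1, 0, 0]
```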
Or combine two separate masks:
m1 = df_low['Text'].str.contains("donald", case=False)
m3 = df_low['Text'].str.contains('trump', case=False)
df_low['Donald'] = m1.view('i1')
df_low['Donald Trump'] = (m1 & m3).view('i1')
Alternative with astype:
df_low['Donald'] = m1.astype('int')
df_low['Donald Trump'] = (m1 & m3).astype('int')
print (df_low)
Text Date Donald \
0 Donald is a common name in the US 02/12/2020 1
1 Donald Trump is the president of the United St... 05/21/2017 1
2 I have never been in the US 11/02/2016 0
Donald Trump
0 0
1 1
2 0
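The mask-combining approach above extends naturally to more columns. The following is a self-contained sketch, not part of the original answer: it rebuilds the sample data and loops over a hypothetical `keyword_sets` mapping (new column name to the words that must all appear). It uses regex=False so each word is matched literally, which is an assumption about what is wanted here.

```python
import pandas as pd

# Sample data from the question.
df_low = pd.DataFrame({
    "Text": [
        "Donald is a common name in the US",
        "Donald Trump is the president of the United States",
        "I have never been in the US",
    ],
    "Date": ["02/12/2020", "05/21/2017", "11/02/2016"],
})

# Hypothetical mapping: new column name -> words that must ALL occur in Text.
keyword_sets = {
    "Donald": ["donald"],
    "Donald Trump": ["donald", "trump"],
}

for column, words in keyword_sets.items():
    # Start from an all-True mask and narrow it down word by word.
    mask = pd.Series(True, index=df_low.index)
    for word in words:
        mask = mask & df_low["Text"].str.contains(word, case=False, regex=False)
    df_low[column] = mask.astype(int)

print(df_low[["Donald", "Donald Trump"]])
```

Because the new columns are assigned directly on df_low, no separate filtered frames or joins are needed.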