
Creating new columns to assign value whether a column contains a word

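I have a dataset with multiple columns. I need to check whether one column contains certain words:

  • if it contains the word "Donald", create a new column named "Donald" and assign 1 to every row that contains the word, otherwise 0;
  • if it contains both the words "Donald" and "Trump", create a new column named "Donald Trump" and assign 1 to every row that contains both words, otherwise 0.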

Example of the dataset:

Text                                                              Date
Donald is a common name in the US                               02/12/2020
Donald Trump is the president of the United States              05/21/2017
I have never been in the US                                     11/02/2016

What I need is the following:

Text                                                              Date            Donald    Donald Trump
Donald is a common name in the US                               02/12/2020           1       0
Donald Trump is the president of the United States              05/21/2017           1       1
I have never been in the US                                     11/02/2016           0       0

I tried the following:

df_donald=df_low[df_low['Text'].str.contains("donald")]
df_donald['Donald']=1

df_donald_trump=df_low[df_low['Text'].str.contains(r'(?=.*donald)(?=.*trump)')]

df_donald_trump['Donald Trump']=1

and then concatenating the results with the original dataset, but I would prefer to do it within the same dataset.

How can I do this?

Add the parameter case=False to ignore upper/lower case, then convert True, False to 1, 0 using Series.view:

m1 = df_low['Text'].str.contains("donald", case=False)
m2 = df_low['Text'].str.contains(r'(?=.*donald)(?=.*trump)', case=False)

df_low['Donald'] = m1.view('i1')
df_low['Donald Trump'] = m2.view('i1')

Alternative:

df_low['Donald'] = m1.astype('int')
df_low['Donald Trump'] = m2.astype('int')
print (df_low)
                                                Text        Date  Donald  \
0                  Donald is a common name in the US  02/12/2020       1   
1  Donald Trump is the president of the United St...  05/21/2017       1   
2                        I have never been in the US  11/02/2016       0   

   Donald Trump  
0             0  
1             1  
2             0 
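For reference, a minimal self-contained sketch of the approach above. The DataFrame is rebuilt here from the sample rows in the question (the construction is an assumption, since the question does not show how df_low is created), and astype is used rather than Series.view, which newer pandas releases deprecate.

```python
import pandas as pd

# Sample data rebuilt from the question; how df_low is really loaded is not shown there.
df_low = pd.DataFrame({
    "Text": [
        "Donald is a common name in the US",
        "Donald Trump is the president of the United States",
        "I have never been in the US",
    ],
    "Date": ["02/12/2020", "05/21/2017", "11/02/2016"],
})

# Boolean masks; case=False makes the match case-insensitive.
m1 = df_low["Text"].str.contains("donald", case=False)
# Two lookaheads require both words to appear somewhere in the text, in any order.
m2 = df_low["Text"].str.contains(r"(?=.*donald)(?=.*trump)", case=False)

# Convert True/False to 1/0 and store the flags on the same DataFrame.
df_low["Donald"] = m1.astype(int)
df_low["Donald Trump"] = m2.astype(int)

print(df_low[["Date", "Donald", "Donald Trump"]])
```

Because the flags are assigned directly to df_low, no filtering and re-joining of subsets is needed.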

Or:

m1 = df_low['Text'].str.contains("donald", case=False)
m3 = df_low['Text'].str.contains('trump', case=False)

df_low['Donald'] = m1.view('i1')
df_low['Donald Trump'] = (m1 & m3).view('i1')

Alternative:

df_low['Donald'] = m1.astype('int')
df_low['Donald Trump'] = (m1 & m3).astype('int')

print (df_low)
                                                Text        Date  Donald  \
0                  Donald is a common name in the US  02/12/2020       1   
1  Donald Trump is the president of the United St...  05/21/2017       1   
2                        I have never been in the US  11/02/2016       0   

   Donald Trump  
0             0  
1             1  
2             0  
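Note that str.contains("donald") matches substrings, so a text such as "McDonald's" would also be flagged. If only whole words should count, one possible variant wraps each word in \b word boundaries. The helper name contains_word and the sample rows below are hypothetical, added only for illustration:

```python
import re

import pandas as pd

# Hypothetical sample rows, chosen to include a substring-only match.
df = pd.DataFrame({
    "Text": ["Donald Trump spoke", "McDonald's opened", "Trump and Donald"],
})


def contains_word(series, word):
    """Return a boolean mask: True where `word` occurs as a whole word (case-insensitive)."""
    pattern = rf"\b{re.escape(word)}\b"
    return series.str.contains(pattern, case=False, regex=True)


has_donald = contains_word(df["Text"], "donald")
has_trump = contains_word(df["Text"], "trump")

df["Donald"] = has_donald.astype(int)
# Combining the two masks with & flags rows that contain both words, in any order.
df["Donald Trump"] = (has_donald & has_trump).astype(int)

print(df)
```

Here the "McDonald's" row gets 0 in both columns, because the "D" in "McDonald" is preceded by a word character and therefore does not sit on a word boundary.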
