
Searching for words in a CSV file column with str.contains

I have the following CSV file:

start_date,end_date,pollster,sponsor,sample_size,population,party,subject,tracking,text,approve,disapprove,url
    2020-02-02,2020-02-04,YouGov,Economist,1500,a,all,Trump,FALSE,Do you approve or disapprove of Donald Trump’s handling of the coronavirus outbreak?,42,29,https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf
    2020-02-02,2020-02-04,YouGov,Economist,376,a,R,Trump,FALSE,Do you approve or disapprove of Donald Trump’s handling of the coronavirus outbreak?,75,6,https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf
    2020-02-02,2020-02-04,YouGov,Economist,523,a,D,Trump,TRUE,Do you approve or disapprove of Donald Trump’s handling of the coronavirus outbreak?,21,51,https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf
    2020-02-02,2020-02-04,YouGov,Economist,599,a,I,Trump,,Do you approve or disapprove of Donald Trump’s handling of the coronavirus outbreak?,39,25,https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf
    2020-02-07,2020-02-09,Morning Consult,"",2200,a,all,Trump,TURE,Do you approve or disapprove of the job each of the following is doing in handling the spread of coronavirus in the United States? President Donald Trump,57,22,https://morningconsult.com/wp-content/uploads/2020/02/200214_crosstabs_CORONAVIRUS_Adults_v4_JB.pdf

I want to find all rows where the "text" column contains both of the words "Trump" and "coronavirus".

I am using str.contains():

approval_polls[approval_polls.text.str.contains("Trump", "coronavirus")]

It looks like I get the correct output, but I'm not sure whether str.contains() can take two words as arguments.

Can anyone help me?

Output:

start_date  end_date    pollster    sponsor     sample_size     population  party   subject     tracking    text    approve     disapprove  url
0   2020-02-02  2020-02-04  YouGov  Economist   1500.0  a   all     Trump   FALSE   Do you approve or disapprove of Donald Trump’s...   42.0    29.0    https://d25d2506sfb94s.cloudfront.net/cumulus_...
1   2020-02-02  2020-02-04  YouGov  Economist   376.0   a   R   Trump   FALSE   Do you approve or disapprove of Donald Trump’s...   75.0    6.0     https://d25d2506sfb94s.cloudfront.net/cumulus_...
2   2020-02-02  2020-02-04  YouGov  Economist   523.0   a   D   Trump   FALSE   Do you approve or disapprove of Donald Trump’s...   21.0    51.0    https://d25d2506sfb94s.cloudfront.net/cumulus_...

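In your example, every row contains both keywords, so you should get all five rows back.

Calling contains('Trump', 'coronavirus') does not search for both words. The second positional parameter of str.contains is case, not a second pattern, so 'coronavirus' is only treated as a truthy case flag, and you get every row whose text column contains "Trump", whether or not it also contains "coronavirus". To get only the rows that contain both "Trump" and "coronavirus", you can use the following: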

df[df['text'].str.contains('Trump') & df['text'].str.contains('coronavirus')] 

Or you can use a regular expression, for example:

df[df['text'].str.contains(r'^(?=.*Trump)(?=.*coronavirus)')]
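To see how these calls differ, here is a minimal sketch on a made-up three-row frame. The sample texts and the variable names (`positional`, `both_masks`, `lookahead`) are illustrative and not taken from your CSV; the sketch assumes the pandas signature `contains(pat, case=True, flags=0, na=..., regex=True)`, where the second positional parameter is `case`.

```python
import pandas as pd

# Made-up sample data: only the first row contains both keywords.
# dtype=object so that matching goes through Python's re module,
# which supports the lookahead syntax used below.
df = pd.DataFrame(
    {
        "text": [
            "Trump's handling of the coronavirus outbreak",
            "Trump's handling of the economy",
            "Spread of coronavirus in the United States",
        ]
    },
    dtype=object,
)

# "coronavirus" lands in the `case` parameter and is merely truthy,
# so this is a case-sensitive search for "Trump" alone.
positional = df["text"].str.contains("Trump", "coronavirus")

# Combine two boolean masks with & to require both words.
both_masks = df["text"].str.contains("Trump") & df["text"].str.contains("coronavirus")

# Two lookaheads anchored at the start of the string:
# each one must find its word somewhere in the text.
lookahead = df["text"].str.contains(r"^(?=.*Trump)(?=.*coronavirus)")

print(positional.tolist())  # [True, True, False]
print(both_masks.tolist())  # [True, False, False]
print(lookahead.tolist())   # [True, False, False]
```

The two-mask version is probably the easier one to read; the lookahead version keeps everything in a single pattern.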

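You can do this with a regular expression, but it takes a somewhat elaborate pattern, because regular expressions have no direct equivalent of an "AND" operator.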

import re

approval_polls[approval_polls.text.str.contains('(?=.*trump)(?=.*coronavirus)', regex=True, flags=re.IGNORECASE)]
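If you need this for more than two words, the same lookahead idea can be wrapped in a small helper. This is only a sketch: `contains_all` and the `polls` frame are hypothetical names made up for the example, and it assumes that rows with missing text should count as non-matches.

```python
import re

import pandas as pd


def contains_all(series, words, ignore_case=True):
    """Return a boolean mask that is True where every word occurs in the string."""
    flags = re.IGNORECASE if ignore_case else 0
    # One lookahead per word; re.escape keeps characters such as "." literal.
    pattern = "^" + "".join(f"(?=.*{re.escape(word)})" for word in words)
    # na=False treats missing text as "no match", so the mask is purely boolean.
    return series.str.contains(pattern, regex=True, flags=flags, na=False)


# Made-up sample data, including one missing value.
polls = pd.DataFrame(
    {
        "text": [
            "Donald Trump's handling of the coronavirus outbreak",
            "CORONAVIRUS response by TRUMP",
            "Trump approval",
            None,
        ]
    },
    dtype=object,
)

mask = contains_all(polls["text"], ["trump", "coronavirus"])
print(polls[mask])
```

Note that `.*` does not cross line breaks by default, so for multi-line text you would also need `re.DOTALL` in the flags.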

