How do I assign the output of a `str.contains` to a Pandas column?
Add 2nd column when column matches in str.contains
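I want to search through `searchList` and check whether column `text` `str.contains` one or more of each `searchWord`. If I get a match, I want to append the data to `masterdf`, which is easily accomplished as seen below. But I also want to add a new column with `searchWord` so that I know which `text` matched with what. The code below fills the `searchWord` column with the latest search that matched.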
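```python
import pandas as pd

masterdf = pd.DataFrame(columns=['doc_id', 'text'])
for searchWord in searchList:
    search = jsons_data[jsons_data['text'].str.contains(searchWord)]
    if len(search) > 0:
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        masterdf = pd.concat([masterdf, search])
        # This overwrites the entire column on every match
        masterdf['searchWord'] = searchWord
```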
I think this is what you're after.
Let's set up some sample data:
```python
import pandas as pd

tt = '''I want to search through the. searchList and check if column text str.contains one or more of each searchWord. If I get a match I want to append the data to masterdf which is easily accomplished as seen below. But I also want to add a new column with searchWord so that I know which text matched with what. This code below fills the column searchWord with the. latest search that matched'''
text_col = tt.split('.')
id_col = range(len(text_col))
jsons_data = pd.DataFrame({'doc_id': id_col, 'text': text_col})
searchList = ['code', 'fills', 'But', 'also', 'want']
```
The sample `jsons_data` is:

```
   doc_id  text
0       0  I want to search through the
1       1  searchList and check if column text str
2       2  contains one or more of each searchWord
3       3  If I get a match I want to append the data to...
4       4  But I also want to add a new column with sear...
5       5  This code below fills the column searchWord w...
6       6  latest search that matched
```
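Setting the column on `search` (only the rows that matched the current word) with `search['searchWord'] = searchWord` before appending it, instead of on the whole of `masterdf`, we get: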
```python
import pandas as pd

matches = []  # matching rows for each searchWord
for searchWord in searchList:
    # .copy() so that adding a column below does not trigger SettingWithCopyWarning
    search = jsons_data[jsons_data['text'].str.contains(searchWord)].copy()
    if len(search) > 0:
        # tag only the rows that matched this word, not all of masterdf
        search['searchWord'] = searchWord
        matches.append(search)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
masterdf = pd.concat(matches)
```

and `masterdf` is:

```
   doc_id  text                                              searchWord
5       5  This code below fills the column searchWord w...  code
5       5  This code below fills the column searchWord w...  fills
4       4  But I also want to add a new column with sear...  But
4       4  But I also want to add a new column with sear...  also
0       0  I want to search through the                      want
3       3  If I get a match I want to append the data to...  want
4       4  But I also want to add a new column with sear...  want
```
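The same one-row-per-match table can also be built without a Python loop. Here is a minimal sketch, assuming pandas 0.25 or later (for `Series.explode`). It collects every match per row with `str.findall`, explodes the lists into one entry per match, and repeats each matching row once per word. Unlike the loop, the rows come out grouped by document rather than by search word. The `drop_duplicates` call keeps a word that occurs twice in the same text from producing two rows.

```python
import pandas as pd

# Same sample data as above: the question text split into sentences on '.'
tt = ("I want to search through the. searchList and check if column text str.contains "
      "one or more of each searchWord. If I get a match I want to append the data to "
      "masterdf which is easily accomplished as seen below. But I also want to add a new "
      "column with searchWord so that I know which text matched with what. This code below "
      "fills the column searchWord with the. latest search that matched")
text_col = tt.split('.')
jsons_data = pd.DataFrame({'doc_id': range(len(text_col)), 'text': text_col})
searchList = ['code', 'fills', 'But', 'also', 'want']

# One alternation with a capturing group, e.g. '(code|fills|But|also|want)'
pattern = '({})'.format('|'.join(searchList))

# findall -> list of matched words per row; explode -> one entry per match.
# Rows with no match explode to NaN and are dropped.
words = jsons_data['text'].str.findall(pattern).explode().dropna()

# Repeat each matching row once per matched word and record that word
masterdf = (jsons_data.loc[words.index]
            .assign(searchWord=words.values)
            .drop_duplicates(subset=['doc_id', 'searchWord']))
print(masterdf)
```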
I would suggest a vectorized (loop-free) approach:
```
In [84]: df
Out[84]:
   doc_id  text
0       0  I want to search through the
1       1  searchList and check if column text str
2       2  contains one or more of each searchWord
3       3  If I get a match I want to append the data to masterdf which is easily accomplished as seen below
4       4  But I also want to add a new column with searchWord so that I know which text matched with what
5       5  This code below fills the column searchWord with the
6       6  latest search that matched

In [85]: searchList = ['code', 'fills', 'but', 'also', 'want']

In [86]: words_re = '{}'.format('|'.join(searchList).lower())

In [87]: words_re
Out[87]: 'code|fills|but|also|want'

In [88]: masterdf = df[df.text.str.contains('(?:{})'.format(words_re))].copy()

In [89]: masterdf['searchWord'] = masterdf.text.str.findall('({})'.format(words_re)).str.join('|')

In [90]: masterdf
Out[90]:
   doc_id  text                                                                                               searchWord
0       0  I want to search through the                                                                       want
3       3  If I get a match I want to append the data to masterdf which is easily accomplished as seen below  want
4       4  But I also want to add a new column with searchWord so that I know which text matched with what   also|want
5       5  This code below fills the column searchWord with the                                               code|fills
```
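The regex in this answer has three caveats. First, the terms are inserted unescaped, so a term containing a metacharacter such as `.` changes meaning. Second, matching is by substring, so `code` would also hit `encoded`. Third, it is case-sensitive, which is why `but` misses the capitalized `But` and row 4 shows only `also|want`.

Here is a sketch that addresses all three. It escapes each term with `re.escape`, wraps the alternation in `\b` word boundaries, and passes `flags=re.IGNORECASE` to both `str.contains` and `str.findall`. The small DataFrame and the extra term `str.contains` are hypothetical illustration data, not from the question. Note that `\b` only behaves as expected for terms that start and end with a word character.

```python
import re
import pandas as pd

# Hypothetical example data (not from the question)
df = pd.DataFrame({
    'doc_id': [0, 1, 2, 3, 4],
    'text': ['I want to search through the',
             'check if column text str.contains',
             'But I also want to add a new column',
             'This code below fills the column',
             'The encoded value was wanted'],
})
searchList = ['code', 'fills', 'but', 'also', 'want', 'str.contains']

# Escape each term, then require word boundaries around the whole alternation.
# The group is non-capturing, so findall returns the full matched text.
pattern = r'\b(?:{})\b'.format('|'.join(map(re.escape, searchList)))

mask = df['text'].str.contains(pattern, flags=re.IGNORECASE)
masterdf = df[mask].copy()
masterdf['searchWord'] = (masterdf['text']
                          .str.findall(pattern, flags=re.IGNORECASE)
                          .str.join('|'))
print(masterdf)
```

With word boundaries, the last row (`encoded`, `wanted`) no longer matches. The case-insensitive flag now picks up `But` in the third row.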