
Add 2nd column when column matches in str.contains

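I want to loop through searchList and check whether the column text str.contains one or more of the searchWord entries. If I get a match, I want to append the matching rows to masterdf, which is easily accomplished as shown below. But I also want to add a new column holding searchWord, so that I know which text matched which word. The code below instead fills the whole searchWord column with the most recent search that matched.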

masterdf = pd.DataFrame(columns=['doc_id','text',])

for searchWord in searchList:
    search = jsons_data[jsons_data['text'].str.contains(searchWord)]
    if len(search) > 0:
        masterdf = masterdf.append(search)
        masterdf['searchWord'] = searchWord

I think this is what you are after.

Let's set up the sample data:

import pandas as pd

tt = '''I want to search through the. searchList and check if column text str.contains one or more of each searchWord. If I get a match I want to append the data to masterdf which is easily accomplished as seen below. But I also want to add a new column with searchWord so that I know which text matched with what. This code below fills the column searchWord with the. latest search that matched'''
text_col = tt.split('.')
id_col = range(len(text_col))
jsons_data = pd.DataFrame({'doc_id':id_col,'text':text_col})

searchList = ['code','fills', 'But','also','want']

Sample jsons_data:

    doc_id  text
0   0       I want to search through the
1   1       searchList and check if column text str
2   2       contains one or more of each searchWord
3   3       If I get a match I want to append the data to...
4   4       But I also want to add a new column with sear...
5   5       This code below fills the column searchWord w...
6   6       latest search that matched

Modifying the code to assign search['searchWord'] = searchWord on each set of matches before appending, we get:

matches = []  # one DataFrame per search word that matched

for searchWord in searchList:
    # .copy() avoids a SettingWithCopyWarning when the new column is added
    search = jsons_data[jsons_data['text'].str.contains(searchWord)].copy()
    if len(search) > 0:
        search['searchWord'] = searchWord
        matches.append(search)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
masterdf = pd.concat(matches)

masterdf

    doc_id  text                                                searchWord
5   5       This code below fills the column searchWord w...    code
5   5       This code below fills the column searchWord w...    fills
4   4       But I also want to add a new column with sear...    But
4   4       But I also want to add a new column with sear...    also
0   0       I want to search through the                        want
3   3       If I get a match I want to append the data to...    want
4   4       But I also want to add a new column with sear...    want
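
The difference from the code in the question is where the assignment happens. Assigning a scalar to a column of the accumulated frame broadcasts it to every row, so each pass through the loop replaces the words stored earlier. The sketch below illustrates this with toy data; the names demo, parts, hit and tagged are hypothetical and not part of the original code, and the pieces are combined with pd.concat.

```python
import pandas as pd

# Toy frame; `demo` and the words used below are illustrative only.
demo = pd.DataFrame({'text': ['alpha beta', 'beta gamma']})

# Pattern from the question: scalar assignment on the accumulated frame.
for word in ['alpha', 'beta']:
    demo['searchWord'] = word  # broadcasts to every row, replacing earlier values

overwritten = demo['searchWord'].tolist()
print(overwritten)  # ['beta', 'beta']

# Pattern from the answer: tag each set of matches before combining them.
parts = []
for word in ['alpha', 'beta']:
    hit = demo.loc[demo['text'].str.contains(word, regex=False), ['text']].copy()
    hit['searchWord'] = word  # only touches this word's matches
    parts.append(hit)

tagged = pd.concat(parts)
print(tagged['searchWord'].tolist())  # ['alpha', 'beta', 'beta']
```

A row that matches several words appears once per word, which is why doc_id 4 and 5 are repeated in the output above.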

I would suggest a vectorized (loop-free) approach:

In [84]: df
Out[84]:
   doc_id                                                                                                text
0       0                                                                        I want to search through the
1       1                                                             searchList and check if column text str
2       2                                                             contains one or more of each searchWord
3       3   If I get a match I want to append the data to masterdf which is easily accomplished as seen below
4       4     But I also want to add a new column with searchWord so that I know which text matched with what
5       5                                                This code below fills the column searchWord with the
6       6                                                                          latest search that matched

In [85]: searchList = ['code', 'fills', 'but', 'also', 'want']

In [86]: words_re = '{}'.format('|'.join(searchList).lower())

In [87]: words_re
Out[87]: 'code|fills|but|also|want'

In [88]: masterdf = df[df.text.str.contains('(?:{})'.format(words_re))].copy()

In [89]: masterdf['searchWord'] = masterdf.text.str.findall('({})'.format(words_re)).str.join('|')

In [90]: masterdf
Out[90]:
   doc_id                                                                                                text  searchWord
0       0                                                                        I want to search through the        want
3       3   If I get a match I want to append the data to masterdf which is easily accomplished as seen below        want
4       4     But I also want to add a new column with searchWord so that I know which text matched with what   also|want
5       5                                                This code below fills the column searchWord with the  code|fills
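
Note that 'But' is not reported for row 4 above, because the pattern is lower-cased while the match itself is case-sensitive. The following is a minimal self-contained sketch of the same idea, assuming case-insensitive matching is wanted and that the search words should be treated as literal text rather than regular expressions. The use of re.escape, case=False and flags=re.IGNORECASE, as well as the three-row sample frame, are additions made for illustration and are not part of the original answer.

```python
import re
import pandas as pd

# Small illustrative frame (not the data from the question).
df = pd.DataFrame({
    'doc_id': [0, 1, 2],
    'text': ['But I also want this', 'nothing to see here', 'This code fills a column'],
})
searchList = ['code', 'fills', 'but', 'also', 'want']

# Escape each word so characters such as '.' or '+' are matched literally.
words_re = '|'.join(re.escape(w) for w in searchList)

# Keep only the rows that contain at least one search word, ignoring case.
mask = df['text'].str.contains(words_re, case=False)
masterdf = df[mask].copy()

# Collect every matched word per row and join them with '|'.
masterdf['searchWord'] = (
    masterdf['text']
    .str.findall(words_re, flags=re.IGNORECASE)
    .str.join('|')
)
print(masterdf)
```

The words are reported as they appear in the text, so the first row yields 'But|also|want'. The pattern matches substrings, so 'want' would also match 'wanted'; wrapping the alternation in word boundaries (\b) is one way to restrict it to whole words.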
