Is there a way to get the list values that matched the values in a Pandas DataFrame?

I have lists of words like this:

words1 = ['hi','my']
words2 = ['name','is']

I have a DataFrame df like this:

id Sentence
0  'my name was'
1  'hi i am'
2  'my phone is'
3  'what is this'
4  'her name was'

I am running the following code to get the indices of the DataFrame rows whose Sentence matches any word in a list.

matched_idx1 = df.loc[df.Sentence.str.contains('|'.join(words1)),:].index.array
matched_idx2 = df.loc[df.Sentence.str.contains('|'.join(words2)),:].index.array

So matched_idx1 gives the array:

[0,1,2]

matched_idx2 gives the array:

[0,2,3,4]

Now I want to get a list or array of the values that were matched inside the contains call.

So for a new variable matched_idx1_values, the output should be:

['my','hi','my']

For matched_idx2_values, the output should be:

['name','is','is','name']

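Please let me know how to get these indices along with the values they matched. This example is trivial; my real lists contain many more words.

Thanks!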
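One pandas-only way to get both the indices and the matched words is to swap `str.contains` for `str.extract` with a capture group. The following is a minimal sketch under a few assumptions: the helper name `first_matches` is hypothetical, only the first matching word per sentence is wanted, and matching should be on whole words. The last point matters because the unanchored pattern `'hi|my'` also matches the `hi` inside `this` in row 3, so the sketch wraps the alternation in `\b` word boundaries.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "Sentence": ["my name was", "hi i am", "my phone is",
                 "what is this", "her name was"],
})
words1 = ["hi", "my"]
words2 = ["name", "is"]


def first_matches(sentences, words):
    """Hypothetical helper: first listed word found in each sentence.

    Rows without any match are dropped, so the index of the result
    is the set of matching row labels.
    """
    # re.escape protects words that contain regex metacharacters;
    # \b restricts the match to whole words.
    pattern = r"\b(" + "|".join(map(re.escape, words)) + r")\b"
    return sentences.str.extract(pattern, expand=False).dropna()


matched1 = first_matches(df["Sentence"], words1)
matched2 = first_matches(df["Sentence"], words2)

matched_idx1 = matched1.index.tolist()
matched_idx1_values = matched1.tolist()
matched_idx2 = matched2.index.tolist()
matched_idx2_values = matched2.tolist()

print(matched_idx1, matched_idx1_values)  # [0, 1, 2] ['my', 'hi', 'my']
print(matched_idx2, matched_idx2_values)  # [0, 2, 3, 4] ['name', 'is', 'is', 'name']
```

If a sentence can contain several words from the same list and all of them are needed, `str.findall` with the same pattern returns a list per row instead of only the first hit.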

Here is a complete example using spaCy:

# Sample data (word lists and DataFrame from the question)
import pandas as pd
words1 = ['hi', 'my']
words2 = ['name', 'is']
df = pd.DataFrame({'id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}, 'Sentence': {0: 'my name was', 1: 'hi i am', 2: 'my phone is', 3: 'what is this', 4: 'her name was'}})


# Load spacy
import spacy
nlp = spacy.blank("en")
ruler = nlp.add_pipe('entity_ruler', config={"overwrite_ents": True}, last=True)


# Add word patterns, one label per word list
lst_all_patterns = list()

for wrd in words1:
    lst_all_patterns += [{"label": "words1", "pattern": [{"lower": wrd}]}]

for wrd in words2:
    lst_all_patterns += [{"label": "words2", "pattern": [{"lower": wrd}]}]

ruler.add_patterns(lst_all_patterns)


# EXAMPLE:
doc_string = nlp('my name was')
for e in doc_string.ents:
    print(e.label_, e, e.start, e.end)

# words1 my 0 1
# words2 name 1 2


# EXAMPLE dataframe
df['docs'] = df['Sentence'].map(nlp)
df['docs'].map(lambda x: [e.start for e in x.ents])

# 0    [0, 1]
# 1       [0]
# 2    [0, 2]
# 3       [1]
# 4       [1]
# Name: docs, dtype: object
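
The Series above holds token positions rather than the matched words themselves. To get closer to the output asked for in the question, the entity text can be read from each doc and filtered by label. This is a sketch, assuming spaCy v3, that each word list is registered under its own label ("words1" and "words2"), and a hypothetical helper named `matches_for`:

```python
import pandas as pd
import spacy

words1 = ["hi", "my"]
words2 = ["name", "is"]
df = pd.DataFrame({"Sentence": ["my name was", "hi i am", "my phone is",
                                "what is this", "her name was"]})

# Blank English pipeline with a rule-based entity matcher (no model download).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})
ruler.add_patterns(
    [{"label": "words1", "pattern": [{"LOWER": w}]} for w in words1]
    + [{"label": "words2", "pattern": [{"LOWER": w}]} for w in words2]
)

docs = df["Sentence"].map(nlp)


def matches_for(label):
    """Hypothetical helper: matched words per row for one label."""
    per_row = docs.map(lambda doc: [e.text for e in doc.ents if e.label_ == label])
    # Keep only rows where at least one word of this list matched.
    return per_row[per_row.map(len) > 0]


matched1 = matches_for("words1")
matched2 = matches_for("words2")

print(matched1.index.tolist(), matched1.tolist())
print(matched2.index.tolist(), matched2.tolist())
```

Because the ruler matches whole tokens, `hi` does not match inside `this`. Each row yields a list, so sentences containing several words from the same list keep all of them.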
