
Search over text column in pandas data frame without looping

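I have a pandas data frame in which one column holds text description strings. I need to create a new column that indicates whether any string from a list appears in the text description.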

import pandas as pd

df = pd.DataFrame({'Description': [
    '2 Bedroom/1.5 Bathroom end unit Townhouse.  Available now!',
    'Very spacious studio apartment available',
    ' Two bedroom, 1 bathroom condominium, superbly located in downtown']})

list_ = ['unit', 'apartment']

The result should then be

                                        Description    in list
0  2 Bedroom/1.5 Bathroom end unit Townhouse.  Av...    True
1           Very spacious studio apartment available    True
2   Two bedroom, 1 bathroom condominium, superbly...   False

I can do it like this

for i in df.index.values:
    df.loc[i,'in list'] = any(w in df.loc[i,'Description'] for w in list_)

But for a large data set, it takes longer than I would like.

By using str.contains, which treats the joined string as a regular expression in which | means "or":

list_ = ['unit', 'apartment']
df.Description.str.contains('|'.join(list_))
Out[724]: 
0     True
1     True
2    False
Name: Description, dtype: bool
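Since str.contains interprets its pattern as a regular expression by default, a keyword containing characters such as "." or "+" would be read as regex syntax rather than literal text. The following is a minimal sketch, not part of the original answer, that assumes the keywords should match literally and that a missing description should count as "not in list"; the re.escape step and the na=False argument are additions for that purpose, and the name keywords is only illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({'Description': [
    '2 Bedroom/1.5 Bathroom end unit Townhouse.  Available now!',
    'Very spacious studio apartment available',
    ' Two bedroom, 1 bathroom condominium, superbly located in downtown',
]})

keywords = ['unit', 'apartment']  # illustrative name for the search list

# Escape each keyword so regex metacharacters are matched literally,
# then join the alternatives with "|" (regex "or").
pattern = '|'.join(re.escape(w) for w in keywords)

# na=False makes rows with a missing description evaluate to False
# instead of propagating NaN (an assumption about the desired behaviour).
df['in list'] = df['Description'].str.contains(pattern, na=False)
```

For a case-insensitive search, str.contains also accepts case=False.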

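Using np.char.find, which returns the index of the first match, or -1 if the substring is not found:

import numpy as np

# Column vector of descriptions, broadcast against the keyword list
v = df.Description.values.astype('U')[:, None]
# >= 0 so that a match at the very start of the string (index 0) also counts
df['in list'] = (np.char.find(v, list_) >= 0).any(axis=1)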

df

                                         Description  in list
0  2 Bedroom/1.5 Bathroom end unit Townhouse.  Av...     True
1           Very spacious studio apartment available     True
2   Two bedroom, 1 bathroom condominium, superbly...    False
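One detail of the NumPy approach is worth spelling out: np.char.find returns the lowest index at which the substring occurs and -1 when it does not occur, so a keyword at the very start of a description yields 0. The sketch below wraps the broadcasted comparison in a helper and tests with >= 0 for that reason. The function name contains_any and the sample strings are hypothetical, chosen only to exercise the index-0 case. Note also that astype('U') turns a missing value into the string 'nan', so this sketch assumes the column has no missing entries.

```python
import numpy as np
import pandas as pd


def contains_any(descriptions, words):
    """Return a boolean array: True where any word occurs in the description.

    Hypothetical helper; assumes `descriptions` holds only strings.
    """
    # Shape (n, 1) so that it broadcasts against the (m,) array of words.
    v = descriptions.to_numpy().astype('U')[:, None]
    w = np.asarray(words, dtype='U')
    # find() gives -1 for "not found"; any index >= 0 is a match.
    hits = np.char.find(v, w) >= 0
    # A row matches if at least one word was found in it.
    return hits.any(axis=1)


s = pd.Series([
    'unit at the start of the text',             # match at index 0
    'Very spacious studio apartment available',  # match in the middle
    'condominium, superbly located in downtown', # no match
])
result = contains_any(s, ['unit', 'apartment'])
```

The intermediate array has one cell per description and keyword pair, so memory use grows with the product of the two lengths.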

