Using pandas/NumPy to search for a matching string
I have been trying to solve this for a while now but have not gotten anywhere. My goal is to search for a string in a column called 'WORDS' and return the corresponding 'INDEXED_NUMBER'. For example, if I search for 'aaa', it should return 0, as shown in the table below.
I am using Python with pandas, and possibly NumPy as well. Below is a sample of the code I've tried:
def WordToIndexwithjustPanda():
    referenceDF[referenceDF['WORDS'].str.contains('aaa')]
    # I was hoping this would grab the row with the word 'aaa', but
    # it is not returning anything
and
def WordToIndexwithNumpy():
    np.where(referenceDF["WORDS"].str.contains('aaa'))
    # I think this is wrong, but I am not sure what is wrong with it
I hope you can guide me to the right way of doing this. As an additional note, I am using Anaconda Prompt and Jupyter Notebook. I have imported pandas and numpy.
Thanks in advance. XD
Use loc with boolean indexing, and don't forget to add return to the function. To return a scalar, use iat to select the first value of the filtered Series, with an if-else to handle the case where the filter returns no rows:
def WordToIndexwithjustPanda():
    a = referenceDF.loc[referenceDF['WORDS'].str.contains('aaa'), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]
You can also add a parameter to the function so that it returns the first occurrence of any value you pass in:
referenceDF = pd.DataFrame({
    'WORDS': ['aaa','aaas','aactive','aadvantage','aaker'],
    'INDEXED_NUMBER': list(range(5))
})
print (referenceDF)
   INDEXED_NUMBER       WORDS
0               0         aaa
1               1        aaas
2               2     aactive
3               3  aadvantage
4               4       aaker
def WordToIndexwithjustPanda(val):
    a = referenceDF.loc[referenceDF['WORDS'].str.contains(val), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]
print (WordToIndexwithjustPanda('aaa'))
0
print (WordToIndexwithjustPanda('bbb'))
No match
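Note that str.contains does substring matching and, by default, treats the search term as a regular expression, so 'aaa' also matches 'aaas'. If an exact match or a literal (non-regex) search is what you want, something along these lines may help. This is only a sketch: the function names word_to_index_exact and word_to_index_literal and the df parameter are made up for illustration, and the data is the sample from above.

```python
import pandas as pd

# Sample data mirroring the example above.
referenceDF = pd.DataFrame({
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
    'INDEXED_NUMBER': list(range(5)),
})

def word_to_index_exact(val, df=referenceDF):
    """Return INDEXED_NUMBER of the first row whose WORDS equals val exactly."""
    matches = df.loc[df['WORDS'] == val, 'INDEXED_NUMBER']
    return 'No match' if matches.empty else matches.iat[0]

def word_to_index_literal(val, df=referenceDF):
    """Substring search that treats val as plain text rather than a regex."""
    mask = df['WORDS'].str.contains(val, regex=False, na=False)
    matches = df.loc[mask, 'INDEXED_NUMBER']
    return 'No match' if matches.empty else matches.iat[0]

print(word_to_index_exact('aaas'))   # exact match on the second row
print(word_to_index_exact('aa'))     # no row is exactly 'aa'
print(word_to_index_literal('aa'))   # first row containing 'aa'
```

Passing na=False makes missing values in WORDS count as non-matches instead of propagating NaN into the mask.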
This is one way to implement your algorithm using a generator:
def WordToIndexwithjustPanda():
    # Pair each INDEXED_NUMBER with its word and return the first match
    return next((i for i, j in zip(referenceDF['INDEXED_NUMBER'], referenceDF['WORDS'])
                 if 'aaa' in j), 'No match')
Strictly speaking, this uses pandas only partially, in that it relies on the iteration support of pd.Series.
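Since the question also asks about NumPy: np.where called with a single condition returns a tuple of position arrays, and the original attempt never returned it from the function. A possible NumPy-flavoured version is sketched below. The function name word_to_index_numpy and the df parameter are hypothetical, the data is the sample from the earlier answer, and regex=False is my assumption that a literal substring search is wanted.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the example in the earlier answer.
referenceDF = pd.DataFrame({
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
    'INDEXED_NUMBER': list(range(5)),
})

def word_to_index_numpy(val, df=referenceDF):
    """Return INDEXED_NUMBER of the first row whose WORDS contains val."""
    # Boolean mask as a plain NumPy array (missing values count as no match).
    mask = df['WORDS'].str.contains(val, regex=False, na=False).to_numpy(dtype=bool)
    # Integer positions of the True entries, in row order.
    positions = np.flatnonzero(mask)
    if positions.size == 0:
        return 'No match'
    # Look up the value at the first matching position.
    return df['INDEXED_NUMBER'].to_numpy()[positions[0]]

print(word_to_index_numpy('aaa'))
print(word_to_index_numpy('bbb'))
```

np.flatnonzero(mask) gives the same positions as np.where(mask)[0]; which one to use is mostly a matter of taste.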