Searching for a matching string using Pandas / NumPy

Question

I have been trying to solve this for a while but have not gotten anywhere. My goal is to search for a string in a column called 'WORDS' and return the corresponding 'INDEXED_NUMBER'. For example, if I search for 'aaa', it should return 0, as shown in the table below.

I am using Python with pandas and may try numpy as well. Below is a sample of the code I've tried:

def WordToIndexwithjustPanda():
    referenceDF[referenceDF['WORDS'].str.contains('aaa')]
    #I was hoping that it will grab me the row with the word 'aaa' but 
    #it is not returning me anything

and

def WordToIndexwithNumpy():
    np.where(referenceDF["WORDS"].str.contains('aaa'))
    #I think this is wrong but I am not sure how is this wrong

I hope you can point me to the right way to do this. As an additional note, I am using Anaconda Prompt and Jupyter Notebook. I have imported pandas and numpy.

Thanks in advance. XD

Answer 1

Use loc with boolean indexing, and don't forget to add a return to the function. To return a scalar, use iat to select the first value of the filtered Series, with an if-else to handle the case where the filter returns no rows:

def WordToIndexwithjustPanda():
    a = referenceDF.loc[referenceDF['WORDS'].str.contains('aaa'), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]

You can also add a parameter to the function so it looks up the first occurrence of any value:

referenceDF = pd.DataFrame({
    'WORDS': ['aaa','aaas','aactive','aadvantage','aaker'],
    'INDEXED_NUMBER': list(range(5))
})
print (referenceDF)
   INDEXED_NUMBER       WORDS
0               0         aaa
1               1        aaas
2               2     aactive
3               3  aadvantage
4               4       aaker

def WordToIndexwithjustPanda(val):
    a = referenceDF.loc[referenceDF['WORDS'].str.contains(val), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]
print (WordToIndexwithjustPanda('aaa'))
0
print (WordToIndexwithjustPanda('bbb'))
No match
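
One thing worth noting about the function above: str.contains does substring matching (and by default treats the pattern as a regular expression), so 'aaa' also matches 'aaas'; the result is 0 only because 'aaa' happens to come first. If an exact match is what is wanted, a minimal sketch could look like the following. The function name word_to_index_exact is hypothetical and not part of the original answer.

```python
import pandas as pd

referenceDF = pd.DataFrame({
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
    'INDEXED_NUMBER': list(range(5))
})

def word_to_index_exact(val):
    # Hypothetical helper: compare whole strings with == instead of
    # using str.contains, which would also match longer words.
    a = referenceDF.loc[referenceDF['WORDS'] == val, 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]

print(word_to_index_exact('aaas'))  # exact hit on the second row
print(word_to_index_exact('aa'))    # substring only, so no exact match
```

If substring matching is intended but the search text may contain regex metacharacters such as '.' or '+', passing regex=False to str.contains is another option.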

Answer 2

This is one way to implement your algorithm using a generator:

def WordToIndexwithjustPanda():
    # Return the first INDEXED_NUMBER whose word contains 'aaa', else 'No match'
    return next((i for i, j in zip(referenceDF['INDEXED_NUMBER'], referenceDF['WORDS'])
                 if 'aaa' in j), 'No match')

Strictly speaking, it uses pandas only partially, in that it relies on the iteration support of pd.Series.
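
Since the question also asks about numpy: np.where called with only a condition returns a tuple of position arrays rather than the matching values, and the original function never returned it. A rough sketch of a numpy-based variant, assuming the same referenceDF as in Answer 1; the function name word_to_index_numpy is hypothetical:

```python
import numpy as np
import pandas as pd

referenceDF = pd.DataFrame({
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
    'INDEXED_NUMBER': list(range(5))
})

def word_to_index_numpy(val):
    # Boolean mask of rows whose word contains val (plain substring, no regex)
    mask = referenceDF['WORDS'].str.contains(val, regex=False).to_numpy()
    # Positions of the True entries; same as np.where(mask)[0]
    positions = np.flatnonzero(mask)
    if positions.size == 0:
        return 'No match'
    # Look up INDEXED_NUMBER at the first matching position
    return referenceDF['INDEXED_NUMBER'].to_numpy()[positions[0]]

print(word_to_index_numpy('aaa'))
print(word_to_index_numpy('bbb'))
```

This does the same work as the loc-based version; it mainly shows what the np.where result contains and how to turn it into a value.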

Answer 1
1, accepted, 2018-03-03 16:09:17

Answer 2
1, 2018-03-03 16:17:06
