
Using pandas/NumPy to search for a matching string

I have been trying to solve this for a while but have not gotten anywhere. My goal is to search for a string in a column called 'WORDS' and return the corresponding 'INDEXED_NUMBER'. For example, if I search for 'aaa', it should return 0, as shown in the table below.

Sample table

I am using Python with pandas, and I am possibly trying NumPy as well. Below is a sample of the code I've tried:

def WordToIndexwithjustPanda():
    referenceDF[referenceDF['WORDS'].str.contains('aaa')]
    #I was hoping that it will grab me the row with the word 'aaa' but 
    #it is not returning me anything

and

def WordToIndexwithNumpy():
    np.where(referenceDF["WORDS"].str.contains('aaa'))
    #I think this is wrong but I am not sure how is this wrong

I hope you can guide me toward the right way of doing this. As an additional note, I am using Anaconda Prompt and Jupyter Notebook, and I have imported pandas and numpy.

Thanks in advance. XD

Use loc with boolean indexing, and don't forget to add return to the function. To return a scalar, use iat to select the first value of the filtered Series, with an if-else for the case where the filter returns no rows:

def WordToIndexwithjustPanda():
    a = referenceDF.loc[referenceDF['WORDS'].str.contains('aaa'), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]

You can also add a parameter to the function so that it returns the first occurrence of any given value:

import pandas as pd

referenceDF = pd.DataFrame({
    'INDEXED_NUMBER': list(range(5)),
    'WORDS': ['aaa','aaas','aactive','aadvantage','aaker']
})
print (referenceDF)
   INDEXED_NUMBER       WORDS
0               0         aaa
1               1        aaas
2               2     aactive
3               3  aadvantage
4               4       aaker

def WordToIndexwithjustPanda(val):
    a = referenceDF.loc[referenceDF['WORDS'].str.contains(val), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]
print (WordToIndexwithjustPanda('aaa'))
0
print (WordToIndexwithjustPanda('bbb'))
No match
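
Note that str.contains does substring matching and, by default, treats the pattern as a regular expression, so 'aaa' also matches 'aaas' and the function returns whichever matching row comes first. If an exact or literal lookup is what you want, something like the following might work. This is only a sketch: the function names word_to_index_exact and word_to_index_literal and the default argument are hypothetical and not part of the answer above.

```python
import pandas as pd

referenceDF = pd.DataFrame({
    'INDEXED_NUMBER': list(range(5)),
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
})

def word_to_index_exact(val, default='No match'):
    # Hypothetical helper: compare whole cell values instead of substrings.
    a = referenceDF.loc[referenceDF['WORDS'] == val, 'INDEXED_NUMBER']
    return default if a.empty else a.iat[0]

def word_to_index_literal(val, default='No match'):
    # Hypothetical helper: substring search, but val is taken literally
    # (regex=False) and missing values count as non-matches (na=False).
    mask = referenceDF['WORDS'].str.contains(val, regex=False, na=False)
    a = referenceDF.loc[mask, 'INDEXED_NUMBER']
    return default if a.empty else a.iat[0]

print(word_to_index_exact('aaas'))    # only the row equal to 'aaas' matches
print(word_to_index_exact('aa'))      # no row equals 'aa' exactly
print(word_to_index_literal('aa'))    # first row that contains 'aa'
```

The exact version avoids surprises when one word is a prefix of another; the literal version keeps the substring behaviour but is safe for search strings containing characters such as '.' or '+'.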

This is one way to implement your algorithm using a generator:

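def WordToIndexwithjustPanda():
    # Pair each index number with its word and return the first index
    # whose word contains 'aaa'; fall back to 'No match' otherwise.
    return next((i for i, j in zip(referenceDF['INDEXED_NUMBER'], referenceDF['WORDS'])
                 if 'aaa' in j), 'No match')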

Strictly speaking, it uses pandas only partially, in that it relies on the fact that a pd.Series can be iterated over.
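
For the NumPy attempt in the question, the np.where call does compute the matching positions, but the function never returns them. A rough sketch of a NumPy-based variant is below; the function name word_to_index_numpy and its default argument are hypothetical, and np.flatnonzero is used here in place of np.where simply because it returns a flat array of positions directly.

```python
import numpy as np
import pandas as pd

referenceDF = pd.DataFrame({
    'INDEXED_NUMBER': list(range(5)),
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
})

def word_to_index_numpy(val, default='No match'):
    # Hypothetical helper: build a boolean mask with pandas, then let
    # NumPy find the positions where the mask is True.
    mask = referenceDF['WORDS'].str.contains(val, regex=False, na=False).to_numpy()
    positions = np.flatnonzero(mask)
    if positions.size == 0:
        return default
    # Look up INDEXED_NUMBER at the first matching position.
    return referenceDF['INDEXED_NUMBER'].to_numpy()[positions[0]]

print(word_to_index_numpy('aaa'))
print(word_to_index_numpy('bbb'))
```

Looking up by position in the INDEXED_NUMBER array, rather than returning the position itself, keeps the result correct even if INDEXED_NUMBER does not happen to equal the row position.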
