Using Pandas/NumPy to search for a matching string

Question

I have been trying to solve this for a while now but have not gotten anywhere yet. My goal is to search for a string in a column called 'WORDS' and return the corresponding 'INDEXED_NUMBER'. For example, if I search for 'aaa', it should return 0, as shown in the table below.

I am using Python pandas and am also trying NumPy. Below is a sample of the code I've tried:

def WordToIndexwithjustPanda():
    referenceDF[referenceDF['WORDS'].str.contains('aaa')]
    # I was hoping it would grab the row with the word 'aaa',
    # but it is not returning anything

and

def WordToIndexwithNumpy():
    np.where(referenceDF["WORDS"].str.contains('aaa'))
    # I think this is wrong, but I am not sure why

I hope you guys can guide me to the right way of doing this. As an additional note, I am using the Anaconda Prompt and Jupyter Notebook, and I have imported pandas and NumPy.

Thanks in advance. XD

Answer 1

Use loc with boolean indexing, and don't forget to add a return statement to the function. To return a scalar, use iat to select the first value of the filtered Series, with an if-else to handle the case where the filter returns no rows:

def WordToIndexwithjustPanda():
    a = referenceDF.loc[referenceDF['WORDS'].str.contains('aaa'), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]
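The question's np.where attempt has the same missing return, plus one more catch: with a single argument, np.where returns a tuple of arrays of matching row positions, not the INDEXED_NUMBER values themselves. A minimal sketch of a NumPy-based variant, assuming the sample referenceDF shown in this answer (the val='aaa' default argument is an illustrative addition, not from the question):

```python
import numpy as np
import pandas as pd

# Sample data, assumed to mirror the asker's referenceDF.
referenceDF = pd.DataFrame({
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
    'INDEXED_NUMBER': list(range(5)),
})

def WordToIndexwithNumpy(val='aaa'):
    # Boolean mask as a plain NumPy array (True where WORDS contains val).
    mask = referenceDF['WORDS'].str.contains(val).to_numpy()
    # np.where(mask) returns a tuple of index arrays; [0] gives the row positions.
    positions = np.where(mask)[0]
    if positions.size == 0:
        return 'No match'
    # Translate the first matching position into its INDEXED_NUMBER value.
    return referenceDF['INDEXED_NUMBER'].to_numpy()[positions[0]]

print(WordToIndexwithNumpy())       # 0
print(WordToIndexwithNumpy('bbb'))  # No match
```

Note that the positions from np.where only coincide with INDEXED_NUMBER here because the sample numbers happen to be 0 to 4; looking the value up in the INDEXED_NUMBER column avoids relying on that.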

You can also pass the search value as a function parameter to find its first occurrence:

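import pandas as pd

# INDEXED_NUMBER is listed first so the printed column order below
# is the same on old (alphabetical) and new (insertion-order) pandas
referenceDF = pd.DataFrame({
    'INDEXED_NUMBER': list(range(5)),
    'WORDS': ['aaa','aaas','aactive','aadvantage','aaker']
})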
print (referenceDF)
   INDEXED_NUMBER       WORDS
0               0         aaa
1               1        aaas
2               2     aactive
3               3  aadvantage
4               4       aaker

def WordToIndexwithjustPanda(val):
    a = referenceDF.loc[referenceDF['WORDS'].str.contains(val), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]
print (WordToIndexwithjustPanda('aaa'))
0
print (WordToIndexwithjustPanda('bbb'))
No match
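Keep in mind that str.contains matches substrings, so searching for 'aa' would also return 0, and by default it interprets the pattern as a regular expression, so characters like '.' act as wildcards. If an exact word lookup is what you need, comparing with == is a safer choice; passing regex=False keeps the substring search but treats the value as plain text. A minimal sketch using the same sample data (the function names are hypothetical):

```python
import pandas as pd

referenceDF = pd.DataFrame({
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
    'INDEXED_NUMBER': list(range(5)),
})

def word_to_index_exact(val):
    # Exact match: only rows whose WORDS value equals val.
    a = referenceDF.loc[referenceDF['WORDS'] == val, 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]

def word_to_index_literal(val):
    # Substring match with val treated as plain text, not as a regex.
    mask = referenceDF['WORDS'].str.contains(val, regex=False)
    a = referenceDF.loc[mask, 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]

print(word_to_index_exact('aaas'))   # 1
print(word_to_index_exact('aa'))     # No match
print(word_to_index_literal('aa'))   # 0
# With the default regex=True, 'a.a' would match 'aaa'; here it is literal.
print(word_to_index_literal('a.a'))  # No match
```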

Answer 2

This is one way to implement your algorithm using a generator:

def WordToIndexwithjustPanda():
    return next((i for i, j in zip(referenceDF['INDEXED_NUMBER'], referenceDF['WORDS'])
                 if 'aaa' in j), 'No match')

Strictly speaking, this uses pandas only partially: it relies only on the fact that a pd.Series can be iterated over.
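A parameterised, self-contained version of the generator approach, assuming the sample data from Answer 1 (word_to_index_generator is a hypothetical name). Because next() stops at the first hit, it never builds a full boolean mask, but the loop runs in Python, so for searches that must scan most of a large frame the vectorised .loc approach is typically faster.

```python
import pandas as pd

# Sample frame, assumed to match the asker's referenceDF.
referenceDF = pd.DataFrame({
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
    'INDEXED_NUMBER': list(range(5)),
})

def word_to_index_generator(val):
    # Walk both columns in lockstep; next() returns the first number whose
    # word contains val, or the default 'No match' if the generator is exhausted.
    pairs = zip(referenceDF['INDEXED_NUMBER'], referenceDF['WORDS'])
    return next((i for i, word in pairs if val in word), 'No match')

print(word_to_index_generator('aaa'))  # 0
print(word_to_index_generator('dv'))   # 3
print(word_to_index_generator('bbb'))  # No match
```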


2 answers

Answer 1 (accepted): score 1, 2018-03-03 16:09:17

Answer 2: score 1, 2018-03-03 16:17:06
