
Using Pandas/NumPy to search for a matching string

I have been trying to solve this for a while now but have not yet gotten anywhere. My goal is to search a string in a column called 'WORDS' and return the 'INDEXED_NUMBER'. For example, if I searched 'aaa', it should return me 0 as shown in the table below.

Sample table

I am using Python with pandas, and possibly NumPy as well. Below is a sample of the code I've tried:

def WordToIndexwithjustPanda():
    referenceDF[referenceDF['WORDS'].str.contains('aaa')]
    #I was hoping that it will grab me the row with the word 'aaa' but 
    #it is not returning me anything

and

def WordToIndexwithNumpy():
    np.where(referenceDF["WORDS"].str.contains('aaa'))
    #I think this is wrong but I am not sure how is this wrong

I hope you can guide me to the right way of doing this. As an additional note, I am using Anaconda Prompt and Jupyter Notebook, and I have imported pandas and numpy.

Thanks in advance. XD

Use loc with boolean indexing, and don't forget to add return to the function. To return a scalar, use iat to select the first value of the filtered Series, with an if-else to handle the case where the filter returns no rows:

def WordToIndexwithjustPanda():
    a = referenceDF.loc[referenceDF['WORDS'].str.contains('aaa'), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]

You can also add a parameter to the function, so it returns the first occurrence of whatever value you pass in:

referenceDF = pd.DataFrame({
    'WORDS': ['aaa','aaas','aactive','aadvantage','aaker'],
    'INDEXED_NUMBER': list(range(5))
})
print (referenceDF)
   INDEXED_NUMBER       WORDS
0               0         aaa
1               1        aaas
2               2     aactive
3               3  aadvantage
4               4       aaker

def WordToIndexwithjustPanda(val):
    a = referenceDF.loc[referenceDF['WORDS'].str.contains(val), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]
print (WordToIndexwithjustPanda('aaa'))
0
print (WordToIndexwithjustPanda('bbb'))
No match
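
One thing to watch with the function above: str.contains does substring matching and treats the pattern as a regular expression by default, so 'aaa' also matches 'aaas'. If the goal is a whole-word lookup, comparing with == may be closer to what is wanted. The sketch below is only an illustration under that assumption; the function names and the reordered sample rows are made up for the example and are not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data, with 'aaas' placed first to show the difference.
referenceDF = pd.DataFrame({
    'WORDS': ['aaas', 'aaa', 'aactive'],
    'INDEXED_NUMBER': [0, 1, 2],
})

def word_to_index_exact(val, frame=referenceDF):
    # Whole-string comparison: only rows where WORDS equals val are kept.
    a = frame.loc[frame['WORDS'] == val, 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]

def word_to_index_substring(val, frame=referenceDF):
    # regex=False makes str.contains treat val as a literal substring.
    a = frame.loc[frame['WORDS'].str.contains(val, regex=False), 'INDEXED_NUMBER']
    return 'No match' if a.empty else a.iat[0]

print(word_to_index_exact('aaa'))      # row holding exactly 'aaa'
print(word_to_index_substring('aaa'))  # first row containing 'aaa' ('aaas' here)
```

With this data the exact version picks the 'aaa' row, while the substring version stops at 'aaas' because it comes first.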

This is one way to implement your algorithm using a generator:

def WordToIndexwithjustPanda():
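    # Pair each index number with its word; return the first index whose word contains 'aaa'.
    return next((i for i, j in zip(referenceDF['INDEXED_NUMBER'], referenceDF['WORDS'])
                 if 'aaa' in j), 'No match')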

Strictly speaking, it uses pandas only partially: it relies on the fact that a pd.Series can be iterated over.
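
As for the NumPy attempt in the question: np.where called with a single argument returns a tuple of position arrays rather than a value, and the function there has no return statement either. A minimal sketch of one way to finish it, assuming the same referenceDF as in the first answer (the function name is made up for the example), could look like this:

```python
import numpy as np
import pandas as pd

referenceDF = pd.DataFrame({
    'WORDS': ['aaa', 'aaas', 'aactive', 'aadvantage', 'aaker'],
    'INDEXED_NUMBER': list(range(5)),
})

def word_to_index_numpy(val, frame=referenceDF):
    # Boolean mask of rows whose WORDS value contains val as a literal substring.
    mask = frame['WORDS'].str.contains(val, regex=False).to_numpy()
    # Positions (not labels) of the True entries, as a flat integer array.
    positions = np.flatnonzero(mask)
    if positions.size == 0:
        return 'No match'
    # Look up INDEXED_NUMBER at the first matching position.
    return frame['INDEXED_NUMBER'].to_numpy()[positions[0]]

print(word_to_index_numpy('aaa'))
print(word_to_index_numpy('bbb'))
```

np.flatnonzero gives positions rather than index labels, which is why the lookup goes through to_numpy() instead of loc.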

