
Python function for searching Pandas dataframe

I have a simple method to search a pandas dataframe column for a list of keywords; however, I'd like to create a function that I can pass a word (or words) to, so I don't need to keep updating my search list.

My current method:

keywords = ['keyword1', 'keyword2', 'keyword3', 'keyword4']
searched_keywords = '|'.join(keywords)
df = df[df['text'].str.contains(searched_keywords, na=False)]
print(df)

What I'd like to accomplish:

def search(keyword):
    search = '|'.join(keyword)
    searched = df[df['text'].str.contains(search, na=False)]
    return searched

I would then call search(keyword) and update the dataframe with the rows containing the search terms. I'm running into an issue, though: the dataframe is being returned without the keywords. Where am I going wrong?

Data (example search term 'pokemon'):

index text
1,Pokemon crashed in me 😤
2,Who knew that that baggage claim would be more hypnotic than Pokemon Go.  Nadi /MSOmSnHPNs
3,Get a SecretDoubleDown with every Pokemonster found today.
4,Anyone out there with a Fitbit add me and let's get competitive. This Pokemon Go stuff is good… /iw194ni6kH
5,What happens when the PokemonGo craze is over. Will they all just be left to roam the streets like the homeless?
6,Gotta Catch Em All! pokemongo pokemon ratata oddish pidgey eeve rhihorn doduo magmar… /6KCbkcKIBo
7,I found ピジョン in McDonald's pokemongo pokemon game play game ã¯ã¾ã£ã¦ã„ã‚‹ getã ãœ macdonalds get… /DWD4Bh3RI9
8,Had a stand off against this Koffing in town today. Don't worry I caught it 👠 PokemonGO… /IPaT7bEDeI
9,Mencari Pokemon with the genkss 🤘ðŸ»ðŸ‘»ðŸ‘½ðŸ˜… (at The Square) [pic] — /tWLtjRhIP9
10,Waikato uni pokemon go fever pokemongo waikatouniversity … /UomascadDf
11,Where pokemon go has taken me 😂  Hamilton Gardens /fHmAd8kFrQ
12,Caught myself a Pidgeot! 🥠pokemongo newzealand  Hamilton Gardens /av4LfD3eEt
13,My prized possession 😠pokemongo jigglypuff walkingisgoodforme… /XJ1KGgVglK
14,Hahaha thetruth truth pokemongo pokemon niantic smartphone android iphone game… /PjNOYdJy5L
15,On an adventure for Pokemon •  Garden Place /4m9TviEq31
16,pokemon😂hamiltonchartwellstarbuckspokemonpokemonballstrawberryvanilla goodãƒã‚±ãƒ¢ãƒ³ … /vnWbbrsBsY
17,When ur boss and team member are walking around catching Pokemon at work lol Hahahaha pokemongo… /Qr6Q4Je6Bq
18,Ran out of balls so had to use tubes but this one got away   pokemongo pokemon… /OjUGUDbZib
19,Our first Pokemon in the house! Amber was so excited she pounced on it! PokemonGo  The Dansion /w8sWppGMk6
20,Pokemon hunting solo! ( Howick Beach in Howick
21,Gorgeous day for a walk. wellingtonnz nature catchingpokemon  Tihati Bay
22,Lures are ON at The Flying Moa Pokē Stop pokemongo theflyingmoa flyingmoa pokemongoauckland… /FVWaI3b0u6
23,While waiting for a pokemon to appear we saw this real life "thing" as Chris called it.… /WPXUmxvVS8
24,Pokemon go is a danger to my health. It's real blood.this is a real injury. dontpokemonanddrive… /dFXecLSElG
25,If I was to catch how many people are playing Pokemon Go
26,is still get hair done
27,i had no class todai why did i wait 630 to start do everyth
28,passei o dia com o meu amor comemo demai <3 @guugaraujo
29,4 hari ngga ada kepsek rasanya nyaman bgt kerjaan juga lebih teratur tp skalinya doi masuk administrasi kacau balau lg yanasib
30,never a dull moment with emma <3 twitter/MLEFFin_awesome/status/431584519951749120/photo/1
31,good morn
32,that Oikos commerci with @johnstamos @bobsaget and @davecoulier is better than my whole life #takesmeback #youcankissmeanytimejohn
33,rememb when we would go to club zoo :D
34,@itscourtney_365 thei call
35,when you see your hometown in your english book twitter/norastanky/status/431584528302223360/photo/1
36,i'm at longhorn steakhouse brandon fl 4sq/1bzZsrp
37,@tonichopchop moron drive me nut
38,my god sister got drink
39,andré vc e o vitor estão de parabén pela dupla melhor do que a do Pliny_the_Elder @esp_interativo #onordestemerece #esporteinterativo
40,:yes: California_Pizza_Kitchen instagram/p/kGDyoYm7lM/
41,@jjoshjjosh @piersmorgan bewar josh you miss a comma befor the word know in your Twitter he'll have you for that #grammar
42,morn
43,thi be that tbt 8) twitter/pinoy_boiiiii/status/431584549273751553/photo/1
44,im here twitter/aaaaatkh/status/431584549290516482/photo/1
45,@_shortyyy_ hahaha i bet that great :D
46,twitter/Mahfuz_Eugene/status/431584553501589504/photo/1
47,ã¡ã‚‡ã£ã¨ã¾ã£ã¦ :no: é…刻ã‹ã‚‚ã‹ã‚‚ã‹ã‚‚笑
48,sorri yeee ga ada kta galau d kamu ku :P @rita_agustinaa emangnya kamu @arinisukawati statusnya galau :P @rita_agustinaa oiya
49,me estoi quedando fritiiita

I tried your function and it works. The problem may be the keyword values that you pass.
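One possible cause, which is an assumption and not something confirmed in the question, is letter case: the example search term is 'pokemon', while many rows spell it 'Pokemon', and str.contains is case-sensitive by default. A minimal sketch with a made-up four-row frame:

```python
import pandas as pd

# Tiny illustrative frame; the rows are shortened stand-ins for the sample data.
df = pd.DataFrame(
    {"text": ["Pokemon crashed in me", "pokemon go fever", "good morn", None]}
)

# Default behaviour is case-sensitive: lowercase 'pokemon' misses 'Pokemon'.
sensitive = df[df["text"].str.contains("pokemon", na=False)]

# case=False matches both spellings; na=False treats missing text as no match.
insensitive = df[df["text"].str.contains("pokemon", case=False, na=False)]

print(len(sensitive), len(insensitive))
```

If the search should ignore case, passing case=False inside the function is probably the smallest change.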

I have made a small change to your function in order to make it a little more useful:

def search(keyword, df):
    search = '|'.join(keyword)
    searched = df[df['text'].str.contains(search, na=False)]
    return searched

Example:

df2 = search(["Pokemon"], df)

df2.head()
    index   text
0   1   Pokemon crashed in me 😤
1   2   Who knew that that baggage claim would be more...
2   3   Get a SecretDoubleDown with every Pokemonster ...
3   4   Anyone out there with a Fitbit add me and let'...
4   5   What happens when the PokemonGo craze is over....

You could then keep searching within the new df2:

df3 = search(["craze","crash"], df2)

df3.head()
    index   text
0   1   Pokemon crashed in me 😤
4   5   What happens when the PokemonGo craze is over....

Possible Problems

If you pass a string instead of a list:

search("Pokemon", df)

you'll be searching for 'P|o|k|e|m|o|n', because '|'.join() iterates over the individual characters of a string.
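One way to guard against this is to wrap a bare string in a list before joining. The sketch below is a variation on the function above, not part of the original answer; the isinstance check and the use of re.escape (so that characters like '+' or '.' are matched literally) are additions made here for illustration.

```python
import re
import pandas as pd

def search(keyword, df):
    # Wrap a bare string so it is not joined character by character.
    if isinstance(keyword, str):
        keyword = [keyword]
    # Escape each keyword so regex metacharacters are treated as literal text.
    pattern = "|".join(re.escape(k) for k in keyword)
    return df[df["text"].str.contains(pattern, na=False)]

df = pd.DataFrame({"text": ["Pokemon crashed in me", "good morn", "k"]})

# Without the guard this would search for 'P|o|k|e|m|o|n' and match every row.
result = search("Pokemon", df)
print(result)
```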

The dataframe df must have a column named 'text' or you'll get an error.
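If the column name varies between dataframes, it could be passed in as an argument. In this sketch the column parameter and the explicit check are hypothetical additions, not part of the function above; the check only makes the failure message more descriptive.

```python
import pandas as pd

def search(keyword, df, column="text"):
    # `column` is a hypothetical extra parameter that defaults to 'text'.
    if column not in df.columns:
        raise KeyError(
            f"column {column!r} not found; available columns: {list(df.columns)}"
        )
    pattern = "|".join(keyword)
    return df[df[column].str.contains(pattern, na=False)]

# Illustrative frame whose text column has a different name.
tweets = pd.DataFrame({"tweet": ["Pokemon crashed in me", "good morn"]})
found = search(["Pokemon"], tweets, column="tweet")
print(found)
```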

If you keep doing df = search(['search text 1'], df) (or df = search(['search text 1']) with your original function) over and over with different terms, you may end up with an empty dataframe. Reassigning the search result to df effectively applies a logical AND between the different keywords.
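The difference between the two behaviours can be seen on a small made-up frame. This is only an illustration using the two-argument search function from above; the three rows are invented for the example.

```python
import pandas as pd

def search(keyword, df):
    search = '|'.join(keyword)
    searched = df[df['text'].str.contains(search, na=False)]
    return searched

df = pd.DataFrame(
    {"text": ["Pokemon crashed in me", "PokemonGo craze", "good morn"]}
)

# One call with several keywords: rows matching ANY of them (OR).
either = search(["Pokemon", "morn"], df)

# Chained calls: only rows matching ALL of the terms survive (AND).
both = search(["morn"], search(["Pokemon"], df))

print(len(either), len(both))
```

Keeping the original dataframe under its own name and assigning results to new variables avoids narrowing it by accident.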

