
Searching for a string pattern in a DataFrame column in pandas

This continues my previous Stack Overflow question, "searching matching string pattern from dataframe column in python pandas".

Suppose I have a DataFrame:

 name         genre
 satya      |ACTION|DRAMA|IC|
 satya      |COMEDY|DRAMA|SOCIAL|MUSIC|
 abc        |DRAMA|ACTION|BIOPIC|
 xyz        |ACTION||ROMANCE|DRAMA|
 def        |ACTION|SPORT|COMEDY|IC|
 ghj        |IC|ACTIONDRAMA|NOACTION|

From the answer to my previous question, I can search for a single genre (e.g. IC) when it exists on its own in the genre column and not as part of another genre string (such as MUSIC or BIOPIC).

Now I want to find rows where ACTION and DRAMA are both present in the genre column, not necessarily in a particular order, and each as an individual genre rather than as part of a longer string.

So the output should contain rows 1, 3, and 4:

 name         genre
 satya      |ACTION|DRAMA|IC|         # both present, adjacent
 # row 2 is excluded                  # only DRAMA is present, not ACTION
 abc        |DRAMA|ACTION|BIOPIC|     # both present, adjacent, in the other order
 xyz        |ACTION||ROMANCE|DRAMA|   # both present, not adjacent
 # row 5 is excluded                  # DRAMA is not present
 # row 6 is excluded                  # neither is present individually (only as part of a longer string)

I tried something like:

 x = df[df['genre'].str.contains(r'\|ACTION\|DRAMA\|')]
 ### returns only row 1 (ACTION and DRAMA adjacent, in the order ACTION -> DRAMA)

Can somebody suggest what I should change or add here to get the rows I need?

I think you can use str.contains with two conditions combined with AND (&):

print(df)
    name                        genre
0  satya            |ACTION|DRAMA|IC|
1  satya  |COMEDY|DRAMA|SOCIAL|MUSIC|
2    abc        |DRAMA|ACTION|BIOPIC|
3    xyz      |ACTION||ROMANCE|DRAMA|
4    def     |ACTION|SPORT|COMEDY|IC|
5    ghj    |IC|ACTIONDRAMA|NOACTION|

print(df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|'))
0     True
1    False
2     True
3     True
4    False
5    False
Name: genre, dtype: bool

print(df[df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|')])
    name                    genre
0  satya        |ACTION|DRAMA|IC|
2    abc    |DRAMA|ACTION|BIOPIC|
3    xyz  |ACTION||ROMANCE|DRAMA|
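
The same idea extends to any number of genres. Here is a minimal sketch, assuming pandas is installed. The helper name `has_all_genres` is made up for illustration and is not part of pandas. It builds one `str.contains` mask per genre, escapes each genre with `re.escape`, and combines the masks with `&`:

```python
import operator
import re
from functools import reduce

import pandas as pd

# Sample data from the question (0-based index, as in the printout above)
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})


def has_all_genres(series, genres):
    """Hypothetical helper: True where every genre occurs as a whole |GENRE| token.

    Assumes `genres` is a non-empty list of plain strings.
    """
    # One boolean mask per genre; the surrounding pipes prevent partial matches
    masks = [series.str.contains(r"\|" + re.escape(g) + r"\|") for g in genres]
    # AND all masks together element-wise
    return reduce(operator.and_, masks)


mask = has_all_genres(df["genre"], ["ACTION", "DRAMA"])
result = df[mask]
print(result)
```

This relies on every genre being wrapped in `|` on both sides, as in the sample data; a value without the leading or trailing pipe would not match.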

I'm not really sure about this answer because I can't test it here, but try using this regex:

(\\|ACTION|\\|DRAMA).*?(\\|ACTION|\\|DRAMA)

Hope it helps.
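
One caveat with the pattern above: it would also accept the same genre twice (for example `|ACTION|X|ACTION|`), and it does not require the closing `|`. A single-regex alternative is to use two lookaheads, one per genre. The following is a sketch using only the standard `re` module on the sample strings from the question; the variable names are illustrative:

```python
import re

# Genre strings from the question, in row order (0-based)
genres = [
    "|ACTION|DRAMA|IC|",
    "|COMEDY|DRAMA|SOCIAL|MUSIC|",
    "|DRAMA|ACTION|BIOPIC|",
    "|ACTION||ROMANCE|DRAMA|",
    "|ACTION|SPORT|COMEDY|IC|",
    "|IC|ACTIONDRAMA|NOACTION|",
]

# Each lookahead scans from the start of the string for one whole |GENRE| token.
# Both must succeed, so both genres are required, but their order does not matter.
pattern = re.compile(r"^(?=.*\|ACTION\|)(?=.*\|DRAMA\|)")

# Collect the positions of the strings that contain both genres
matches = [i for i, g in enumerate(genres) if pattern.search(g)]
print(matches)  # [0, 2, 3]
```

Since `str.contains` treats its argument as a regular expression by default, the same pattern string should also work as `df['genre'].str.contains(r'^(?=.*\|ACTION\|)(?=.*\|DRAMA\|)')`.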
