String Indexing in dataframe subset - pandas

Question

I'm trying to create a subset of a pandas dataframe, based on values in a list. However, I need to include string indexing. I'll demonstrate with an example:

Here is my dataframe:

import pandas as pd

df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4']})

Here is what it looks like:

     A
0  1-2
1    2
2    3
3  3-8
4    4

I have a list of values I want to use to select rows from my dataframe.

list1 = ['2', '3']

I can use the .isin() method to select the rows whose values appear in my list.

subset = df[df['A'].isin(list1)]
print(subset)

   A
1  2
2  3

However, I want any value that includes '2' or '3'. This is my desired output:

     A
0  1-2
1    2
2    3
3  3-8

Can I use string indexing in my .isin() call? I am struggling to come up with a workaround.

Answer 1

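Split each value on '-' with str.split, then combine isin and any:

# axis must be passed by keyword in recent pandas versions
newdf = df[df.A.str.split('-', expand=True).isin(['2', '3']).any(axis=1)].copy()
print(newdf)
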
     A
0  1-2
1    2
2    3
3  3-8
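
For reference, here is a self-contained sketch of the same split-then-isin idea. It assumes the values in the column are always separated by '-'. The helper name `rows_with_any_token` and the extra row '23' are made up for illustration and are not part of the original question.

```python
import pandas as pd


def rows_with_any_token(df, column, wanted, sep="-"):
    """Return rows whose `column` value, split on `sep`, has a token in `wanted`."""
    # expand=True yields one column per token; shorter values are padded with missing values
    tokens = df[column].str.split(sep, expand=True)
    # isin() tests every token; any(axis=1) keeps a row if at least one token matched
    mask = tokens.isin(wanted).any(axis=1)
    return df[mask].copy()


# Same data as the question, plus a hypothetical '23' row to show whole-token matching
df = pd.DataFrame({"A": ["1-2", "2", "3", "3-8", "4", "23"]})
subset = rows_with_any_token(df, "A", ["2", "3"])
print(subset)
```

Because whole tokens are compared, a value such as '23' is not selected here, which may or may not be what you want.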

Answer 2

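You can try a regular expression:

import re

# Build an alternation such as "2|3"; re.escape keeps the list items literal
pattern = re.compile("|".join(re.escape(item) for item in list1))

# search() looks for the pattern anywhere in the string
print(df.loc[df['A'].apply(lambda x: bool(pattern.search(x)))])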

Output:

     A
0  1-2
1    2
2    3
3  3-8
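
A vectorized variant of the regex approach might look like the sketch below, which uses `Series.str.contains` instead of `apply`. It assumes that a plain substring match is acceptable, so values such as '23' or '12' would also be selected.

```python
import re

import pandas as pd

df = pd.DataFrame({"A": ["1-2", "2", "3", "3-8", "4"]})
list1 = ["2", "3"]

# Escape each item so regex metacharacters in the list are matched literally
pattern = "|".join(re.escape(item) for item in list1)

# str.contains searches for the pattern anywhere in each string
subset = df[df["A"].str.contains(pattern, regex=True)]
print(subset)
```

If only whole '-' separated tokens should match, the str.split approach from Answer 1 is the safer choice.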

2 answers

solution1
3 ACCEPTED 2019-10-29 19:07:48

solution2
1 2019-10-29 19:18:56
