
String Indexing in dataframe subset - pandas

I'm trying to create a subset of a pandas DataFrame based on values in a list. However, I need to match on parts of each string (string indexing), not only on whole values. I'll demonstrate with an example:

Here is my dataframe:

import pandas as pd

df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4']})

Here is what it looks like:

     A
0  1-2
1    2
2    3
3  3-8
4    4

I have a list of values I want to use to select rows from my dataframe.

list1 = ['2', '3']

I can use the .isin() method to select the rows whose values are in my list:

subset = df[df['A'].isin(list1)]
print(subset)

   A
1  2
2  3
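For reference, here is a minimal check of why .isin() alone cannot do this: it compares each whole cell value against the list, so '1-2' and '3-8' are never equal to '2' or '3'. The sketch just rebuilds the question's data and prints the boolean mask.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4']})
list1 = ['2', '3']

# isin() tests whole-value membership, not substring membership,
# so only the exact values '2' and '3' are flagged.
mask = df['A'].isin(list1)
print(mask.tolist())  # [False, True, True, False, False]
```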

However, I want every row whose value includes '2' or '3'. This is my desired output:

     A
0  1-2
1    2
2    3
3  3-8

Can I use string indexing in my .isin() call? I am struggling to come up with another workaround.

Check out str.split combined with isin and any: split each value on '-', then keep the rows where any of the pieces is in the list.

new_df = df[df['A'].str.split('-', expand=True).isin(list1).any(axis=1)].copy()
print(new_df)

     A
0  1-2
1    2
2    3
3  3-8
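The one-liner above can be unpacked into its three steps. This is a sketch using the question's df and list1; the intermediate names parts, hits and mask are only illustrative. A row is kept when at least one dash-separated piece exactly equals a list item. The axis is passed by keyword (axis=1), which pandas accepts across versions.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4']})
list1 = ['2', '3']

# Step 1: split each value on '-' into separate columns.
# Values without a '-' get a missing value in the second column.
parts = df['A'].str.split('-', expand=True)

# Step 2: element-wise membership test of every piece against list1.
hits = parts.isin(list1)

# Step 3: keep a row if any of its pieces matched.
mask = hits.any(axis=1)
print(df[mask])
```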

You can also try a regular expression:

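import re

# Build an alternation such as '2|3'; re.escape guards against regex metacharacters
pattern = re.compile('|'.join(map(re.escape, list1)))

# search() finds the pattern anywhere in the string, not only at the start
print(df.loc[df['A'].apply(lambda x: bool(pattern.search(x)))])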

Output:

     A
0  1-2
1    2
2    3
3  3-8
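A vectorized variant of the regex idea uses Series.str.contains, which applies re.search to each value, instead of apply. Note that the two answers are not equivalent: the regex matches substrings, while the str.split answer matches whole dash-separated pieces. The sketch below adds a hypothetical extra value, '12', purely to show where they disagree; which behaviour you want depends on what "includes" means for your data.

```python
import re
import pandas as pd

# The question's data plus one hypothetical extra value, '12',
# added only to show where the two approaches disagree.
df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4', '12']})
list1 = ['2', '3']

# Substring match: build '2|3' (re.escape protects any regex
# metacharacters in list1); str.contains runs re.search per value.
pattern = '|'.join(map(re.escape, list1))
by_substring = df['A'].str.contains(pattern)

# Whole-piece match (the str.split answer): split on '-' and test each piece.
by_piece = df['A'].str.split('-', expand=True).isin(list1).any(axis=1)

print(df[by_substring])  # includes '12', because '12' contains '2'
print(df[by_piece])      # excludes '12', because no piece equals '2' or '3'
```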


The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM