I have the following data frame:
            ProbeGenes  sample1  sample2  sample3
0      1431492_at Lipn     20.3      130        1
1   1448678_at Fam118a     25.3      150        2
2  1452580_a_at Mrpl21      3.1      173       12
It's created using this code:
import pandas as pd
df = pd.DataFrame({
    'ProbeGenes': ['1431492_at Lipn', '1448678_at Fam118a', '1452580_a_at Mrpl21'],
    'sample1': [20.3, 25.3, 3.1],
    'sample2': [130, 150, 173],
    'sample3': [1.0, 2.0, 12.0],
})
What I want to do then is, given a list:
list_to_grep = ["Mrpl21","lipn","XXX"]
extract (grep) the subset of df whose ProbeGenes values contain any member of list_to_grep, yielding:
         ProbeGenes  sample1  sample2  sample3
    1431492_at Lipn     20.3      130        1
1452580_a_at Mrpl21      3.1      173       12
Ideally the grepping is in case-insensitive mode. How can I achieve that?
Your example doesn't really need the use of regular expressions.
Define a function that returns whether a given string contains any element of the list.
list_to_grep = ['Mrpl21', 'lipn', 'XXX']
def _grep(x, list_to_grep):
    """takes a string (x) and checks whether any string from a given
    list of strings (list_to_grep) exists in `x`"""
    for text in list_to_grep:
        if text.lower() in x.lower():
            return True
    return False
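The same check can be written more compactly with any(), lowercasing the input string once instead of once per search term. This is only a sketch of an equivalent form; the name grep_any is made up here and is not part of the original answer.

```python
def grep_any(x, list_to_grep):
    """Return True if any term in list_to_grep occurs in x, ignoring case."""
    haystack = x.lower()  # lowercase the input once, not once per term
    return any(term.lower() in haystack for term in list_to_grep)


list_to_grep = ['Mrpl21', 'lipn', 'XXX']
print(grep_any('1431492_at Lipn', list_to_grep))     # True
print(grep_any('1448678_at Fam118a', list_to_grep))  # False
```

It can be passed to apply in the same way as _grep.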
Create a mask:
mask = df.ProbeGenes.apply(_grep, list_to_grep=list_to_grep)
Filter the data frame using this mask:
df[mask]
This outputs:
            ProbeGenes  sample1  sample2  sample3
0      1431492_at Lipn     20.3      130        1
2  1452580_a_at Mrpl21      3.1      173       12
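If a regular expression is acceptable after all, pandas' own Series.str.contains can build the same mask without a Python-level function. The following is a sketch under that assumption, not a replacement for the approach above; the names pattern and mask_contains are illustrative. Each term is passed through re.escape so that it is matched literally.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'ProbeGenes': ['1431492_at Lipn', '1448678_at Fam118a', '1452580_a_at Mrpl21'],
    'sample1': [20.3, 25.3, 3.1],
    'sample2': [130, 150, 173],
    'sample3': [1.0, 2.0, 12.0],
})
list_to_grep = ['Mrpl21', 'lipn', 'XXX']

# Escape each term so it is matched literally, then join with "|" (regex OR).
pattern = '|'.join(re.escape(term) for term in list_to_grep)

# case=False makes the match case-insensitive;
# na=False treats missing values as non-matches.
mask_contains = df['ProbeGenes'].str.contains(pattern, case=False, na=False)
result = df[mask_contains]
print(result)
```

On the example data this selects the same two rows (index 0 and 2) as the _grep mask.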
Note: this works well for small datasets, but I've experienced unreasonably long run times when applying functions to text columns in big data frames (~10 GB). Applying the same function to a plain list takes much less time, and I don't know why.
For reasons that are beyond me, something like this lets me filter much faster:
>>> from functools import partial
>>> mylist = df.ProbeGenes.tolist()
>>> _greppy = partial(_grep, list_to_grep=list_to_grep)
>>> mymask = list(map(_greppy, mylist))
>>> df[mymask]
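Whether the list-based route is actually faster will depend on the data and the pandas version, so it is worth timing both on your own column. Below is a minimal sketch using timeit; the synthetic column (the three example values repeated 10,000 times) and the helper names via_apply and via_list are assumptions made for illustration, and no particular timing result is implied.

```python
import timeit
from functools import partial

import pandas as pd


def _grep(x, list_to_grep):
    """Return True if any string from list_to_grep occurs in x, ignoring case."""
    for text in list_to_grep:
        if text.lower() in x.lower():
            return True
    return False


list_to_grep = ['Mrpl21', 'lipn', 'XXX']

# Synthetic column: the three example values repeated (size chosen arbitrarily).
probes = pd.Series(
    ['1431492_at Lipn', '1448678_at Fam118a', '1452580_a_at Mrpl21'] * 10_000
)
_greppy = partial(_grep, list_to_grep=list_to_grep)


def via_apply():
    # Mask built with Series.apply, as in the first approach.
    return probes.apply(_grep, list_to_grep=list_to_grep)


def via_list():
    # Mask built by mapping over a plain Python list.
    return list(map(_greppy, probes.tolist()))


# Both routes should produce the same mask.
same_mask = via_apply().tolist() == via_list()
print('same mask:', same_mask)
print('apply:', timeit.timeit(via_apply, number=5))
print('list :', timeit.timeit(via_list, number=5))
```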