I want to filter a DataFrame by keeping only the rows whose value in a given column matches a regex pattern. The example in the documentation only filters by looking for that regex in every column of the DataFrame (see the documentation for filter).
So how can I change the following example
df.filter(regex='^[\d]*', axis=0)
to something like this, which only looks for the regex in the specified column:
df.filter(column='column_name', regex='^[\d]*', axis=0)
Filter the DataFrame with a boolean mask built from the given column and the regex pattern, like this: df[df.column_name.str.contains('^[\\d]*', regex=True)]
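To make the mask idea concrete, here is a minimal sketch on a made-up DataFrame. Only the name column_name comes from the question; the values and the extra column are hypothetical. It uses ^\d+ rather than ^[\d]*, since * also accepts zero digits and would mark every row as a match, and it passes na=False on the assumption that missing values should count as non-matches.

```python
import pandas as pd

# Hypothetical data: only the column name `column_name` comes from the question.
df = pd.DataFrame({
    "column_name": ["123abc", "abc123", "42", None],
    "other": [1, 2, 3, 4],
})

# Boolean Series: True where the value starts with at least one digit.
# na=False treats missing values as non-matches, so the mask stays boolean.
mask = df["column_name"].str.contains(r"^\d+", regex=True, na=False)

# Indexing with the mask keeps only the rows where it is True.
filtered = df[mask]
print(filtered)
```

The mask is computed from one column only, but indexing with it keeps whole rows, which is what the question asks for.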
Use the vectorized string methods contains() or match() (see Testing for Strings that Match or Contain a Pattern):
df[df.column_name.str.contains(r'^\d+')]
or
df[df.column_name.str.match(r'\d+')] # match() only matches at the start of the string
Note that I removed the superfluous brackets ([]) and replaced * with +, because \d* will always match: it also matches zero occurrences (a so-called zero-length match).
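The zero-length match can be checked with the standard re module alone. This is a small sketch with made-up sample strings; it assumes that str.contains follows re.search semantics, as the pandas documentation describes.

```python
import re

# Hypothetical sample values, chosen to cover leading digits, no leading digits, and empty.
values = ["123abc", "abc123", ""]

# `^\d*` accepts zero digits, so it matches every string (possibly with an empty match).
star = [re.search(r"^\d*", v) is not None for v in values]

# `^\d+` requires at least one leading digit.
plus = [re.search(r"^\d+", v) is not None for v in values]

print(star)  # [True, True, True]
print(plus)  # [True, False, False]

# The match on "abc123" is zero-length: it spans no characters.
print(re.search(r"^\d*", "abc123").span())  # (0, 0)
```

With the * version, a filter built on this pattern would keep every row, which is why the + version is the one that actually filters.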