
Pandas: Filter rows by regex condition

I've read several questions and answers about this, but I must be doing something wrong. I'd appreciate it if someone could point out what it might be.

The first column of my dataframe df should always contain six digits. I'm loading the dataframe from Excel, and some smart user thought it would be funny to add a disclaimer in the first column.

So I have in the first column something like:

['123456', '456789', '147852', 'In compliance with...']

So I need to keep only the valid records. I'm trying:

pat='\d{6}'
filter = df[0].str.contains(pat, regex=True)

This returns False for the disclaimer but NaN for the rows that should match, so df[filter] yields nothing.

What am I doing wrong?

You should be able to do that with the following.

You need to select the rows based on the regex filter.

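Note that your current regex, \d{6}, also matches values with more than six digits, because str.contains only looks for six consecutive digits anywhere in the string. I anchored the pattern with ^ and $ so that it matches exactly six digits.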

df = df[df[df.columns[0]].str.contains('^[0-9]{6}$', regex=True)]
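One possible explanation for the NaN values in the question, offered as an assumption rather than something the post confirms: the six-digit cells may be read from Excel as numbers, not strings, and .str.contains returns NaN for non-string cells in a mixed column. The sketch below uses made-up data (including an extra seven-digit value to show the effect of the anchors) and casts the column to str before matching.

```python
import pandas as pd

# Hypothetical data mimicking an Excel import: numeric cells arrive as ints,
# the disclaimer as a string, so the column holds mixed types.
df = pd.DataFrame({0: [123456, 456789, 147852, 1234567, "In compliance with..."]})

# On a mixed column, .str.contains yields NaN for the non-string cells.
raw_mask = df[0].str.contains(r"\d{6}", regex=True)

# Casting to str first gives every row a real True/False.
as_text = df[0].astype(str)
loose_mask = as_text.str.contains(r"\d{6}", regex=True)       # also accepts 1234567
exact_mask = as_text.str.contains(r"^[0-9]{6}$", regex=True)  # exactly six digits

# Keep only the rows whose first column is exactly six digits.
valid = df[exact_mask]
```

If the column really does contain only strings, the astype(str) call is harmless and the anchored pattern alone is enough.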

