Pandas: Filter rows by regex condition

Question

I've read several questions and answers on this, but I must be doing something wrong. I'd appreciate it if someone could point out what it might be.

In my df dataframe, the first column should always contain six digits. I'm loading the dataframe from Excel, and some smart user thought it would be funny to add a disclaimer in the first column.

So I have in the first column something like:

['123456', '456789', '147852', 'In compliance with...']

So I need to keep only the valid records. I'm trying:

pat='\d{6}'
filter = df[0].str.contains(pat, regex=True)

This returns False for the disclaimer but NaN for the rows that should match, so df[filter] yields nothing.

What am I doing wrong?

Answer 1

You should be able to do that with the following.

You need to select the rows based on the regex filter.

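Note that the regex you are using, '\d{6}', is not anchored, so str.contains will also match any value containing more than six digits. I changed it to match exactly six digits by anchoring it with ^ and $.

The NaN results most likely come from the six-digit cells being loaded from Excel as numbers rather than strings: the .str methods return NaN for non-string values. Casting the column to str first avoids that.

df = df[df[df.columns[0]].astype(str).str.contains(r'^[0-9]{6}$', regex=True)]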
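For anyone who wants to reproduce this, here is a minimal sketch. It assumes the six-digit cells arrive from Excel as integers (the question does not confirm this, but it would explain the NaN results), and the sample frame is made up for illustration. It uses str.fullmatch, which requires the whole value to match, as an alternative to anchoring the pattern by hand.

```python
import pandas as pd

# Hypothetical data standing in for the Excel load: the numeric cells are
# assumed to arrive as ints, the disclaimer as a string.
df = pd.DataFrame({0: [123456, 456789, 147852, "In compliance with..."]})

# Applied directly, the .str accessor cannot test the non-string cells,
# so this mask is not usable for boolean indexing as it stands.
raw_mask = df[0].str.contains(r"^[0-9]{6}$", regex=True)

# Cast every cell to str first, then require the whole value to be six digits.
mask = df[0].astype(str).str.fullmatch(r"[0-9]{6}")

# Keep only the rows whose first column is exactly six digits.
valid = df[mask]
print(valid)
```

If the cells come in as floats instead, astype(str) would produce values like '123456.0', so the column would need converting to an integer type before the cast.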
