Drop rows in pandas if they contain “???”

Question

I'm trying to drop rows in pandas that contain "???". It works for every other value except "???", and I do not know what the problem is.

This is my code (I have tried both types):

df = df[~df["text"].str.contains("?????", na=False)]
df = df[~df["text"].str.contains("?????")]

The error that I'm getting:

re.error: nothing to repeat at position 0

It works for every other value except "????". I have googled it and looked all over this website, but I couldn't find any solutions.

Answer 1

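str.contains() treats its pattern as a regular expression by default, hence the re.error. In a regex, ? is a quantifier that needs something before it to repeat, so a pattern that starts with ? fails with "nothing to repeat at position 0". You can either escape each ? inside the expression like this: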

df = df[~df["text"].str.contains(r"\?\?\?\?\?")]

Or set regex=False as Vorsprung suggested:

df = df[~df["text"].str.contains("?????", regex=False)]
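
If the search string comes from a variable, escaping it by hand gets awkward. A minimal sketch, assuming a small made-up DataFrame (the sample values are hypothetical, not from the question), that lets the standard library's re.escape build the escaped pattern and checks that it agrees with regex=False:

```python
import re

import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"text": ["hello", "what?????", "?????", "ok?"]})

needle = "?????"

# re.escape backslash-escapes regex metacharacters such as "?".
pattern = re.escape(needle)

# Both masks treat the needle as literal text; ~ keeps the rows that do NOT match.
escaped = df[~df["text"].str.contains(pattern)]
literal = df[~df["text"].str.contains(needle, regex=False)]
```

Both variants should keep the same rows here; regex=False is probably the simpler choice when no regex features are needed at all.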

Answer 2

Let's convert this into running code:

import pandas as pd

data = {'A': ['abc', 'cxx???xx', '???'], 'B': ['add', 'ddb', 'c']}
df = pd.DataFrame.from_dict(data)
df

Output:

          A    B
0       abc  add
1  cxx???xx  ddb
2       ???    c

With this:

df[df['A'].str.contains('???', regex=False)]

Output:

          A    B
1  cxx???xx  ddb
2       ???    c

You need to tell contains() that your search string is not a regex.
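
The filter above selects the matching rows, while the question asks to drop them. A rough sketch of the dropping variant, assuming the same example frame plus one hypothetical extra row with a missing value: negate the mask with ~, and pass na=False (as in the question's first attempt) so that missing values count as "no match" rather than leaving gaps in the mask.

```python
import numpy as np
import pandas as pd

# Same data as above, plus a hypothetical extra row with a missing value in A.
df = pd.DataFrame({
    "A": ["abc", "cxx???xx", "???", np.nan],
    "B": ["add", "ddb", "c", "e"],
})

# regex=False: match "???" as plain text.
# na=False: treat missing values as not containing the text.
mask = df["A"].str.contains("???", regex=False, na=False)

# ~mask inverts the selection, so only rows without "???" remain.
kept = df[~mask]
```

Without na=False, the mask can contain missing values, which pandas generally refuses to use for boolean indexing.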