How to check for specific characters inside a CSV file using pandas

Question

I have read a CSV file into a pandas DataFrame and I'm trying to find all the sentences that contain any of the characters I'm looking for. Whenever one of them is found, I want to print the sentence with its original index from the main CSV, not a new index. This is the code I'm trying, but it gives me an error:

lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'


tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)

newdata=tdata[tdata['sentences'].str.isin(lookfor)]

print (newdata)


#a sample set
-----------------------------

#hi, how are; you 
#im good thanks
#How ? Is live.
#good, what about ) you/
#my name is alex
#hello, alex how are you !
#im good!
#great news
#thanks!
-----------------------------

It returns this error:


newdata=tdata[tdata['sentences'].str.isin(pat)]
AttributeError: 'StringMethods' object has no attribute 'isin'

input data looks like

output I'm expecting is

Answer 1

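You probably want the 'contains' method, since the .str accessor has no 'isin'. Something like this, where pat is your regex pattern (lookfor in your code):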

df = tdata[tdata.sentences.str.contains(pat, regex=True, na=False)]
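To see how this behaves, here is a minimal sketch on a few of the sample sentences. The small hand-built DataFrame and the shortened, ASCII-only character set are assumptions made for illustration, not the asker's real data.

```python
import re
import pandas as pd

# Hypothetical stand-in for the CSV contents: a few lines from the sample set
tdata = pd.DataFrame({"sentences": [
    "hi, how are; you",
    "im good thanks",
    "How ? Is live.",
    "my name is alex",
    "im good!",
]})

# Shortened character class, for illustration only
lookfor = "[" + re.escape(",?!.:;") + "]"

# str.contains returns a boolean Series; na=False treats missing values as "no match"
mask = tdata["sentences"].str.contains(lookfor, regex=True, na=False)

# Boolean indexing keeps the original index labels of the matching rows
matches = tdata[mask]
print(matches)
```

Note that the rows in matches keep the index labels they had in tdata, so nothing is renumbered by the filter itself.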

The full code should look something like this:

import re
import pandas as pd

lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'

tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)

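# Add a 1-based row number, in the order the sentences were read
tdata['row_index'] = 1
tdata['row_index'] = tdata['row_index'].cumsum()

# Keep only the sentences that contain at least one of the characters
filtered = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]
# index=False drops the DataFrame index; the row_index column carries the position instead
filtered.to_csv('./my_path.csv', index=False)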
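If all you need is the original position, the filtered frame's own index may already be enough. Here is a rough sketch, assuming an in-memory CSV (csv_text is made up for the example, with lines chosen without commas so that each line parses as a single field):

```python
import io
import re
import pandas as pd

# Hypothetical in-memory stand-in for the real CSV file
csv_text = "im good thanks\nHow ? Is live.\nmy name is alex\nim good!\nthanks!\n"

# Shortened character class, for illustration only
lookfor = "[" + re.escape("?!.:;") + "]"

# Passing names means the first line is treated as data, not as a header
tdata = pd.read_csv(io.StringIO(csv_text), names=["sentences"])

filtered = tdata[tdata["sentences"].str.contains(lookfor, regex=True, na=False)]

# The index labels of `filtered` are the 0-based row positions from the input
for idx, sentence in filtered["sentences"].items():
    print(idx, sentence)

# With index=True (the default), to_csv writes those labels as the first column
out = filtered.to_csv(index=True)
```

Whether you use the index or a separate row_index column is mostly a matter of taste. The column survives to_csv(index=False) and can be 1-based, while the index costs nothing extra.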

solution1
1 ACCEPTED 2020-02-05 11:41:13
