
How to check for specific characters inside a CSV file using pandas

I have read a CSV file into a pandas DataFrame and am trying to find all the sentences that contain any of the characters I'm looking for. Whenever one matches, I want to print it with its original index from the main CSV, not a new index. This is the code I'm trying, but it gives me an error for some reason:

lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'


tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)

newdata=tdata[tdata['sentences'].str.isin(lookfor)]

print (newdata)


#a sample set
-----------------------------

#hi, how are; you 
#im good thanks
#How ? Is live.
#good, what about ) you/
#my name is alex
#hello, alex how are you !
#im good!
#great news
#thanks!
-----------------------------

It returns this error:


newdata=tdata[tdata['sentences'].str.isin(pat)]
AttributeError: 'StringMethods' object has no attribute 'isin'

The input data looks like:

[image: no description provided]

The output I'm expecting is:

[image: no description provided]

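The .str accessor has no isin method, which is why you get the AttributeError. Series.isin tests whether each whole value equals one of a list of values; it does not search inside strings. To test whether a string contains any character matched by a regular expression, you probably want the 'contains' method, something like:

df = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]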
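To make the difference between the two methods concrete, here is a minimal sketch on a few of the sample sentences from the question. The Series is built in memory rather than read from the CSV, and the punctuation set is shortened to ASCII characters; both are assumptions made only to keep the example small.

```python
import re
import pandas as pd

# A few sample sentences from the question, held in memory (illustrative only).
sentences = pd.Series([
    "hi, how are; you",
    "im good thanks",
    "How ? Is live.",
    "my name is alex",
], name="sentences")

# Character class built from a shortened, ASCII-only set of punctuation marks.
pattern = "[" + re.escape(",?!.:;") + "]"

# Series.isin: True only where the WHOLE value equals one of the listed values.
whole_match = sentences.isin(["im good thanks"])

# Series.str.contains: True where the regex matches anywhere inside the string.
has_punct = sentences.str.contains(pattern, regex=True, na=False)

print(sentences[has_punct])
```

Here whole_match flags only the second sentence, while has_punct flags the first and third, which are the ones containing punctuation.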

Full code should look something like:

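import re
import pandas as pd

# fileinput is the path to the CSV file, defined as in the question
lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'

# Treat the first line as a header, and skip it, when its first field contains no spaces
tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)

# Store a 1-based row number in a column so it survives filtering
tdata['row_index'] = 1
tdata['row_index'] = tdata['row_index'].cumsum()

# Keep rows containing any of the characters; missing values count as no match
filtered = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]
filtered.to_csv('./my_path.csv', index=False)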
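As a side note, boolean filtering already keeps the original index labels of the DataFrame, so an extra row-number column may not be strictly necessary. The sketch below illustrates this on the sample set from the question. It assumes the sentences are held in a list instead of being read from the file, and it uses only the ASCII punctuation from the pattern.

```python
import re
import pandas as pd

# Sample set from the question, built in memory (an assumption for illustration).
rows = [
    "hi, how are; you",
    "im good thanks",
    "How ? Is live.",
    "good, what about ) you/",
    "my name is alex",
    "hello, alex how are you !",
    "im good!",
    "great news",
    "thanks!",
]
tdata = pd.DataFrame({"sentences": rows})

# ASCII-only subset of the characters searched for in the question.
lookfor = "[" + re.escape(",?!.:;'-)(/") + "]"

# Boolean mask: True for rows containing at least one of the characters.
mask = tdata["sentences"].str.contains(lookfor, regex=True, na=False)
filtered = tdata[mask]

# The filtered frame keeps the original (0-based) index labels.
print(filtered.index.tolist())  # [0, 2, 3, 5, 6, 8]

# Writing with index=True carries those labels into the CSV output.
csv_text = filtered.to_csv(index=True)
print(csv_text)
```

If 1-based numbering is preferred, the row_index column approach above does the same job; the choice mostly depends on whether the output file should carry the pandas index or an explicit column.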
