I have read a CSV file into a pandas DataFrame and am trying to find all the sentences that contain any of the words I'm looking for. Whenever one is found, I want to print the sentence with its original index from the main CSV, not a new index. This is the code I'm trying, but it gives me an error for some reason:
lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'
tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)
newdata=tdata[tdata['sentences'].str.isin(lookfor)]
print (newdata)
#a sample set
-----------------------------
#hi, how are; you
#im good thanks
#How ? Is live.
#good, what about ) you/
#my name is alex
#hello, alex how are you !
#im good!
#great news
#thanks!
-----------------------------
It returns this error:
newdata=tdata[tdata['sentences'].str.isin(pat)]
AttributeError: 'StringMethods' object has no attribute 'isin'
input data looks like
output I'm expecting is
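The '.str' accessor has no 'isin' method, which is why the AttributeError is raised. You probably want the 'contains' method, which tests each string against a regular expression. Something like this, where pat is your pattern (lookfor in the question's code):
df = tdata[tdata.sentences.str.contains(pat, regex=True, na=False)]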
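To make the difference concrete, here is a small sketch on a made-up Series. The sample values and the names s, pat, whole_match and has_punct are assumptions for illustration only. Series.isin checks whether the whole cell equals one of the listed values, whereas .str.contains searches inside each string.

```python
import re
import pandas as pd

# Hypothetical sample data, including a missing value
s = pd.Series(["hi, how are; you", "im good thanks", "thanks!", None])

# Character class built the same way as in the question, with fewer characters
pat = "[" + re.escape(",;!") + "]"

# isin lives on the Series itself and compares whole values
whole_match = s.isin(["thanks!"])

# str.contains searches within each string; na=False turns missing values into False
has_punct = s.str.contains(pat, regex=True, na=False)

print(whole_match.tolist())
print(has_punct.tolist())
```

Only the third row is an exact match for isin, while contains flags every row that has at least one of the characters anywhere in it.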
Full code should look something like:
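import re
import pandas as pd

# Character class that matches any one of the listed punctuation marks
lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'
# Peek at the first line to decide whether it is a header row to skip
tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)
# Store a 1-based row number as a column so it survives filtering
tdata['row_index'] = range(1, len(tdata) + 1)
filtered = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]
filtered.to_csv('./my_path.csv', index=False)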
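As a side note, filtering with a boolean mask already keeps the original index labels, so the extra row_index column is mainly useful when writing the file with index=False. The sketch below assumes the sample sentences from the question are placed directly in a DataFrame (the names sentences, df, mask and filtered are illustrative) rather than read from the CSV.

```python
import re
import pandas as pd

# Sample sentences from the question, built in memory instead of read from a file
sentences = [
    "hi, how are; you",
    "im good thanks",
    "How ? Is live.",
    "good, what about ) you/",
    "my name is alex",
    "hello, alex how are you !",
    "im good!",
    "great news",
    "thanks!",
]
df = pd.DataFrame({"sentences": sentences})

# Same character class as in the question
lookfor = "[" + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + "]"

# Boolean mask: True where the sentence contains any of the characters
mask = df["sentences"].str.contains(lookfor, regex=True, na=False)
filtered = df[mask]

# The surviving rows keep their original 0-based labels
print(filtered.index.tolist())
print(filtered)
```

If the 0-based labels are enough, writing with filtered.to_csv(path, index=True) should carry them into the output file without a separate column.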