
How to check for specific characters inside a CSV file using pandas

I have read a CSV file into a pandas DataFrame and am trying to find all the sentences that contain any of the characters I'm looking for. Whenever one is found, I want to print the sentence with its original index from the main CSV, not a new index. This is the code I'm trying, but it gives me an error for some reason:

lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'


tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)

newdata=tdata[tdata['sentences'].str.isin(lookfor)]

print (newdata)


#a sample set
-----------------------------

#hi, how are; you 
#im good thanks
#How ? Is live.
#good, what about ) you/
#my name is alex
#hello, alex how are you !
#im good!
#great news
#thanks!
-----------------------------

It returns this error:


newdata=tdata[tdata['sentences'].str.isin(pat)]
AttributeError: 'StringMethods' object has no attribute 'isin'

The input data looks like this:

[image: enter image description here]

The output I'm expecting is:

[image: enter image description here]

Series.str has no isin method, which is why the AttributeError is raised. You probably want the str.contains method, something like:

df = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]
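
To illustrate why this keeps the original index, here is a minimal sketch. It assumes the sample sentences from the question are placed directly in a DataFrame instead of being read from a file, so that the snippet runs on its own. The names mask and filtered are only illustrative. Filtering with a boolean mask does not renumber the rows, so the printed index labels are the original ones.

```python
import re
import pandas as pd

# Sample sentences from the question, built in memory (assumption: no file needed)
tdata = pd.DataFrame({"sentences": [
    "hi, how are; you",
    "im good thanks",
    "How ? Is live.",
    "good, what about ) you/",
    "my name is alex",
    "hello, alex how are you !",
    "im good!",
    "great news",
    "thanks!",
]})

# Character class that matches any single one of the listed punctuation marks
lookfor = "[" + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + "]"

# True for every row whose sentence contains at least one of the characters
mask = tdata["sentences"].str.contains(lookfor, regex=True, na=False)

# Boolean indexing keeps the original index labels of the matching rows
filtered = tdata[mask]
print(filtered)
print(filtered.index.tolist())  # [0, 2, 3, 5, 6, 8]
```

Rows 1, 4 and 7 contain none of the characters, so they are dropped, and the remaining rows keep their positions from the original data.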

The full code should look something like this:

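import re
import pandas as pd

# Character class matching any one of the punctuation marks of interest
lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'

# Peek at the first line; skip it only if it has no spaces (treated as a header)
tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)

# 1-based row number stored as a column so it survives the filtering step
tdata['row_index'] = 1
tdata['row_index'] = tdata['row_index'].cumsum()

# Keep only the rows whose sentence contains at least one of the characters
filtered = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]
filtered.to_csv('./my_path.csv', index=False)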
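
As a possible alternative to the extra row_index column, the DataFrame's own index could be written out directly. The sketch below is not part of the original answer. It assumes a small in-memory DataFrame and writes to an io.StringIO buffer instead of a real file, and it reuses the column label row_index only for illustration. Note that the index written this way is 0-based, whereas the cumsum column above starts at 1.

```python
import io
import re
import pandas as pd

# Small in-memory stand-in for the data (assumption: no real CSV file is read)
tdata = pd.DataFrame({"sentences": ["hi, how are; you", "im good thanks", "im good!"]})

# ASCII-only subset of the punctuation, enough for this illustration
lookfor = "[" + re.escape(",?!.:;'-)(/") + "]"
filtered = tdata[tdata["sentences"].str.contains(lookfor, regex=True, na=False)]

# Write the original (0-based) index as the first CSV column
buffer = io.StringIO()
filtered.to_csv(buffer, index=True, index_label="row_index")
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file path instead of the buffer would write the same content to disk.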
