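Remove string rows in a pandas DataFrame when the number of words is less than N
I am preprocessing a dataset for an NLP classification task, and I want to drop the sentences that have fewer than 3 words. I tried the following code, which removes short words (1 to 3 letters) rather than short sentences: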
import re
text = "The quick brown fox jumps over the lazy dog."
# remove words that are 1 to 3 characters long
shortword = re.compile(r'\W*\b\w{1,3}\b')
print(shortword.sub('', text))
How can I do this in Python?
Using a pandas DataFrame:
import pandas
text = {"header":["The quick fox","The quick fox brown jumps hight","The quick"]}
df = pandas.DataFrame(text)
df = df[df['header'].str.split().str.len().gt(2)]
print(df)
The snippet above filters the DataFrame so that only rows whose "header" column contains more than 2 words are kept.
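As a minimal sketch of the same idea with a configurable threshold N: the helper name `drop_short_rows` and its `min_words` parameter are hypothetical, introduced only for illustration. It assumes words are separated by whitespace and treats missing values as having zero words.

```python
import pandas as pd

def drop_short_rows(df, column, min_words=3):
    """Keep rows whose `column` has at least `min_words` whitespace-separated words."""
    # fillna("") makes missing values count as zero words instead of NaN
    word_counts = df[column].fillna("").str.split().str.len()
    return df[word_counts >= min_words]

df = pd.DataFrame(
    {"header": ["The quick fox", "The quick fox brown jumps hight", "The quick", None]}
)
filtered = drop_short_rows(df, "header", min_words=3)
print(filtered)
```

With `min_words=3` this keeps the first two rows and drops "The quick" and the missing value, which matches the `.gt(2)` filter above.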
For more on the pandas DataFrame, see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
Hope this helps.
import re
text= "Hi, Yaman Afadar. Welcome to stackoverflow website. You are pre-processing dataset for NLP classification task. you want to drop the sentences with less than 3 words. Here a sample code. Coud you try it please! The quick brown fox jumps over the lazy dog. So, Hello everyone."
# split on '.', '?' or '!', optionally followed by closing quotes or brackets
sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)
print(sentences)
# keeps sentences with more than 3 words;
# use >= 3 instead to keep sentences with at least 3 words
output = '\n'.join(s for s in sentences if len(s.split()) > 3)
print(output)
[Output]:
['Hi, Yaman Afadar', 'Welcome to stackoverflow website', 'You are pre-processing dataset for NLP classification task', 'you want to drop the sentences with less than 3 words', 'Here a sample code', 'Coud you try it please', 'The quick brown fox jumps over the lazy dog', 'So, Hello everyone', '']
Sentences with more than 3 words:
Welcome to stackoverflow website
You are pre-processing dataset for NLP classification task
you want to drop the sentences with less than 3 words
Here a sample code
Coud you try it please
The quick brown fox jumps over the lazy dog
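The code above keeps only sentences with more than 3 words, so three-word sentences such as "So, Hello everyone" are dropped as well. If the goal is to drop only sentences with fewer than 3 words, a variant along these lines might work. The function name `keep_long_sentences`, the `min_words` parameter, and the sample text are assumptions made for illustration; the splitting regex is the one from the answer.

```python
import re

def keep_long_sentences(text, min_words=3):
    """Return the sentences in `text` that have at least `min_words` words."""
    # Split on '.', '?' or '!', optionally followed by closing quotes or brackets
    sentences = re.split(r' *[.?!][\'")\]]* *', text)
    # Empty strings left over from the split have zero words and are dropped too
    return [s for s in sentences if len(s.split()) >= min_words]

sample = "Hi there. The quick brown fox jumps over the lazy dog. So, Hello everyone. Ok!"
result = keep_long_sentences(sample)
print(result)
# ['The quick brown fox jumps over the lazy dog', 'So, Hello everyone']
```

Returning a list rather than a joined string leaves it to the caller to print the sentences or load them into a DataFrame.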