
Remove string rows in a pandas DataFrame when the number of words is less than N

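I am pre-processing a dataset for an NLP classification task and want to drop sentences with fewer than 3 words. I tried this code, which removes words of 1 to 3 letters: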

import re
text = "The quick brown fox jumps over the lazy dog."
# remove words that are 1 to 3 characters long
shortword = re.compile(r'\W*\b\w{1,3}\b')
print(shortword.sub('', text))
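Note that this pattern removes short *words* from inside a sentence, not whole *sentences* with fewer than three words. A minimal sketch contrasting the two, using only the standard library; the `sentences` list and the threshold `min_words = 3` are illustrative assumptions:

```python
import re

text = "The quick brown fox jumps over the lazy dog."

# \b\w{1,3}\b matches any word of 1 to 3 characters (plus the non-word
# characters before it), so "The", "fox", "the" and "dog" are removed.
shortword = re.compile(r'\W*\b\w{1,3}\b')
stripped = shortword.sub('', text)
print(repr(stripped))  # ' quick brown jumps over lazy.'

# Filtering on word count instead keeps or drops whole sentences.
sentences = ["The quick fox", "The quick", "The quick brown fox jumps over the lazy dog"]
min_words = 3  # illustrative threshold
kept = [s for s in sentences if len(s.split()) >= min_words]
print(kept)  # ['The quick fox', 'The quick brown fox jumps over the lazy dog']
```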

How can I drop sentences with fewer than 3 words in Python?

Using a pandas DataFrame:

import pandas as pd

text = {"header": ["The quick fox", "The quick fox brown jumps high", "The quick"]}
df = pd.DataFrame(text)
# Count whitespace-separated words per row and keep rows with more than 2 words
df = df[df['header'].str.split().str.len().gt(2)]
print(df)

The snippet above keeps only the rows of the DataFrame whose "header" column has more than 2 words.
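The title asks for a general threshold N. A minimal sketch wrapping the same `str.split().str.len()` filter in a helper; the name `drop_short_rows` and its parameters are hypothetical, not part of pandas. As an assumption of this sketch, missing values are filled with an empty string first, so they count as zero words and are dropped:

```python
import pandas as pd


def drop_short_rows(df: pd.DataFrame, column: str, n: int) -> pd.DataFrame:
    """Keep rows whose `column` has at least `n` whitespace-separated words.

    Hypothetical helper: missing values are treated as empty strings
    (zero words), so they are dropped whenever n > 0.
    """
    word_counts = df[column].fillna("").str.split().str.len()
    # reset_index gives the filtered frame a clean 0..k-1 index
    return df[word_counts.ge(n)].reset_index(drop=True)


df = pd.DataFrame({"header": ["The quick fox", "The quick fox brown jumps high", "The quick", None]})
print(drop_short_rows(df, "header", 3))
```

Using `ge(n)` (at least N words) matches "drop rows with fewer than N words"; the answer's `gt(2)` is the same condition for N = 3.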

For more on pandas DataFrames, see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

Hope this helps.

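import re

text = "Hi, Yaman Afadar. Welcome to stackoverflow website. You are pre-processing dataset for NLP classification task. you want to drop the sentences with less than 3 words. Here a sample code. Coud you try it please! The quick brown fox jumps over the lazy dog. So, Hello everyone."

# Split on sentence-ending punctuation (. ? !), optionally followed by closing
# quotes or brackets, and swallow the surrounding spaces.
sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)

print(sentences)

# Keep sentences with more than 3 words (use >= 3 to drop only those with fewer than 3).
output = '\n'.join(s for s in sentences if len(s.split()) > 3)

print(output)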

Output:

['Hi, Yaman Afadar', 'Welcome to stackoverflow website', 'You are pre-processing dataset for NLP classification task', 'you want to drop the sentences with less than 3 words', 'Here a sample code', 'Coud you try it please', 'The quick brown fox jumps over the lazy dog', 'So, Hello everyone', '']

Sentences with more than 3 words:

Welcome to stackoverflow website
You are pre-processing dataset for NLP classification task
you want to drop the sentences with less than 3 words
Here a sample code
Coud you try it please
The quick brown fox jumps over the lazy dog
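Since the question starts from a DataFrame, the two answers can be combined: split each row's text into sentences with the regex above, give each sentence its own row with `DataFrame.explode`, then filter on word count. A minimal sketch; the helper `sentences_with_min_words`, the `sentence` column and the sample `docs` frame are hypothetical. It uses `>= n`, which drops only sentences with fewer than `n` words, whereas the answer above uses `> 3`:

```python
import re

import pandas as pd

# Same sentence-splitting pattern as in the answer above.
SENTENCE_END = re.compile(r' *[\.\?!][\'"\)\]]* *')


def sentences_with_min_words(df: pd.DataFrame, column: str, n: int) -> pd.DataFrame:
    """Return one row per sentence that has at least `n` words (hypothetical helper)."""
    # Each cell becomes a list of sentences (a trailing '' appears after the final period).
    out = df.assign(sentence=df[column].apply(SENTENCE_END.split))
    # explode puts each list element on its own row, repeating the other columns.
    out = out.explode("sentence", ignore_index=True)
    counts = out["sentence"].str.split().str.len()
    # Empty strings have 0 words, so they are removed by the same filter.
    return out[counts.ge(n)].reset_index(drop=True)


docs = pd.DataFrame({"text": [
    "Hi, Yaman Afadar. Welcome to stackoverflow website.",
    "Here a sample code. So, Hello everyone.",
]})
print(sentences_with_min_words(docs, "text", 4)["sentence"].tolist())
```

Keeping the original `text` column next to each sentence makes it easy to trace a kept sentence back to its source row.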


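Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.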

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM