How to remove whitespace from start and end? Tokenizer(split='[.!?]')

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(split='[.!?]')  # tokenize on sentence-ending punctuation
tokenizer.fit_on_texts(df['Cleaned'].values)
X_data = tokenizer.texts_to_sequences(df['Cleaned'].values)
X_sequ = pad_sequences(X_data)

I got a list of tokens with whitespace, like below:

list(tokenizer.word_index)[:10]  # let's see the first 10 of our text sequences

The output looks like this:

[' rnfbdhl yis',
 ' oromoon bilisoomsiteeti jirti',
 ' namni oromummaa isaatiin mataa gadi qabtee deemu hin jiru yeroo ammaa tanatti']

How can I remove the whitespace from the start and end automatically? Please help.

You can try a regular expression using re in Python. Here is how it works: the caret (^) anchors the beginning of a string, the dollar sign ($) anchors the end, and \s+ matches one or more whitespace characters. So the regular expression means: replace every run of whitespace at the beginning or end of a string with '' (nothing). Let me know if it has worked for you.

import re

# strip leading and trailing whitespace from every token
[re.sub(r'^\s+|\s+$', '', item) for item in list(tokenizer.word_index)]
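
If you need the token-to-index mapping rather than just the cleaned keys, the same substitution can rebuild word_index as a dictionary. This is a hypothetical extension, not part of the original answer; cleaned_index is an illustrative name:

import re

# hypothetical extension: rebuild the token -> index mapping with stripped keys
cleaned_index = {re.sub(r'^\s+|\s+$', '', token): idx
                 for token, idx in tokenizer.word_index.items()}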

With the given list of strings:

x = [' rnfbdhl yis', ' oromoon bilisoomsiteeti jirti', ' namni oromummaa isaatiin mataa gadi qabtee deemu hin jiru yeroo ammaa tanatti']

Output:

[re.sub(r'^\s+|\s+$', '', item) for item in x]
['rnfbdhl yis', 'oromoon bilisoomsiteeti jirti', 'namni oromummaa isaatiin mataa gadi qabtee deemu hin jiru yeroo ammaa tanatti']
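
As an aside (not in the original answer), Python's built-in str.strip() does the same job for plain leading and trailing whitespace, with no regular expression needed:

# equivalent result using the built-in strip()
[item.strip() for item in x]
# ['rnfbdhl yis', 'oromoon bilisoomsiteeti jirti', 'namni oromummaa isaatiin mataa gadi qabtee deemu hin jiru yeroo ammaa tanatti']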
