
How to detokenize words back to the original form in a list in Python

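import string
from nltk.tokenize import word_tokenize

stop_words = {"with", "the", "is"}  # assumed stop-word set; the question doesn't show how stop_words is built

libOfSentences = ["Get help with the display",
                  "Display is not working properly", "I need some help"]
# removing punctuation and stopwords

for i in libOfSentences:
    sentence = word_tokenize(i)  # tokenize each individual word
    sentence = filter(lambda x: x not in string.punctuation, sentence)
    cleaned_text = filter(lambda x: x not in stop_words, sentence)

    removedStopwordsList = " ".join(cleaned_text)  # joins the words back into one string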

removedStopwordsList now joins each sentence's words back into a single string, but I want to keep it in a list. The desired output is like this:

["Get help display", "Display not working properly", "I need some help"]

I want removedStopwordsList to still be a list I can loop through. For example,

removedStopwordsList[0]

gives me

"G D I"

right now, but I want removedStopwordsList[0]

to output

"Get help display"

The join function is what is stopping this from happening, but I can't find a better workaround.
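The "G D I" symptom comes from indexing strings rather than a list: after the join, each cleaned sentence is a single str, and indexing a str with [0] returns just its first character. A minimal illustration, using the sentences from the desired output above:

```python
# After the join, each cleaned sentence is a single str.
cleaned_sentences = ["Get help display", "Display not working properly", "I need some help"]

# Indexing a str returns one character, which matches the "G D I" reported above.
first_letters = [s[0] for s in cleaned_sentences]
print(" ".join(first_letters))  # G D I

# Indexing the list itself returns a whole sentence, which is what the question wants.
print(cleaned_sentences[0])     # Get help display
```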

I want to have removedStopwordsList still be a list

Then just make it a list instead of making it a string:

removedStopwordsList = list(cleaned_text)
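In Python 3, filter returns a lazy filter object rather than a list, so you need list() (or " ".join(), as in the question) to get a concrete result. A small sketch; the token list and stop_words set are made up for illustration:

```python
stop_words = {"with", "the"}  # hypothetical stop-word set for illustration
tokens = ["Get", "help", "with", "the", "display"]

cleaned_text = filter(lambda x: x not in stop_words, tokens)
print(type(cleaned_text).__name__)  # filter: a lazy iterator, not a list

removedStopwordsList = list(cleaned_text)  # collect the kept words into an indexable list
print(removedStopwordsList)     # ['Get', 'help', 'display']
print(removedStopwordsList[0])  # Get
```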

You can do this even more simply by using a list comprehension instead of calling filter:

removedStopwordsList = [x for x in sentence if x not in stop_words]

map and filter are great when you already have a function you want to call on each element, but when you have an arbitrary expression that you'd have to wrap in a lambda just to turn it into a function call, it's simpler and more readable to use a list comprehension or generator expression.
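To make the distinction concrete: below, map gets an existing function (str.lower) and reads naturally, while the filter call needs a lambda and says the same thing as the comprehension next to it. The word list and stop-word set are hypothetical:

```python
words = ["Get", "help", "with", "the", "display"]
stop_words = {"with", "the"}  # hypothetical stop-word set

# map is a natural fit when a named function already exists.
lowered = list(map(str.lower, words))

# filter with a lambda works, but the comprehension expresses the same test directly.
kept_filter = list(filter(lambda x: x not in stop_words, words))
kept_comp = [x for x in words if x not in stop_words]

print(lowered)                  # ['get', 'help', 'with', 'the', 'display']
print(kept_filter == kept_comp) # True
```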

You can simplify the previous line the same way. So:

for i in libOfSentences:
    sentence = word_tokenize(i)  # tokenize each individual word
    sentence = (x for x in sentence if x not in string.punctuation)  # lazy generator expression
    removedStopwordsList = [x for x in sentence if x not in stop_words]  # a real list
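The punctuation step uses parentheses, so it is a generator expression: lazy and single-use, which is fine because the next line consumes it right away. The last step uses square brackets so the result is a list you can index and loop over as often as you like. A standalone sketch, with a hand-written token list standing in for word_tokenize output and a hypothetical stop_words set:

```python
import string

tokens = ["Display", "is", "not", "working", "properly", "."]  # stand-in for word_tokenize output
stop_words = {"is"}  # hypothetical stop-word set

# x not in string.punctuation is a substring test against the ASCII punctuation
# characters, so single-character tokens like "." are dropped.
no_punct = (x for x in tokens if x not in string.punctuation)

# The list comprehension consumes the generator and builds a reusable list.
removedStopwordsList = [x for x in no_punct if x not in stop_words]
print(removedStopwordsList)  # ['Display', 'not', 'working', 'properly']

# The generator is now exhausted, so iterating it again yields nothing.
leftover = list(no_punct)
print(leftover)              # []
```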

If you need to have the joined-up string around as well, that's fine; you can keep it in a second variable:

removedStopwordsString = " ".join(removedStopwordsList)

If you really want a single object that can behave both ways, it wouldn't be hard to write such a class, but it would be ugly. Under the covers, it would just hold a self.list_of_words and a self.joined_string and delegate to them anyway. So what would be the point?
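For illustration only, this is roughly what such a class could look like. WordsAndString is a made-up name, and the attribute names follow the description above; list-style access is delegated to self.list_of_words and string conversion to self.joined_string, which shows how little the wrapper adds over two plain variables:

```python
class WordsAndString:
    """Hypothetical wrapper that acts like a list of words and also exposes the joined string."""

    def __init__(self, words):
        self.list_of_words = list(words)
        self.joined_string = " ".join(self.list_of_words)

    # List-like behaviour delegates to the list of words.
    def __getitem__(self, index):
        return self.list_of_words[index]

    def __iter__(self):
        return iter(self.list_of_words)

    def __len__(self):
        return len(self.list_of_words)

    # String-like behaviour delegates to the joined string.
    def __str__(self):
        return self.joined_string


w = WordsAndString(["Get", "help", "display"])
print(w[0])    # Get
print(str(w))  # Get help display
```

Note that the joined string is computed once in __init__, so this sketch assumes the word list is not modified afterwards.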

At any rate, I doubt you need to keep the string around. If you ever want to print it out, you can just join it on the fly:

print(" ".join(removedStopwordsList))

…or even unpack it into separate arguments to print:

print(*removedStopwordsList)
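Both forms print the same text, because print separates its positional arguments with a single space by default (its sep parameter); unpacking with * also lets you pick a different separator. A quick sketch with a sample word list:

```python
removedStopwordsList = ["Get", "help", "display"]

line = " ".join(removedStopwordsList)
print(line)                             # Get help display
print(*removedStopwordsList)            # Get help display (default sep=" ")
print(*removedStopwordsList, sep=", ")  # Get, help, display
```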

If you're trying to gather all of those lists into one big list, you have to actually write code to do that. If you do removedStopwordsList = <anything> each time through the loop, you just replace it on each pass. To keep all those lists around, append each one to a bigger list. For example:

listOfLists = []
for i in libOfSentences:
    sentence = word_tokenize(i)  # tokenize each individual word
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
    listOfLists.append(removedStopwordsList)  # keep each sentence's list instead of overwriting it

Now, if you print out listOfLists, it'll be a list of three lists of words (one per sentence); listOfLists[0] will be the first list; listOfLists[0][0] will be the first word of the first list; and so on.
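Putting it together, a runnable end-to-end sketch. Two assumptions to flag: a small regex tokenizer (tokenize, a hypothetical helper) stands in for NLTK's word_tokenize so it runs without downloading tokenizer data, and stop_words is a guessed set chosen so the output matches the desired result in the question. Joining each inner list at the end gives the list of cleaned sentences the question asked for:

```python
import re
import string

libOfSentences = ["Get help with the display",
                  "Display is not working properly", "I need some help"]

# Hypothetical stop-word set; the question does not show how stop_words was built.
stop_words = {"with", "the", "is"}


def tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize: words are runs of word
    # characters, and each punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)


listOfLists = []
for i in libOfSentences:
    sentence = tokenize(i)
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
    listOfLists.append(removedStopwordsList)

# Join each inner list only where a string is needed.
cleanedSentences = [" ".join(words) for words in listOfLists]

print(listOfLists[0])       # ['Get', 'help', 'display']
print(listOfLists[0][0])    # Get
print(cleanedSentences[0])  # Get help display
```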

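Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.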

粤ICP备18138465号  © 2020-2024 STACKOOM.COM