
Conditional removal of suffix from words in a Python list

The task I have to perform is as follows:

Say I have a list of words (just an example; the list can contain any words):

'yappingly', 'yarding', 'yarly', 'yawnfully', 'yawnily', 'yawning','yawningly', 
'yawweed', 'yealing', 'yeanling', 'yearling', 'yearly', 'yearnfully','yearning', 
'yearnling', 'yeastily', 'yeasting', 'yed',  

I have to create a new list in which words ending in the suffix ing are added with the suffix removed (i.e. yeasting is added to the new list as yeast), and the remaining words are added as they are.

As far as handling the strings ending in ing is concerned, I wrote the following code and it works fine:

 Data=[w[0:-3] for w in wordlist if re.search('ing$',w)]

But how do I add the remaining words to the list? How do I add an else clause to the above if? I was unable to find suitable documentation for this. I did come across several questions on SO about the shorthand if-else statement, but simply adding an else at the end of the above code doesn't work. How do I go about it?

Secondly, suppose I have to extend the above regular expression to multiple suffixes, say as follows:

re.search('(ing|ed|al)$',w)

How do I perform the "trim" operation to remove whichever suffix matched and, at the same time, add the word to the new list? Please help.

Regarding your first question, you can use a conditional expression (ternary) placed just before the for:

Data=[w[0:-3] if re.search('ing$',w) else w for w in wordlist]
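To see the difference between the two forms, here is a small runnable sketch. The four-word wordlist is only an assumed sample taken from the question's list; a trailing if filters items out, while a conditional expression in front of the for keeps every item and only chooses what to emit for it.

```python
import re

# Assumed sample, taken from the word list in the question.
wordlist = ['yarding', 'yarly', 'yeasting', 'yed']

# Filter form (as in the question): words without the suffix are dropped.
filtered = [w[:-3] for w in wordlist if re.search('ing$', w)]

# Conditional-expression form: every word is kept, trimmed only when it matches.
data = [w[:-3] if re.search('ing$', w) else w for w in wordlist]

print(filtered)  # ['yard', 'yeast']
print(data)      # ['yard', 'yarly', 'yeast', 'yed']
```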

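Regarding your second, well, the best answer in my opinion is to use re.sub as @abarnert demonstrated. However, you could also make a slight adaptation to your use of re.search and capture everything before the suffix. The suffix group is optional and the capture is non-greedy, so that words without any of the suffixes still match and come back unchanged (otherwise re.search returns None for them and .group raises an AttributeError):

Data=[re.search('(.*?)(?:ing|ed|al)?$', w).group(1) for w in wordlist]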

Finally, here is a link for more information on comprehensions.

First, what makes you think you need a regexp at all? There are easier ways to strip suffixes.

Second, if you want to use regexps, why not just use re.sub instead of trying to combine regexps and slicing? For example:
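For the single-suffix case, one regexp-free option is str.endswith. This is only a sketch of what "easier" might look like, with an assumed sample wordlist from the question:

```python
# Assumed sample, taken from the word list in the question.
wordlist = ['yawning', 'yawnily', 'yeasting', 'yed']

# str.endswith tests the suffix directly, so no regular expression is needed.
data = [w[:-3] if w.endswith('ing') else w for w in wordlist]

print(data)  # ['yawn', 'yawnily', 'yeast', 'yed']
```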

Data = [re.sub('(ing|ed|al)$', '', w) for w in wordlist]
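A runnable version of that line, with an assumed sample from the question's list, shows two things worth knowing: words without a listed suffix pass through untouched, and only one suffix is removed per word because the pattern is anchored at the end.

```python
import re

# Assumed sample, taken from the word list in the question.
wordlist = ['yawning', 'yearly', 'yed', 'yealing']

# The pattern is anchored with $, so at most one trailing suffix is removed.
data = [re.sub('(ing|ed|al)$', '', w) for w in wordlist]

print(data)  # ['yawn', 'yearly', 'y', 'yeal']
```

Note that 'yed' becomes 'y' and 'yealing' becomes 'yeal' (the remaining 'al' is not stripped again); whether that is what you want depends on your data.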

Then you don't need to work out how much to slice off (which would require you to keep the result of re.search so you can get the length of the matched group, instead of just using it as a bool).

But if you really want to do things your way, just replace your if filter with a conditional expression, as in iCodez's answer.

Finally, if you're stuck on how to fit something into a one-liner, just take it out of the one-liner. It should be easy to write a strip_suffixes function that returns the suffix-stripped string (which is the original string if there was no suffix). Then you can just write:
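For comparison, this is roughly what the slicing route would involve if you did keep the match object. trim_with_match and SUFFIX_RE are hypothetical names used only for this sketch, and the sample words are assumed from the question:

```python
import re

SUFFIX_RE = re.compile(r'(ing|ed|al)$')  # suffix set taken from the question

def trim_with_match(word):
    # Hypothetical helper: keep the match object so its position drives the slice.
    m = SUFFIX_RE.search(word)
    return word[:m.start()] if m else word

wordlist = ['yarding', 'yearly', 'yed']
data = [trim_with_match(w) for w in wordlist]

print(data)  # ['yard', 'yearly', 'y']
```

It works, but it is more moving parts than the re.sub one-liner for the same result.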

Data = [strip_suffixes(w) for w in wordlist]
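One possible shape for such a function is sketched below. The answer only names strip_suffixes; the body, the SUFFIXES tuple, and the first-match-wins rule are assumptions made for illustration.

```python
SUFFIXES = ('ing', 'ed', 'al')  # assumed suffix set, taken from the question's regex

def strip_suffixes(word, suffixes=SUFFIXES):
    # Remove the first listed suffix the word ends with; otherwise return it unchanged.
    for suffix in suffixes:
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word

wordlist = ['yeasting', 'yearly', 'yed', 'yearling']
Data = [strip_suffixes(w) for w in wordlist]

print(Data)  # ['yeast', 'yearly', 'y', 'yearl']
```

Keeping the logic in a named function also gives you one obvious place to add rules later, such as a minimum stem length so that 'yed' is not reduced to 'y'.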
