
How to strip a string after a certain amount of words in python

I have a paragraph: "Lorem ipsum foo bar foobar stuff etc".
In Python, how might I strip this string after a certain number of words, say 4 in this case?

If the words are only separated by single spaces, then:

>>> s = "Lorem ipsum foo bar foobar stuff etc"
>>> ' '.join(s.split(' ')[:4])
'Lorem ipsum foo bar'

should do the trick.
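The one-liner above can be wrapped in a small helper. This is only a minimal sketch: the name `first_words` is made up for illustration, and it calls `str.split()` with no argument, which splits on any run of whitespace instead of on single spaces. That makes it tolerant of tabs and repeated spaces, at the cost of collapsing them to single spaces in the result.

```python
def first_words(text, n):
    """Return the first n whitespace-separated words of text, joined by single spaces."""
    # split() with no separator splits on runs of any whitespace and
    # ignores leading/trailing whitespace, so no empty strings appear.
    return ' '.join(text.split()[:n])


print(first_words("Lorem ipsum foo bar foobar stuff etc", 4))  # Lorem ipsum foo bar
print(first_words("only two", 4))                               # only two
```

If the text has fewer than `n` words, the slice simply returns all of them.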

This is very naive; if you need something fancier, regular expressions are the way to go. By "fancier" I mean handling delimiters other than spaces, punctuation, and so on.

For example:

>>> import re
>>> s = "Lorem ipsum foo bar foobar stuff etc"
>>> l = re.split(r'\s+', s)[:4]  # \s already covers spaces, tabs and newlines
>>> l
['Lorem', 'ipsum', 'foo', 'bar']
>>> ' '.join(l)
'Lorem ipsum foo bar'

Hope this helps!
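To illustrate the "more delimiters" case mentioned in this answer, here is a rough sketch that treats commas, semicolons and periods as separators in addition to whitespace. The helper name `first_words_punct` and the choice of punctuation characters are assumptions for the example, not something from the answer itself. Note that the punctuation is discarded from the result.

```python
import re


def first_words_punct(text, n):
    """Return the first n words, splitting on whitespace and on , ; . characters."""
    # One or more delimiter characters in a row count as a single separator.
    # Leading/trailing delimiters produce empty strings, so filter them out.
    words = [w for w in re.split(r'[\s,;.]+', text) if w]
    return ' '.join(words[:n])


print(first_words_punct("Lorem, ipsum; foo.bar foobar stuff etc", 4))  # Lorem ipsum foo bar
```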

@PauloBlu's answer works in most cases, except when the words in your paragraph are separated by uneven whitespace. A regex can work wonders in such cases:

>>> import re
>>> s = "Lorem ipsum\tfoo    bar foobar stuff etc"
>>> ''.join(re.findall(r"^\S+|\s+\S+", s)[:4])
'Lorem ipsum\tfoo    bar'

whereas str.split + str.join may not give you the right result:

>>> ' '.join(s.split(' ')[:4])
'Lorem ipsum\tfoo  '
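The same idea can be written as a single match that stops after the n-th word, so the rest of the string is never scanned. This is a sketch under the assumption that a "word" is any run of non-whitespace characters; `truncate_words` is a hypothetical name. Unlike the `split`/`join` version, it keeps the original whitespace between the words it returns.

```python
import re


def truncate_words(text, n):
    """Return text cut off after its n-th word, preserving the original whitespace."""
    if n <= 0:
        return ''
    # One word (optionally preceded by whitespace), followed by up to
    # n - 1 further "whitespace + word" groups.
    pattern = r'\s*\S+(?:\s+\S+){0,%d}' % (n - 1)
    match = re.match(pattern, text)
    return match.group(0) if match else ''


s = "Lorem ipsum\tfoo    bar foobar stuff etc"
print(repr(truncate_words(s, 4)))  # 'Lorem ipsum\tfoo    bar'
```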

I have two solutions.

The first uses more memory, because it splits the whole string into a list:

s = "Lorem ipsum foo bar foobar stuff etc"
print(' '.join(s.split(" ")[:4]))

The second may be slower:

s = "Lorem ipsum foo bar foobar stuff etc"
start = 0
for i in range(4):  # number of words
    start = s.find(" ", start + 1)
    if start == -1:  # fewer than 4 words: keep the whole string
        start = len(s)
        break
print(s[:start])
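Whether the second version is actually slower depends on the input, so it is worth measuring. Below is a small `timeit` sketch; the function names, the long repeated test string and the iteration count are all assumptions chosen for illustration. On a long input, the `find` version only scans as far as the fourth space, while the `split` version has to process the entire string.

```python
import timeit

# Long input, assumed for illustration only.
s = "Lorem ipsum foo bar foobar stuff etc " * 1000


def by_split(text, n=4):
    # Splits the whole string into a list, then keeps the first n items.
    return ' '.join(text.split(" ")[:n])


def by_find(text, n=4):
    # Walks forward one space at a time and stops after the n-th word.
    end = 0
    for _ in range(n):
        end = text.find(" ", end + 1)
        if end == -1:  # fewer than n words
            return text
    return text[:end]


assert by_split(s) == by_find(s)
print("split:", timeit.timeit(lambda: by_split(s), number=200))
print("find: ", timeit.timeit(lambda: by_find(s), number=200))
```

The absolute timings will vary by machine and Python version, so treat the numbers as a relative comparison only.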

In addition to the other answers, you could also use this form. It's not so different, but it works:

s = "Lorem ipsum foo bar foobar stuff etc"

# the maxsplit argument of split (4 here) can be set to any number n;
# slicing with [:4] drops the unsplit remainder and is safe for short strings
print(' '.join(s.split(' ', 4)[:4]))

Lorem ipsum foo bar

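To see what the `maxsplit` argument does here, it helps to look at the intermediate list. The sketch below (variable names are illustrative) also shows why slicing with `[:4]` is safer than `[:-1]`: when the input has four words or fewer, there is no unsplit remainder, so `[:-1]` would throw away a real word.

```python
s = "Lorem ipsum foo bar foobar stuff etc"

# At most 4 splits are performed, so the list has at most 5 items and
# the last item holds everything that was left unsplit.
parts = s.split(' ', 4)
print(parts)                 # ['Lorem', 'ipsum', 'foo', 'bar', 'foobar stuff etc']
print(' '.join(parts[:4]))   # Lorem ipsum foo bar

short = "Lorem ipsum"
print(' '.join(short.split(' ', 4)[:-1]))  # Lorem        (the last word is lost)
print(' '.join(short.split(' ', 4)[:4]))   # Lorem ipsum
```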