
How to join a list while preserving previous structure?

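I'm having trouble joining a pre-split string back together after modifying it while preserving its previous structure.

Say I have a string like this: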

string = """

This is a nice piece of string isn't it?
I assume it is so. I have to keep typing
to use up the space. La-di-da-di-da.

This   is    a    spaced   out   sentence

Bonjour.
"""

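I have to run some tests on that string, such as finding specific words and characters within those words, and then replace them accordingly. To do that, I had to break it up using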

string.split()

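The problem with this is that split also gets rid of the \n characters and the extra spaces, which immediately destroys the integrity of the previous structure.

Is there some other way of splitting that would let me accomplish this, or should I look for an alternative route?

Thank you.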

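The split method takes an optional argument that specifies the delimiter. If you only want to separate words on the space character (' '), you can pass it as the argument: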

>>> string = """
...
... This is a nice piece of string isn't it?
... I assume it is so. I have to keep typing
... to use up the space. La-di-da-di-da.
...
... Bonjour.
... """
>>>
>>> string.split()
['This', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?', 'I', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing', 'to', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.', 'Bonjour.']
>>> string.split(' ')
['\n\nThis', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?\nI', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing\nto', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.\n\nBonjour.\n']
>>>

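By default, the split method splits the string on any whitespace. If you want to split the lines separately, you can first split the string on newlines and then split each line on whitespace: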

>>> [line.split() for line in string.strip().split('\n')]
[['This', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?'], ['I', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing'], ['to', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.'], [], ['Bonjour.']]
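
To put the nested list from this approach back together, one option is to join the words with a space and the lines with a newline. This is only a minimal sketch on a shortened sample; the names `text`, `lines` and `rebuilt` are illustrative. Note that it restores the line breaks but collapses runs of spaces, because `split()` without arguments discards them.

```python
text = "This is a nice piece\nI assume it is so.\n\nThis   is    spaced"

# Split into lines first, then split each line on any whitespace.
lines = [line.split() for line in text.split('\n')]

# Rejoin: a single space between words, a newline between lines.
# Blank lines survive as empty lists; extra spaces do not survive.
rebuilt = '\n'.join(' '.join(words) for words in lines)
print(rebuilt)
```

If the exact spacing inside a line matters, this approach is not enough on its own.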

Just split with a delimiter:

>>> string.split(' ')
['\n\nThis', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?\nI', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing\nto', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.\n\nThis', '', '', 'is', '', '', '', 'a', '', '', '', 'spaced', '', '', 'out', '', '', 'sentence\n\nBonjour.\n']

And to get it back:

>>> a = string.split(' ')
>>> print(' '.join(a))
This is a nice piece of string isn't it?
I assume it is so. I have to keep typing
to use up the space. La-di-da-di-da.

This   is    a    spaced   out   sentence

Bonjour.
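
As a quick check of this round trip, the sketch below splits on a single space, replaces one word, and joins again. The sample text and the replacement ('is' to 'was') are made up for illustration. One caveat: a token can carry an embedded newline (such as 'spaced\nout' here), so an exact match on a token may miss words that sit next to a line break.

```python
text = "\nThis   is    spaced\nout text\n"

# Splitting on a single space keeps newlines inside the tokens and
# represents each extra space as an empty string.
tokens = text.split(' ')

# Joining with the same delimiter restores the original exactly.
assert ' '.join(tokens) == text

# Replace exact-match tokens; empty strings are left untouched,
# so the original spacing is preserved.
modified = ['was' if t == 'is' else t for t in tokens]
result = ' '.join(modified)
print(repr(result))
```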

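Just do string.split(' ') (note the space argument passed to the split method).

This keeps your precious newlines inside the strings of the resulting array...

You can save the whitespace in a separate list, and then join the two lists back together after modifying the list of words.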

In [1]: from nltk.tokenize import RegexpTokenizer
In [2]: spacestokenizer = RegexpTokenizer(r'\s+', gaps=False)

In [3]: wordtokenizer = RegexpTokenizer(r'\s+', gaps=True)

In [4]: string = """
   ...: 
   ...: This is a nice piece of string isn't it?
   ...: I assume it is so. I have to keep typing
   ...: to use up the space. La-di-da-di-da.
   ...: 
   ...: This   is    a    spaced   out   sentence
   ...: 
   ...: Bonjour.
   ...: """

In [5]: spaces = spacestokenizer.tokenize(string)

In [6]: words = wordtokenizer.tokenize(string)

In [7]: print(''.join(s + w for s, w in zip(spaces, words)))


This is a nice piece of string isn't it?
I assume it is so. I have to keep typing
to use up the space. La-di-da-di-da.

This   is    a    spaced   out   sentence

Bonjour.
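
The same idea can be sketched with only the standard library, assuming nltk is not a requirement: `re.split` with a capturing group keeps the whitespace runs in the result, so words and gaps stay in one list and in order. The `transform` helper and the sample text are hypothetical, chosen only for illustration.

```python
import re

text = "\n\nThis   is    a    spaced   out   sentence\n\nBonjour.\n"

# The capturing group makes re.split return the whitespace runs too,
# so the list alternates between words and the gaps separating them.
parts = re.split(r'(\s+)', text)

def transform(token):
    # Hypothetical modification: upper-case one specific word.
    return token.upper() if token == 'Bonjour.' else token

# Leave whitespace tokens alone and change only the words.
modified = [p if p.isspace() else transform(p) for p in parts]
result = ''.join(modified)
print(result)
```

Because nothing is discarded, `''.join(parts)` reproduces the original text exactly, including leading and trailing whitespace.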
