Mutable strings in Python

Do you know of a Python library that provides mutable strings? Google returned surprisingly few results. The only usable library I found is http://code.google.com/p/gapbuffer/, which is written in C, but I would prefer one written in pure Python.

Edit: Thanks for the responses, but I'm after an efficient library. That is, ''.join(list) might work, but I was hoping for something more optimized. Also, it has to support the usual things regular strings do, like regex and Unicode.

In Python, the mutable sequence type is bytearray; see this link.
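As a minimal sketch of this approach (assuming Python 3; the sample text and the UTF-8 encoding are illustrative choices, not taken from the answer), a bytearray can be edited in place, resized, searched with a bytes regex, and decoded back to str only when needed:

```python
import re

# Illustrative sample text; bytearray holds bytes, so the str is encoded first.
buf = bytearray("hello world", "utf-8")

buf[0:1] = b"J"        # overwrite one byte in place
buf[6:] = b"there!"    # slice assignment may change the length
buf.extend(b" ok")     # append in place

# The re module accepts bytes-like objects when the pattern is bytes.
match = re.search(rb"th\w+", buf)
found = bytes(match.group())

text = buf.decode("utf-8")   # convert back to str only when needed
print(text)                  # Jello there! ok
print(found)                 # b'there'
```

Note that indices into a bytearray count bytes, not characters, so for non-ASCII text encoded as UTF-8 the positions will not line up with str indices.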

This will allow you to efficiently change characters in a string, although you can't change the string's length.

>>> import ctypes

>>> a = b'abcdefghijklmn'  # Python 3: create_string_buffer() takes bytes, not str
>>> mutable = ctypes.create_string_buffer(a)
>>> mutable[5:10] = mutable[5:10].upper()[::-1]  # same-length slice assignment
>>> a = mutable.value
>>> print(a, type(a))
b'abcdeJIHGFklmn' <class 'bytes'>

class MutableString(object):
    """A string-like wrapper around a list of characters."""
    def __init__(self, data):
        self.data = list(data)
    def __repr__(self):
        return "".join(self.data)
    def __setitem__(self, index, value):
        self.data[index] = value
    def __getitem__(self, index):
        # Slices come back as str; a single index returns one character.
        if isinstance(index, slice):
            return "".join(self.data[index])
        return self.data[index]
    def __delitem__(self, index):
        del self.data[index]
    def __iadd__(self, other):
        # In-place append for `s += other`; must return self.
        self.data.extend(other)
        return self
    def __len__(self):
        return len(self.data)

... and so on, and so forth.

You could also subclass StringIO, buffer, or bytearray.
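One way to read the StringIO suggestion, even without subclassing: io.StringIO already behaves as an in-memory text buffer that can be overwritten at a given position. A minimal sketch, assuming Python 3 (the sample text is illustrative):

```python
import io

buf = io.StringIO("abcdefghijklmn")

buf.seek(5)                # move to character position 5
buf.write("FGHIJ")         # overwrite five characters in place
buf.seek(0, io.SEEK_END)   # jump to the end of the buffer
buf.write("!")             # writing at the end grows the buffer

result = buf.getvalue()
print(result)              # abcdeFGHIJklmn!
```

This covers overwriting and appending only; inserting or deleting in the middle would still require rebuilding the contents.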

How about simply subclassing list (the prime example of mutability in Python)?

class CharList(list):

    def __init__(self, s):
        list.__init__(self, s)

    @property
    def list(self):
        return list(self)

    @property
    def string(self):
        return "".join(self)

    def __setitem__(self, key, value):
        if isinstance(key, int) and len(value) != 1:
            cls = type(self).__name__
            raise ValueError("attempt to assign sequence of size {} to {} item of size 1".format(len(value), cls))
        super(CharList, self).__setitem__(key, value)

    def __str__(self):
        return self.string

    def __repr__(self):
        cls = type(self).__name__
        return "{}(\'{}\')".format(cls, self.string)

This only joins the list back into a string if you want to print it or actively ask for the string representation. Mutating and extending are trivial, and the user already knows how to do it, since it's just a list.

Example usage:

s = "te_st"
c = CharList(s)
c[1:3] = "oa"
c += "er"
print(c) # prints "toaster"
print(c.list) # prints ['t', 'o', 'a', 's', 't', 'e', 'r']

The following is fixed; see the update below.

There's one (solvable) caveat: there's no check (yet) that each element is indeed a character. It will at least fail printing for everything but strings. However, those can be joined and may cause weird situations like this: [see code example below]

With the custom __setitem__, assigning a string of length != 1 to a CharList item will raise a ValueError. Everything else can still be freely assigned, but will raise a TypeError (sequence item n: expected string, X found) when printing, due to the str.join() operation. If that's not good enough, further checks can be added easily (potentially also to __setslice__, or by switching the base class to collections.Sequence, although performance might be different; cf. here).

s = "test"
c = CharList(s)
c[1] = "oa"
# with custom __setitem__ a ValueError is raised here!
# without custom __setitem__, we could go on:
c += "er"
print(c) # prints "toaster"
# this looks right until here, but:
print(c.list) # prints ['t', 'oa', 's', 't', 'e', 'r']
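If stricter checking is wanted, one option along the lines mentioned above is to build on the abstract base classes in collections.abc, so that every write goes through a small number of methods. The following is only a sketch, assuming Python 3 and using MutableSequence rather than Sequence; the class name StrictCharList and the helper _check are hypothetical and not part of the answer's code:

```python
from collections.abc import MutableSequence


class StrictCharList(MutableSequence):
    """Hypothetical variant that only ever stores single characters."""

    def __init__(self, s=""):
        self._chars = []
        self.extend(s)          # extend() routes each item through insert()

    @staticmethod
    def _check(ch):
        if not isinstance(ch, str) or len(ch) != 1:
            raise ValueError("expected a single character, got {!r}".format(ch))
        return ch

    def __getitem__(self, index):
        if isinstance(index, slice):
            return "".join(self._chars[index])
        return self._chars[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            # Iterating a str yields single characters, so "oa" -> 'o', 'a'.
            self._chars[index] = [self._check(ch) for ch in value]
        else:
            self._chars[index] = self._check(value)

    def __delitem__(self, index):
        del self._chars[index]

    def __len__(self):
        return len(self._chars)

    def insert(self, index, value):
        self._chars.insert(index, self._check(value))

    def __str__(self):
        return "".join(self._chars)


c = StrictCharList("te_st")
c[1:3] = "oa"
c += "er"                       # the inherited += goes through extend()
print(c)                        # toaster
try:
    c[1] = "oa"                 # rejected: not a single character
except ValueError as exc:
    print("rejected:", exc)
```

The trade-off is that each write now passes through Python-level methods, so this is likely slower than subclassing list directly.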

Efficient mutable strings in Python are arrays. A Python 3 example for a Unicode string, using array.array from the standard library:

>>> import array, re
>>> ua = array.array('u', 'teststring12')
>>> ua[-2:] = array.array('u', '345')
>>> ua
array('u', 'teststring345')
>>> re.search('string.*', ua.tounicode()).group()
'string345'

bytearray is predefined for bytes and is more automatic regarding conversion and compatibility.

You can also consider memoryview / buffer, numpy arrays, mmap and multiprocessing.shared_memory for certain cases.
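As an illustration of the memoryview option, here is a minimal sketch assuming Python 3 and ASCII sample data (both are assumptions made for the example). A memoryview over a bytearray gives a zero-copy window that can be overwritten in place:

```python
data = bytearray(b"abcdefghijklmn")
view = memoryview(data)

window = view[5:10]                        # no copy: a window onto data[5:10]
window[:] = bytes(window).upper()[::-1]    # same-length in-place overwrite

# A bytearray cannot be resized while views onto it are still alive.
window.release()
view.release()
data += b"!"

print(data.decode("ascii"))                # abcdeJIHGFklmn!
```

Writes through a memoryview must keep the length unchanged; resizing goes through the underlying bytearray once the views are released.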

The FIFOStr package on PyPI supports pattern matching and mutable strings. This may or may not be exactly what is wanted, but it was created as part of a pattern parser for a serial port (the chars are added one at a time from the left or right; see the docs). It is derived from deque.

from fifostr import FIFOStr

myString = FIFOStr("this is a test")
myString.head(4) == "this"  #true
myString[2] = 'u'
myString.head(4) == "thus"  #true

(Full disclosure: I'm the author of FIFOStr.)

Just do this:
string = "big"
string = list(string)
string[0] = string[0].upper()
string = "".join(string)
print(string)

'''OUTPUT'''
> Big
