
String slicing in numpy array

Say we have a numpy.ndarray with numpy.str_ elements. For example, arr below is a numpy.ndarray with two numpy.str_ elements:

arr = ['12345"""ABCDEFG'  '1A2B3C"""']

I am trying to perform string slicing on each element of the array.

For example, how can we slice the first element, '12345"""ABCDEFG', so that its last 10 characters are replaced with the string REPL, i.e.

arr = ['12345REPL'  '1A2B3C"""']

Also, is it possible to perform string substitutions, e.g. substitute all characters after a specific symbol?

In Python, strings are immutable. In NumPy, array scalars are also immutable, so your string element is immutable as well.
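What this means in practice, as a minimal sketch (the variable names are illustrative): the element you get back is a numpy.str_, which cannot be modified in place, but the array slot itself can be reassigned to a new string. One caveat worth knowing: the array has a fixed-width string dtype (here <U15, inferred from the longest input), so a longer replacement is silently truncated.

```python
import numpy as np

arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])
elem = arr[0]

# numpy.str_ is a subclass of Python's built-in str
print(type(elem), isinstance(elem, str))

# Like any Python string, the element cannot be changed in place
try:
    elem[0] = 'X'
    mutated = True
except TypeError:
    mutated = False
print('mutated in place:', mutated)

# The array slot itself can be reassigned to a new string
arr[0] = '12345REPL'
print(arr)

# The string width is fixed when the array is created (<U15 here, the
# longest input), so longer strings are silently truncated on assignment
print(arr.dtype)
arr[1] = 'X' * 20
print(len(arr[1]))
```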

To slice a string, treat it like a list and access its elements by index.

Say we have a string and want to keep only its first three letters, i.e. everything before index 3:

my_str = 'purple'
sliced_str = my_str[:3]

Now that we have that part of the string, say we want to replace every letter after the slice point with something else. Because the string cannot be changed in place, we take the slice we kept and concatenate it with the new text to build a new string:

# say I want to replace the end of 'my_str', from where we sliced, with a string named 's'
s = 'dandylion'
new_string = sliced_str + s     # returns 'purdandylion'

Because string types are immutable, you have to store the parts you want to keep, then combine them with the parts you want to add into a new variable.
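Applied to the array from the question, the same slice-and-recombine idea needs no regular expression at all. A minimal sketch: [:-10] keeps everything except the last 10 characters, and the result is written back into the array slot (it fits because '12345REPL' is shorter than the array's <U15 width).

```python
import numpy as np

arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])

# Keep everything except the last 10 characters, then append the replacement
arr[0] = arr[0][:-10] + 'REPL'
print(arr)
```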

Strings are immutable, so you should either create slices and recombine them manually or use regular expressions. For example, to replace the last 10 characters of the first element in your array, arr, you could do:

import numpy as np
import re

arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])
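# re.escape makes the slice match literally; '$' anchors it to the end of the string
arr[0] = re.sub(re.escape(arr[0][-10:]) + '$', 'REPL', arr[0])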

print(arr)
#['12345REPL' '1A2B3C"""']

If you want to replace all characters after a specific character, you could use a regular expression, or find the index of that character in the string and use it as the slicing index.
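Both options from the paragraph above, as a minimal sketch. The helper name replace_after is hypothetical; it uses str.find to locate the first occurrence of the symbol and keeps the symbol itself, while the regex version replaces the symbol too. Strings that do not contain the symbol are returned unchanged by the helper.

```python
import re

def replace_after(text, symbol, repl):
    """Replace everything after the first `symbol` in `text` with `repl` (hypothetical helper)."""
    idx = text.find(symbol)          # -1 if the symbol is absent
    if idx == -1:
        return text
    cut = idx + len(symbol)          # slice index just past the symbol
    return text[:cut] + repl

s = '12345"""ABCDEFG'
kept = replace_after(s, '"""', 'REPL')                # keeps the triple quote
dropped = re.sub(re.escape('"""') + '.*', 'REPL', s)  # replaces the quote as well
print(kept, dropped)
```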

EDIT: Your comment is more about regular expressions than simple Python slicing, but this is how you could replace everything after the triple quote:

re.sub('["]{3}(.+)', 'REPL', arr[0])

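This line essentially says, "Find the triple quote and everything after it, and replace that whole match, triple quote included, with REPL." For arr[0] this gives '12345REPL'; the second element would be left unchanged, because nothing follows its triple quote and (.+) needs at least one character.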
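If the goal is instead to keep the triple quote and replace only what follows it, a lookbehind (or a capture group written back with \1) does that. A minimal sketch comparing the three patterns on the first element:

```python
import re

s = '12345"""ABCDEFG'

# Original pattern: the triple quote and everything after it are replaced
whole = re.sub('["]{3}(.+)', 'REPL', s)

# Lookbehind: the triple quote must precede the match but is not consumed
after_only = re.sub(r'(?<=""").+', 'REPL', s)

# Same effect with a capture group written back via \1
with_group = re.sub(r'("{3}).+', r'\1REPL', s)

print(whole, after_only, with_group)
```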

np.char has a replace function, which applies the corresponding str method to each element of the array:

In [598]: arr = np.array(['12345"""ABCDEFG',  '1A2B3C"""'])
In [599]: np.char.replace(arr,'"""ABCDEFG',"REPL")
Out[599]: 
array(['12345REPL', '1A2B3C"""'], 
      dtype='<U9')

In this particular example it can be made to work, but it is not nearly as general-purpose as re.sub. Also, these char functions are only modestly faster than iterating over the array; there are some good examples of that in @Divakar's link.
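Since np.char offers no regular-expression support, one way to get the generality of re.sub across the whole array is to iterate and rebuild it, which, per the note above, costs little compared with the char functions. A minimal sketch, reusing the triple-quote pattern from the earlier answer:

```python
import re
import numpy as np

arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])
pattern = re.compile('["]{3}(.+)')

# np.char has no regex support, so apply re.sub element by element
# and build a new array (np.array picks a string width that fits)
result = np.array([pattern.sub('REPL', s) for s in arr])
print(result)
```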

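Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.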

 
© 2020-2024 STACKOOM.COM