
Specific string slicing

I have a large string array which I store as a NumPy array named np_base:

np.shape(np_base)
Out[32]: (65000000, 1)

What I intend to do is slice the array vertically in order to decompose it into multiple columns that I'll later store independently, so I tried looping over the row indexes and appending:

for i in range(65000000):
    INCDN.append(np_base[i, 0][0:5])

but this throws a memory error.

Could anybody please help me out with this issue? I've been searching for days for an alternative way to slice the string array.

Thanks,

There are many ways to apply a function to a NumPy array, one of which is the following:

np_truncated = np.vectorize(lambda x: x[:5])(np_base)

Your approach of iteratively appending to a list is usually the least performant solution in most contexts.
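If every row has the same fixed-width string dtype, another option is to reinterpret the array as single characters and slice columns directly, which avoids creating a Python string per row. This is only a sketch: it assumes np_base is C-contiguous with a unicode dtype such as `<U10`, and the helper name `slice_fixed_width` and the sample data are made up for illustration.

```python
import numpy as np

def slice_fixed_width(arr, start, stop):
    """Return characters [start:stop) of every string in a fixed-width unicode array."""
    # Assumption: arr is C-contiguous with a fixed-width unicode dtype (e.g. '<U10').
    # Viewing as 'U1' exposes one character per cell without copying the data.
    chars = arr.view("U1").reshape(arr.shape[0], -1)
    # The column slice is not contiguous, so copy just that part before re-viewing it.
    piece = np.ascontiguousarray(chars[:, start:stop])
    # Glue the selected characters back together into one string per row.
    return piece.view(f"U{piece.shape[1]}")

# Tiny stand-in for the real 65,000,000-row array.
np_base = np.array([["ABCDE12345"], ["FGHIJ67890"]])

first = slice_fixed_width(np_base, 0, 5)    # first five characters of each row
second = slice_fixed_width(np_base, 5, 10)  # the following five characters
```

Each result keeps the (n_rows, 1) shape of the input, so the pieces can be stored separately or stacked side by side with np.hstack.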


Alternatively, if you intend to work with many columns, you might want to use pandas.

import pandas as pd
df = pd.DataFrame(np_base, columns=["Raw"])
truncated = df.Raw.str.slice(0, 5)
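Since the goal in the question is to decompose each string into several columns, the same `.str.slice` call can be repeated once per field. The following is a rough sketch under assumptions: the sample data, the `REST` column name, and the character offsets in `layout` are hypothetical, and only `INCDN` (characters 0 to 5) comes from the question.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real array of shape (65000000, 1).
np_base = np.array([["ABCDE12345"], ["FGHIJ67890"]])
df = pd.DataFrame(np_base, columns=["Raw"])

# Hypothetical layout: output column name -> (start, stop) character offsets.
layout = {"INCDN": (0, 5), "REST": (5, 10)}

# Build one sliced Series per field, then assemble them into a new DataFrame.
columns = {
    name: df["Raw"].str.slice(start, stop)
    for name, (start, stop) in layout.items()
}
parts = pd.DataFrame(columns)
```

Each column of `parts` can then be saved on its own, for example with `parts["INCDN"].to_numpy()`.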
