
How to properly concat string elements in 2-d ndarray in numpy?

In the code below, I expect arr2 to return the same list of strings as lst2, but it doesn't. Why are lst2 and arr2 different? Is there a NumPythonic way to make arr2 return the same list of strings as lst2?

Code:

import numpy as np

lst = [['MI', '', 'P'], 
       ['B', 'N', 'SUFS'],
       ['KOS', 'XJRXA', 'JJHW'],
       ['ARI', 'TPKI', ''],
       ['VR', 'EYR', '']]

arr = np.array(lst)

arr2 = np.apply_along_axis(lambda x: "".join(x), 1, arr)
lst2 = list(map(lambda x: "".join(x), lst))

print('lst:', lst)
print('arr:', arr.tolist())
print('lst2:', lst2)
print('arr2:', arr2.tolist())

Output:

lst: [['MI', '', 'P'], ['B', 'N', 'SUFS'], ['KOS', 'XJRXA', 'JJHW'], ['ARI', 'TPKI', ''], ['VR', 'EYR', '']]
arr: [['MI', '', 'P'], ['B', 'N', 'SUFS'], ['KOS', 'XJRXA', 'JJHW'], ['ARI', 'TPKI', ''], ['VR', 'EYR', '']]
lst2: ['MIP', 'BNSUFS', 'KOSXJRXAJJHW', 'ARITPKI', 'VREYR']
arr2: ['MIP', 'BNS', 'KOS', 'ARI', 'VRE']

Pandas will do it easily:

import pandas as pd

# Summing strings along axis=1 concatenates the columns of each row.
pd.DataFrame(arr).sum(axis=1)

np.apply_along_axis() gives you trouble because it infers the dtype of the result, including the string length, from the first row. Since the first row produces MIP, every row gets a capacity of 3 characters, and longer results are silently truncated, which is not what you want.
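To make the truncation visible, here is a minimal sketch using a two-row subset of the question's data. The variable names `truncated` and `buf` are only illustrative; the second half shows that the same cut-off happens with any assignment into a fixed-width string array.

```python
import numpy as np

arr = np.array([['MI', '', 'P'], ['B', 'N', 'SUFS']])

# The output buffer takes its dtype from the first row's result ('MIP', 3 characters).
truncated = np.apply_along_axis(lambda x: "".join(x), 1, arr)
print(truncated.dtype)     # a 3-character unicode dtype
print(truncated.tolist())  # the second row is cut to 3 characters

# The same thing happens when assigning into any fixed-width string array.
buf = np.empty(2, dtype='<U3')
buf[0] = 'BNSUFS'
print(buf[0])              # only the first 3 characters are stored
```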

There is a NumPy bug report for apply_along_axis() with more information: https://github.com/numpy/numpy/issues/8352

Thanks. I found the answer at https://github.com/numpy/numpy/issues/8352#issuecomment-488133970: wrap each joined string in a 0-d object array so the result is not limited to a fixed string width.

import numpy as np

lst = [['MI', '', 'P'], ['B', 'N', 'SUFS'], ['KOS', 'XJRXA', 'JJHW'], ['ARI', 'TPKI', ''], ['VR', 'EYR', '']]
arr = np.array(lst)

arr2 = np.apply_along_axis(lambda x: np.asarray("".join(x), dtype=object), 1, arr)
# https://github.com/numpy/numpy/issues/8352#issuecomment-488133970
lst2 = list(map(lambda x: "".join(x), lst))

print('lst:', lst)
print('arr:', arr.tolist())
print('lst2:', lst2)
print('arr2:', arr2.tolist())
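For comparison, here are a few other ways to get the same result without apply_along_axis(). This is only a sketch, not taken from the linked issue; the names `joined`, `joined_vec`, and `joined_obj` are made up for illustration. The first option is a plain comprehension, the second folds np.char.add over the columns, and the third relies on the fact that summing an object array uses Python's + on the elements.

```python
import functools
import numpy as np

lst = [['MI', '', 'P'], ['B', 'N', 'SUFS'], ['KOS', 'XJRXA', 'JJHW'],
       ['ARI', 'TPKI', ''], ['VR', 'EYR', '']]
arr = np.array(lst)

# Option 1: join each row in Python, then let NumPy pick a wide enough dtype.
joined = np.array(["".join(row) for row in arr])

# Option 2: element-wise string addition, folded across the columns.
joined_vec = functools.reduce(np.char.add, arr.T)

# Option 3: convert to object dtype so that sum() concatenates Python strings.
joined_obj = arr.astype(object).sum(axis=1)

print(joined.tolist())
print(joined_vec.tolist())
print(joined_obj.tolist())
```

Options 1 and 2 return fixed-width string arrays sized to the longest result, while option 3 returns an object array, like the dtype=object workaround above.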
