
Python Numpy array2string performance

I am working on a python script to set up an input file for solid mechanics simulation software. The part of the script I'm struggling with is where I format nodal data (node numbers and the corresponding 3D coordinates) from a numpy array to string format, with one node's data per line. I've been working on improving the run time of the script, and this is by far the slowest portion of the whole thing. I originally used np.array2string, but found that it gets pretty slow above about 100,000 nodes.

The numpy array with nodal data is called 'nodes', and is an Nx4 array, where N is the number of nodes in the model and can vary from run to run. There is some additional formatting of the data in 'nodeString' that takes place later in the code to remove extraneous brackets, parentheses and commas, but that is relatively quick and pretty much the same between all the methods below.

I've tried a couple of different settings for the parameters of array2string:

np.set_printoptions(threshold=np.inf)
nodeString = np.array2string(nodes, precision=4, suppress_small=True, separator=',') # original syntax
nodeString = np.array2string(nodes, suppress_small=True, separator=',')
nodeString = np.array2string(nodes, precision=4, suppress_small=False, separator=',')

I've tried array_str instead:

np.set_printoptions(threshold=np.inf)
nodeString = np.array_str(nodes, precision=4, suppress_small=True)

I've also tried just writing the numpy array to a text file and opening it back up:

np.set_printoptions(threshold=np.inf)
fmt = ('%s', '%.4f', '%.4f', '%.4f')  # one format per column
np.savetxt('temp.txt', nodes, delimiter=',', fmt=fmt)
with open('temp.txt', 'r') as file:
    nodeString = file.read()

[Figure: comparison of processing time vs. number of nodes for the different numpy-array-to-string techniques]

(The run time reported in the figure above is in seconds.) By far, the fastest technique I've found is to save the data and then read it back in. I'm really surprised by this, and I wonder if I'm doing something wrong with the native numpy functions like array2string that is negatively impacting their performance. I'm a Mechanical Engineer, and I've been told we code by brute force rather than by elegance, so if someone has a better way of doing what I'm trying to do, or an explanation of why it's faster to write and read than just to reformat, I'd appreciate any insight. Thanks!

Instead of writing to and reading from a file, read and write to a StringIO object:

from io import StringIO
sb = StringIO()
np.savetxt(sb, nodes, delimiter=',', fmt=fmt)
nodeString = sb.getvalue()

I believe this will save you time by avoiding reading from and writing to the hard drive; instead, it keeps everything in memory.
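For reference, here is a minimal self-contained version of the StringIO approach. The small `nodes` array and the `fmt` tuple here are just synthetic stand-ins for the asker's real data, matching the column formats used in the question:

```python
import numpy as np
from io import StringIO

# Synthetic nodal data: node number plus x, y, z coordinates
nodes = np.array([[1.0, 0.0, 0.0, 0.0],
                  [2.0, 1.0, 0.0, 0.0],
                  [3.0, 1.0, 1.0, 0.0]])

# One format specifier per column, as in the question
fmt = ('%s', '%.4f', '%.4f', '%.4f')

# savetxt accepts any file-like object, so an in-memory buffer works
sb = StringIO()
np.savetxt(sb, nodes, delimiter=',', fmt=fmt)
nodeString = sb.getvalue()
```

Each row of `nodes` becomes one comma-separated line in `nodeString`, exactly as with the temp-file version, but without touching the disk.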
