Row-wise string concatenation in Pandas

I'm trying to prepare some Pandas DataFrames for output to (non-tabular) ASCII files. As part of this process, I want to concatenate each row of some DataFrames containing numeric data into a Pandas Series of tab-separated strings.

At the moment, my code for doing this looks something like this:

import pandas as pd
import numpy as np

demo_input = pd.DataFrame(np.random.random((1000000, 10)))

# Join the values of each row with a tab character
sconcat = lambda a: ['\t'.join(map(str, r)) for r in a]

demo_output = pd.Series(sconcat(demo_input.values))

For large inputs this is proving very slow, especially compared with how fast other Pandas operations run. Is there a faster way to achieve the same output using built-in Pandas methods?

Edit: The string conversion is the bottleneck. Is there any way to leverage the C-based string conversion that occurs when using DataFrame.to_csv?
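On the DataFrame.to_csv point, one possible sketch is to let to_csv render the whole frame into a single in-memory string and then split it into lines. The helper name rows_to_tsv_lines is hypothetical, and the approach assumes purely numeric data (no embedded tabs, quotes or newlines). The float formatting that to_csv produces is not guaranteed to match str() in every case, and whether it is actually faster is worth timing on your own data.

```python
import pandas as pd


def rows_to_tsv_lines(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: one tab-separated string per row of `frame`.

    Assumes numeric data only, so to_csv never needs to quote a field.
    """
    # With no path argument, to_csv returns the CSV text as a str.
    text = frame.to_csv(sep="\t", header=False, index=False)
    # splitlines() copes with both "\n" and "\r\n" line endings.
    return pd.Series(text.splitlines())
```

Note that the returned Series has a fresh 0..n-1 index rather than the index of the input frame.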

The part that seems to take the most time is converting the floats to strings. Once that is done, I would do the concatenation as follows:

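demo_input = demo_input.astype(str)  # float -> str conversion; this is the slow step
sep = "\t"
columns = list(demo_input.columns)
concatenation = demo_input[columns[0]]  # start from the first column to avoid a trailing separator
for column in columns[1:]:  # element-wise Series concatenation; this works fast
    concatenation = concatenation + sep + demo_input[column]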
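The same column-wise idea can also be written with Series.str.cat, which accepts a DataFrame of further columns and a separator. This is only a sketch: the function name concat_rows is made up for illustration, and it still pays for astype(str), so it does not remove the conversion bottleneck described above.

```python
import pandas as pd


def concat_rows(frame: pd.DataFrame, sep: str = "\t") -> pd.Series:
    """Hypothetical helper: join the values of each row into one string."""
    as_text = frame.astype(str)   # float -> str conversion (the slow step)
    first = as_text.iloc[:, 0]    # first column as a Series
    rest = as_text.iloc[:, 1:]    # remaining columns as a DataFrame
    if rest.shape[1] == 0:        # single-column frame: nothing to join
        return first
    # str.cat concatenates element-wise, inserting `sep` between columns.
    return first.str.cat(rest, sep=sep)
```

For example, concat_rows(demo_input) would return a Series with the same index as demo_input and no trailing separator.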
