
Row-wise string concatenation in Pandas

I'm trying to prepare some Pandas DataFrames for output to (non-tabular) ASCII files. As part of this process, I want to concatenate each row of some DataFrames containing numeric data into a Pandas Series of tab-separated strings.

At the moment, my code looks something like this:

import pandas as pd
import numpy as np

demo_input = pd.DataFrame(np.random.random((1000000, 10)))

# Join the string form of every value in a row with a tab
sconcat = lambda a: ['\t'.join(map(str, r)) for r in a]

demo_output = pd.Series(sconcat(demo_input.values))

For large inputs this is proving very slow, especially compared with how fast other Pandas operations run. Is there a faster way to achieve the same output using built-in Pandas methods?

Edit: It's the string conversion that's the bottleneck. Is there any way to leverage the C-based string conversion that occurs when using DataFrame.to_csv?

The part that seems to take the most time is converting the floats to strings. Once that is done, I would do the concatenation as follows:

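demo_input = demo_input.astype(str)  # the slow part: float-to-string conversion
sep = "\t"
concatenation = demo_input.iloc[:, 0]  # start from the first column so no trailing separator is left
for column in demo_input.columns[1:]:  # This works fast: each step is a vectorised Series addition
    concatenation = concatenation + sep + demo_input[column]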
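
On the DataFrame.to_csv idea from the question's edit, here is a minimal sketch of one way it could be tried: when to_csv is called without a path it returns the CSV text as a single string, which can then be split into one string per row. The helper name rows_to_tab_strings is made up for illustration, and the small demo frame stands in for the real data. I have not timed this against the loop above, so whether it is actually faster on a given input is something to measure rather than assume.

```python
import pandas as pd


def rows_to_tab_strings(df: pd.DataFrame, sep: str = "\t") -> pd.Series:
    """Hypothetical helper: build one separator-joined string per row via to_csv."""
    # With no path or buffer given, to_csv returns the whole output as one string.
    text = df.to_csv(sep=sep, header=False, index=False)
    # One line per row; splitlines() copes with either \n or \r\n line endings.
    return pd.Series(text.splitlines(), index=df.index)


# Small stand-in for the real data, so the result is easy to inspect.
demo = pd.DataFrame([[1.5, 2.25], [0.125, 4.75]])
result = rows_to_tab_strings(demo)
print(result.tolist())
```

This assumes purely numeric data with no missing values, as in the question. to_csv writes NaN as an empty field by default, and it quotes any string value that contains the separator, so the output can differ from the '\t'.join(map(str, r)) version in those cases.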
