
Use pandas.read_csv to convert a comma-separated string into a DataFrame

How can I use pandas read_csv to quickly convert a large comma-separated string into a DataFrame?

import pandas as pd

x = '1,2,3,4,5,7,8,9'
df = pd.read_csv(x)

I know I could split the string on commas, put the pieces into a list, and convert that to a DataFrame, but I was wondering whether there is a faster way to do this with pd.read_csv.

x = '1,2,3,4,5,7,8,9'
df = pd.read_csv(pd.io.common.StringIO(x), header=None)

df

   0  1  2  3  4  5  6  7
0  1  2  3  4  5  7  8  9

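This is the best you can do with pd.read_csv: it expects a path or a file-like object, so the string has to be wrapped in a text buffer first.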
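The same idea can be written with the standard library's io.StringIO instead of reaching through pd.io.common. This is only a minimal sketch; the variable name col and the transpose step are my own additions for illustration, not part of the original answer.

```python
from io import StringIO  # standard-library in-memory text buffer

import pandas as pd

x = '1,2,3,4,5,7,8,9'

# Wrap the string in a file-like object so read_csv parses its contents
# instead of treating it as a file path.
df = pd.read_csv(StringIO(x), header=None)

# The result is one row with eight integer columns.
# Transposing gives a single column instead (illustrative extra step).
col = df.T
```

Passing the bare string, as in the question, makes read_csv treat it as a file path, which is why the buffer is needed.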


Consider a much larger string:

y = '\n'.join([','.join(['0,1,2,3,4,5,6,7,8,9'] * 100)] * 1000)

And compare the timing of these two options:

%timeit pd.DataFrame([l.split(',') for l in y.split('\n')]).astype(int)
%timeit pd.read_csv(pd.io.common.StringIO(y), header=None)

1 loop, best of 3: 200 ms per loop
10 loops, best of 3: 125 ms per loop

If all we needed was to split the string, split would be faster. However, pd.read_csv also parses the integers for us. With the split approach that conversion has to happen as a separate step afterwards (the astype(int) call), and that extra pass is what makes it slower overall.
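For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. The helper names via_split and via_read_csv are hypothetical, and the timings will depend on the machine and pandas version, so no particular numbers are implied.

```python
from io import StringIO
from timeit import timeit

import pandas as pd

# Same test string as above: 1000 rows of 1000 single-digit integers.
y = '\n'.join([','.join(['0,1,2,3,4,5,6,7,8,9'] * 100)] * 1000)


def via_split(text):
    # Split into strings first, then convert the whole frame to integers.
    return pd.DataFrame([line.split(',') for line in text.split('\n')]).astype(int)


def via_read_csv(text):
    # Let the CSV parser split and convert to integers in one pass.
    return pd.read_csv(StringIO(text), header=None)


# Sanity check: both routes should hold the same values.
same_values = bool((via_split(y).to_numpy() == via_read_csv(y).to_numpy()).all())

# Average over a few runs; results vary by environment.
runs = 3
t_split = timeit(lambda: via_split(y), number=runs) / runs
t_csv = timeit(lambda: via_read_csv(y), number=runs) / runs
print(f'split + astype: {t_split:.3f} s per run')
print(f'read_csv:       {t_csv:.3f} s per run')
```

Comparing the values before timing guards against accidentally benchmarking two routes that produce different results.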
