Creating pandas dataframe from a list of strings

Question

I have the following list:

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

How can I convert it into a pandas dataframe?

I can start like this:

df = pd.DataFrame(columns=list_vals[0].split())

Is there a way to populate the rest of the DataFrame?

Answer 1

You could use io.StringIO to feed a string into read_csv:

In [23]: pd.read_csv(io.StringIO('\n'.join(list_vals)), sep=r'\s+')
Out[23]: 
   col_a  col_B  col_C
0   12.0   34.0   10.0
1   15.0  111.0   23.0

This has the advantage that it automatically performs the same type inference pandas would apply when reading an ordinary CSV file, so the columns come out as floats:

In [24]: _.dtypes
Out[24]: 
col_a    float64
col_B    float64
col_C    float64
dtype: object
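
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The variable names buffer and df are my own, and it splits on whitespace with sep=r'\s+', which recent pandas releases recommend over delim_whitespace.

```python
import io

import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Join the rows into one newline-separated string and wrap it in a
# file-like object so that read_csv can parse it like a file on disk.
buffer = io.StringIO('\n'.join(list_vals))

# sep=r'\s+' treats any run of whitespace as the column separator.
df = pd.read_csv(buffer, sep=r'\s+')

print(df)
print(df.dtypes)
```

The first line of the joined string is used as the header, and the remaining lines become the rows.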

While you could just feed your list into the DataFrame constructor directly, everything would stay strings:

In [21]: pd.DataFrame(columns=list_vals[0].split(), 
                      data=[row.split() for row in list_vals[1:]])
Out[21]: 
  col_a  col_B col_C
0  12.0   34.0  10.0
1  15.0  111.0    23

In [22]: _.dtypes
Out[22]: 
col_a    object
col_B    object
col_C    object
dtype: object

We could add dtype=float to fix this, of course, but with mixed types we would have to convert each column manually, whereas the read_csv approach handles them in the usual way.
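
To make the "manual" route concrete, here is a rough sketch assuming a hypothetical input, mixed_vals, where the first column holds text. It is not part of the question's data. It uses pd.to_numeric on the columns known to be numeric rather than passing dtype=float to the constructor.

```python
import pandas as pd

# Hypothetical input with one text column, to illustrate mixed types.
mixed_vals = ['name col_B col_C', 'x 34.0 10.0', 'y 111.0 23']

# Build the frame from split strings; every value is still a string here.
df = pd.DataFrame(columns=mixed_vals[0].split(),
                  data=[row.split() for row in mixed_vals[1:]])

# Convert only the columns we know to be numeric, one at a time.
numeric_cols = ['col_B', 'col_C']
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```

With read_csv you would not need to list numeric_cols yourself, which is the convenience described above.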

Answer 2

You can do it by converting your data to a dict, e.g.:

>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))})
   col_B col_C col_a
0   34.0  10.0  12.0
1  111.0    23  15.0

Or, to get your original column order (the dict is unordered on Python versions before 3.7):

>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))},
...              columns=list_vals[0].split())
  col_a  col_B col_C
0  12.0   34.0  10.0
1  15.0  111.0    23
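
If the one-liner is hard to read, the following sketch unpacks it step by step. The intermediate names rows, columns and data are mine. Note that the values are still strings at the end, just as in the output above.

```python
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Step 1: split every string into fields (same as map(str.split, list_vals)).
rows = [s.split() for s in list_vals]

# Step 2: zip(*rows) transposes rows into columns, so each tuple starts
# with the column name followed by that column's values.
columns = list(zip(*rows))

# Step 3: "name, *values" peels the name off the front of each tuple.
data = {name: values for name, *values in columns}

df = pd.DataFrame(data, columns=rows[0])
print(df)
```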

Answer 3

You can read this as a numpy structured array, then pass it over to pandas. This is useful if you also need to work with numpy and have the data types defined before reading (otherwise numpy is a step backwards compared to pandas).

import numpy as np
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Gather names from first line, assume all column types are 'd' (i.e. float)
list_dtype = np.dtype([(name, 'd') for name in list_vals[0].split()])

# Create a numpy structured array
ar = np.fromiter((tuple(x.split()) for x in list_vals[1:]), dtype=list_dtype)

# Now convert it to a pandas DataFrame
dat = pd.DataFrame(ar)
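
As a variation, here is a sketch that converts each field to float explicitly before building the structured array, instead of leaving the string parsing to numpy. It swaps np.fromiter for np.array on a list of tuples, and the names names and records are my own additions.

```python
import numpy as np
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Field names come from the first line; every field is a float64 ('d').
names = list_vals[0].split()
list_dtype = np.dtype([(name, 'd') for name in names])

# Convert each field to float up front; one tuple per record.
records = [tuple(float(v) for v in row.split()) for row in list_vals[1:]]
ar = np.array(records, dtype=list_dtype)

# Fields can be accessed by name on the numpy side...
print(ar['col_B'])

# ...and the field names become the DataFrame's column labels.
dat = pd.DataFrame(ar)
print(dat.dtypes)
```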

3 answers

solution1
10 ACCEPTED 2017-02-11 03:54:40

solution2
1 2017-02-11 03:26:52

solution3
1 2018-11-26 02:46:23
