I have the following list:
list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']
How can I convert it into a pandas dataframe?
I can start like this:
df = pd.DataFrame(columns=list_vals[0].split())
Is there a way to populate the rest of the dataframe?
You could use io.StringIO to feed a string into read_csv:
In [23]: pd.read_csv(io.StringIO('\n'.join(list_vals)), sep=r'\s+')
Out[23]:
col_a col_B col_C
0 12.0 34.0 10.0
1 15.0 111.0 23.0
This has the advantage that it automatically does the same type inference pandas would do when reading an ordinary CSV file; here, the columns are floats:
In [24]: _.dtypes
Out[24]:
col_a float64
col_B float64
col_C float64
dtype: object
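For reference, here is the same read_csv approach as a standalone script rather than an interactive session. It is a minimal sketch: it uses sep=r'\s+' to split on runs of whitespace, which recent pandas versions recommend in place of the deprecated delim_whitespace=True flag.

```python
import io

import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Join the rows back into one newline-separated "file" and let read_csv
# parse it; sep=r'\s+' treats any run of whitespace as a single delimiter.
df = pd.read_csv(io.StringIO('\n'.join(list_vals)), sep=r'\s+')

print(df)
print(df.dtypes)
```

Because read_csv does its usual per-column inference, the "23" in the last row still ends up as a float, since the rest of that column is floats.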
While you could just feed your list into the DataFrame constructor directly, everything would stay strings:
In [21]: pd.DataFrame(columns=list_vals[0].split(),
data=[row.split() for row in list_vals[1:]])
Out[21]:
col_a col_B col_C
0 12.0 34.0 10.0
1 15.0 111.0 23
In [22]: _.dtypes
Out[22]:
col_a object
col_B object
col_C object
dtype: object
We could add dtype=float to fix this, of course. But if the columns had mixed types, the read_csv approach would infer each column in the usual way, whereas here we'd have to convert them manually.
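A sketch of both fixes for the constructor route. It uses a post-construction .astype(float) cast in place of dtype=float for the all-numeric case. The mixed-type input (mixed, with columns name, qty and price) is hypothetical, made up only to show the manual per-column step with pd.to_numeric; it is not from the question.

```python
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']
header, *rows = list_vals

# All columns are numeric here, so one blanket cast to float is enough.
df_float = pd.DataFrame([row.split() for row in rows],
                        columns=header.split()).astype(float)

# Hypothetical mixed-type input: a text column plus integer and float columns.
mixed = ['name qty price', 'apple 3 1.25', 'pear 5 0.80']
m_header, *m_rows = mixed
df_mixed = pd.DataFrame([row.split() for row in m_rows],
                        columns=m_header.split())

# Manual step: we have to know which columns are numeric and convert each one.
for col in ['qty', 'price']:
    df_mixed[col] = pd.to_numeric(df_mixed[col])

print(df_float.dtypes)
print(df_mixed.dtypes)
```

The loop is exactly the bookkeeping that read_csv does for you: pd.to_numeric picks an integer dtype for qty and a float dtype for price, while name stays text.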
You can also do it by converting your data to a dict, e.g.:
>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))})
col_B col_C col_a
0 34.0 10.0 12.0
1 111.0 23 15.0
Or, to keep your original column order (older pandas versions sorted dict keys, as in the output above):
>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))},
... columns=list_vals[0].split())
col_a col_B col_C
0 12.0 34.0 10.0
1 15.0 111.0 23
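The dict comprehension packs a lot into one line, so here is a commented sketch of the same idea. Like the plain constructor, it leaves the values as strings; the trailing .astype(float) is an addition (not part of the answer above) that assumes every column is numeric.

```python
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# zip(*...) transposes the split rows into columns. Each column tuple starts
# with its header, so `a, *b` peels off the name and collects the values in b.
columns = {a: b for a, *b in zip(*map(str.split, list_vals))}

# columns= pins the order explicitly; .astype(float) converts the string values.
df = pd.DataFrame(columns, columns=list_vals[0].split()).astype(float)

print(df)
```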
You can read this as a numpy structured array, then pass it over to pandas. This is useful if you also need to work with numpy and know the data types before reading (otherwise numpy is a step backward compared to working in pandas).
import numpy as np
import pandas as pd
list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']
# Gather names from first line, assume all column types are 'd' (i.e. float)
list_dtype = np.dtype([(name, 'd') for name in list_vals[0].split()])
# Create a numpy structured array
ar = np.fromiter((tuple(x.split()) for x in list_vals[1:]), dtype=list_dtype)
# Now convert it to a pandas DataFrame
dat = pd.DataFrame(ar)
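A variant of the structured-array code above that converts each token with float() before numpy sees it, rather than relying on numpy to parse the strings. It also shows the named fields being used on the numpy side before the hand-off to pandas. The col_b_total name is only illustrative.

```python
import numpy as np
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Field names come from the header row; every field is float64 ('d').
list_dtype = np.dtype([(name, 'd') for name in list_vals[0].split()])

# One tuple of floats per data row, matching the structured dtype's fields.
ar = np.fromiter(
    (tuple(float(v) for v in row.split()) for row in list_vals[1:]),
    dtype=list_dtype,
)

# Named fields work like columns on the numpy side.
col_b_total = ar['col_B'].sum()

# pandas turns each structured field into a column of the same name and dtype.
dat = pd.DataFrame(ar)
print(dat)
print(col_b_total)
```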