I have a Pandas data frame with a column in which each cell contains a list and a value, e.g. ([z, z, z, z, m, ., c, l, u, b, .], 0.0)
How do I split this column into two new columns of the data frame, one containing the list and the other containing the value? For example, the cell above should become:
[z, z, z, z, m, ., c, l, u, b, .]
and 0.0
I have tried str.split(..., expand=True), but the output is just a column of NaN. Splitting on "," or on "]," doesn't help either: both produce a single column of NaN rather than a column of lists and a column of values.
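One likely reason for the NaN output, assuming the cells hold Python tuples (as the parenthesized repr suggests): the .str methods only operate on string values and return NaN for anything else. A minimal reproduction with a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical reconstruction: each cell holds a Python tuple (list, float).
X = pd.DataFrame({'set': [(['z', 'z', 'm'], 0.0), (['x', 'c'], 0.0)]})

print(type(X.loc[0, 'set']))  # <class 'tuple'>

# .str methods only act on str values; every non-string cell becomes NaN.
print(X['set'].str.split(','))
```

If type() reports tuple rather than str, the fix is to unpack the tuples directly instead of parsing text.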
Here are 4 rows of the column I'm trying to manipulate:
X['set']
1 ([z, z, z, z, m, ., c, l, u, b, .], 0.0)
2 ([z, z, z, z, g, ., c, l, u, b, .], 0.0)
3 ([z, z, z, z, cy, s, ., l, o, a, n, .], 0.0)
4 ([z, z, z, x, c, ., u, s, .], 0.0)
I was able to figure it out from the other users' answers:
pd.DataFrame(X['set'].tolist(), index=X.index)
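A self-contained sketch of that approach, assuming the cells are (list, float) tuples; the column names 'list' and 'val' are illustrative choices, and join is used to attach the new columns back onto X:

```python
import pandas as pd

# Hypothetical reconstruction of the four sample rows; each cell is a tuple.
X = pd.DataFrame(
    {'set': [
        (['z', 'z', 'z', 'z', 'm', '.', 'c', 'l', 'u', 'b', '.'], 0.0),
        (['z', 'z', 'z', 'z', 'g', '.', 'c', 'l', 'u', 'b', '.'], 0.0),
        (['z', 'z', 'z', 'z', 'cy', 's', '.', 'l', 'o', 'a', 'n', '.'], 0.0),
        (['z', 'z', 'z', 'x', 'c', '.', 'u', 's', '.'], 0.0),
    ]},
    index=[1, 2, 3, 4],
)

# Each tuple becomes one row with two columns; reusing X.index keeps rows aligned.
parts = pd.DataFrame(X['set'].tolist(), index=X.index, columns=['list', 'val'])
X = X.join(parts)
print(X[['list', 'val']])
```

Passing index=X.index matters when the frame's index is not the default 0..n-1 (here it starts at 1); without it, join would misalign the rows.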
Related post: how to split column of tuples in pandas dataframe?
Can you try making the delimiter ], ?
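A "]," style delimiter can only work if the cells really are strings that look like tuples. A minimal sketch under that assumption (the sample strings are hypothetical): strip the outer parentheses, split once on "], ", then restore the bracket that the split consumed.

```python
import pandas as pd

# Hypothetical sample: the cells are *strings* that merely look like tuples.
s = pd.Series(["([z, z, m], 0.0)", "([x, c, .], 0.5)"])

# Drop the outer parentheses, then split once on the "], " delimiter.
parts = s.str.strip("()").str.split("], ", n=1, expand=True)

# The split consumes the closing bracket, so add it back to the list part.
result = pd.DataFrame({"list": parts[0] + "]", "val": parts[1].astype(float)})
print(result)
```

Note that the "list" column here is still text such as "[z, z, m]", not a Python list.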
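You just need a bit of string gymnastics (this assumes the cells are strings like "([z, z, m], 0.0)", not tuples):
import pandas as pd

def separate(x):
    closing_bracket_index = x.index(']')
    # Skip the leading "(" and keep everything through the "]"
    list_vals = x[1:closing_bracket_index + 1]
    # Skip the "], " separator and drop the trailing ")"
    val = x[closing_bracket_index + 3:-1]
    return pd.Series([list_vals, val], index=['list', 'val'])

X['set'].apply(separate)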
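For string cells, a vectorized alternative to the per-row apply is a regular expression with Series.str.extract (a swapped-in technique, not the one used above). This sketch assumes every cell has the shape "([...], value)"; the named groups become the column names:

```python
import pandas as pd

# Hypothetical string cells shaped like "([...], value)".
s = pd.Series(["([z, z, z, z, m, ., c, l, u, b, .], 0.0)",
               "([z, z, z, x, c, ., u, s, .], 0.0)"])

# "list" captures from "[" through the last "]" that is followed by ", ";
# "val" captures the text between that ", " and the closing ")".
pattern = r"^\((?P<list>\[.*\]), (?P<val>[^,]*)\)$"
out = s.str.extract(pattern)
out["val"] = out["val"].astype(float)
print(out)
```

Rows that don't match the pattern come back as NaN instead of raising, which makes malformed cells easy to spot.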
Hope this works
import numpy as np
import pandas as pd
a = (['g','f'],0.0)
b = (['d','e'],0.1)
df = pd.DataFrame({'col':[a,b]})
df
Out[1]:
col
0 ([g, f], 0.0)
1 ([d, e], 0.1)
def split_val(col):
    # col is one cell: a (list, value) tuple
    list_val = col[0]
    value = col[1]
    return pd.Series([list_val, value], index=['list', 'val'])

df[['list_val','value']] = df['col'].apply(split_val)
df
Out[2]:
col list_val value
0 [[g, f], 0.0] [g, f] 0.0
1 [[d, e], 0.1] [d, e] 0.1
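The same split can also be done without a helper function: the .str[i] indexer picks element i from each cell and works on tuples and lists, not just strings. A minimal sketch reusing the two example tuples:

```python
import pandas as pd

a = (['g', 'f'], 0.0)
b = (['d', 'e'], 0.1)
df = pd.DataFrame({'col': [a, b]})

# .str[i] indexes into each element (tuples and lists included),
# so no per-row pd.Series construction is needed.
df['list_val'] = df['col'].str[0]
df['value'] = df['col'].str[1]
print(df)
```

Building a pd.Series for every row, as split_val does, tends to be slow on large frames; .str[i] or the tolist() approach avoids that overhead.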