I have a DataFrame in which one column holds sequential data as a list or tuple (always of the same length). My aim is to split this column into several new columns, ideally overwriting one of the existing columns in the process.
Here is a minimal example:
from pandas import DataFrame, concat
data = DataFrame({"label": [a for a in "abcde"], "x": range(5)})
print(data)
  label  x
0     a  0
1     b  1
2     c  2
3     d  3
4     e  4
An idealized way, using a nonexistent method splittuple, would be something like this:
data[["x", "x2"]] = data["x"].apply(lambda x: (x, x*2)).splittuple(expand = True)
resulting in:
  label  x  x2
0     a  0   0
1     b  1   2
2     c  2   4
3     d  3   6
4     e  4   8
Of course I can do it like this, though the solution is a bit clunky:
newdata = DataFrame(data["x"].apply(lambda x: (x, x*2)).tolist(), columns = ["x", "x2"])
data.drop("x", axis = 1, inplace = True)
data = concat((data, newdata), axis = 1)
print(data)
  label  x  x2
0     a  0   0
1     b  1   2
2     c  2   4
3     d  3   6
4     e  4   8
An alternative, even uglier solution:
data[["x", "x2"]] = data["x"].apply(lambda x: "{} {}".format(x, x*2)).str.split(expand = True).astype(int)
Could you suggest a more elegant way to do this type of transformation?
It is possible with apply and pd.Series, but it is not very fast:
import pandas as pd
tup = data["x"].apply(lambda x: (x, x*2))
data[["x", "x2"]] = tup.apply(pd.Series)
print(data)
  label  x  x2
0     a  0   0
1     b  1   2
2     c  2   4
3     d  3   6
4     e  4   8
It is faster to use the DataFrame constructor:
tup = data["x"].apply(lambda x: (x, x*2))
data[["x", "x2"]] = pd.DataFrame(tup.values.tolist())
print (data)
  label  x  x2
0     a  0   0
1     b  1   2
2     c  2   4
3     d  3   6
4     e  4   8
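One caveat with the constructor approach: the new DataFrame gets a fresh default index (0, 1, 2, ...), and as far as I know the column assignment aligns on index labels. If the original frame has a different index, the rows may not line up. A minimal sketch, assuming a hypothetical frame indexed 10 to 14, that passes the original index to the constructor:

```python
import pandas as pd

# Hypothetical frame with a non-default index (an assumption for illustration).
data = pd.DataFrame(
    {"label": list("abcde"), "x": range(5)},
    index=[10, 11, 12, 13, 14],
)

tup = data["x"].apply(lambda x: (x, x * 2))

# Reuse the original index so the new columns line up row by row.
expanded = pd.DataFrame(tup.tolist(), index=data.index, columns=["x", "x2"])
data[["x", "x2"]] = expanded
print(data)
```

With the default index from the question, the extra index argument makes no difference, so it is a cheap safeguard.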
Timings:
data = pd.DataFrame({"label": [a for a in "abcde"], "x": range(5)})
data = pd.concat([data]*1000).reset_index(drop=True)
tup = data["x"].apply(lambda x: (x, x*2))
data[["x", "x2"]] = tup.apply(pd.Series)
data[["y", "y2"]] = pd.DataFrame(tup.values.tolist())
print (data)
In [266]: %timeit data[["x", "x2"]] = tup.apply(pd.Series)
1 loop, best of 3: 836 ms per loop
In [267]: %timeit data[["y", "y2"]] = pd.DataFrame(tup.values.tolist())
100 loops, best of 3: 3.1 ms per loop
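The %timeit lines above need IPython. A rough sketch of the same comparison as a plain script, using the standard timeit module, might look like this. The helper names via_apply_series and via_constructor are made up for illustration, and the frame is smaller than in the timings above (1000 rows instead of 5000) so that it runs quickly. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

data = pd.DataFrame({"label": list("abcde"), "x": range(5)})
# 200 copies (1000 rows); smaller than the 5000 rows used in the timings above.
data = pd.concat([data] * 200).reset_index(drop=True)
tup = data["x"].apply(lambda x: (x, x * 2))


def via_apply_series():
    # Builds one Series per row, which is the slow part.
    return tup.apply(pd.Series)


def via_constructor():
    # Hands the whole list of tuples to the DataFrame constructor at once.
    return pd.DataFrame(tup.tolist(), index=tup.index)


# Sanity check: both approaches should produce the same values.
same = bool((via_apply_series().to_numpy() == via_constructor().to_numpy()).all())
print("same result:", same)

t_apply = timeit.timeit(via_apply_series, number=3)
t_ctor = timeit.timeit(via_constructor, number=3)
print("apply(pd.Series): {:.4f} s for 3 runs".format(t_apply))
print("DataFrame(...):   {:.4f} s for 3 runs".format(t_ctor))
```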