I have a DataFrame like this:
df = pd.DataFrame([(1,'a','i',[1,2,3],['a','b','c']),(2,'b','i',[4,5],['d','e','f']),(3,'a','j',[7,8,9],['g','h'])])
Output:
0 1 2 3 4
0 1 a i [1, 2, 3] [a, b, c]
1 2 b i [4, 5] [d, e, f]
2 3 a j [7, 8, 9] [g, h]
I went through this question, but the answer there creates a new DataFrame and defines all the columns again, which is memory inefficient (I have 18 lakh, i.e. 1.8 million, rows and 19 columns). I want to explode columns 3 and 4 so that their elements are matched by position, while preserving the rest of the columns, like this:
0 1 2 3 4
0 1 a i 1 a
1 1 a i 2 b
2 1 a i 3 c
3 2 b i 4 d
4 2 b i 5 e
5 2 b i NaN f
6 3 a j 7 g
7 3 a j 8 h
8 3 a j 9 NaN
Update: I forgot to mention that when one list is shorter than the other, the missing positions should be filled with NaN.
Another solution:
df_out = df.explode(3)
df_out[4] = df[4].explode()
print(df_out)
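Prints (for data in which both lists in each row have the same length, e.g. [4, 5, 6] in row 1 and [7, 8] in row 2 of column 3):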
0 1 2 3 4
0 1 a i 1 a
0 1 a i 2 b
0 1 a i 3 c
1 2 b i 4 d
1 2 b i 5 e
1 2 b i 6 f
2 3 a j 7 g
2 3 a j 8 h
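On pandas 1.3 or later, DataFrame.explode also accepts a list of columns, so the two-step assignment above can probably be collapsed into one call. A minimal sketch, assuming the lists in each row have equal lengths (the sample data below is the even-length variant that matches the printed output, not the uneven data from the question):

```python
import pandas as pd

# Even-length sample data (an assumption, chosen to match the output above)
df = pd.DataFrame(
    [
        (1, "a", "i", [1, 2, 3], ["a", "b", "c"]),
        (2, "b", "i", [4, 5, 6], ["d", "e", "f"]),
        (3, "a", "j", [7, 8], ["g", "h"]),
    ]
)

# Explode both list columns together; elements are paired by position
df_out = df.explode([3, 4])
print(df_out)
```

If the lists in any row differ in length, this call raises a ValueError, so uneven data needs to be padded first.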
EDIT: To handle uneven cases:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        (1, "a", "i", [1, 2, 3], ["a", "b", "c"]),
        (2, "b", "i", [4, 5], ["d", "e", "f"]),
        (3, "a", "j", [7, 8, 9], ["g", "h"]),
    ]
)


def fn(x):
    # pad the shorter of the two lists in this row with NaN (modifies the lists in place)
    if len(x[3]) < len(x[4]):
        x[3].extend([np.nan] * (len(x[4]) - len(x[3])))
    elif len(x[3]) > len(x[4]):
        x[4].extend([np.nan] * (len(x[3]) - len(x[4])))
    return x


# "even-out" the lists:
df = df.apply(fn, axis=1)

# explode them:
df_out = df.explode(3)
df_out[4] = df[4].explode()
print(df_out)
Prints:
0 1 2 3 4
0 1 a i 1 a
0 1 a i 2 b
0 1 a i 3 c
1 2 b i 4 d
1 2 b i 5 e
1 2 b i NaN f
2 3 a j 7 g
2 3 a j 8 h
2 3 a j 9 NaN
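The padding step can also be written without a row-wise apply and without mutating the original lists. The sketch below is one possible variant, not the answer's own method: explode_padded is a hypothetical helper name, and it assumes pandas 1.3 or later for the multi-column explode.

```python
import numpy as np
import pandas as pd


def explode_padded(df, cols):
    """Hypothetical helper: pad the list columns in `cols` to a common
    per-row length with NaN, then explode them together."""
    out = df.copy()
    # Longest list in each row across the chosen columns
    target = out[cols].apply(lambda s: s.map(len)).max(axis=1)
    for c in cols:
        # Build new padded lists so the original lists are left untouched
        padded = [list(v) + [np.nan] * (n - len(v)) for v, n in zip(out[c], target)]
        out[c] = pd.Series(padded, index=out.index)
    # ignore_index=True renumbers the rows 0..n-1
    return out.explode(cols, ignore_index=True)


df = pd.DataFrame(
    [
        (1, "a", "i", [1, 2, 3], ["a", "b", "c"]),
        (2, "b", "i", [4, 5], ["d", "e", "f"]),
        (3, "a", "j", [7, 8, 9], ["g", "h"]),
    ]
)
result = explode_padded(df, [3, 4])
print(result)
```

Because the result is renumbered, it lines up with the 0 to 8 index shown in the question's desired output.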
You can use pd.Series.explode (as in the output below, this assumes both lists in each row have the same length):
df = df.apply(pd.Series.explode).reset_index(drop=True)
output:
0 1 2 3 4
0 1 a i 1 a
1 1 a i 2 b
2 1 a i 3 c
3 2 b i 4 d
4 2 b i 5 e
5 2 b i 6 f
6 3 a j 7 g
7 3 a j 8 h
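Since the shorter approaches depend on the lists lining up, it may help to check the lengths before choosing one. A small sketch using the question's uneven data (the variable name same_len is only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [
        (1, "a", "i", [1, 2, 3], ["a", "b", "c"]),
        (2, "b", "i", [4, 5], ["d", "e", "f"]),
        (3, "a", "j", [7, 8, 9], ["g", "h"]),
    ]
)

# True for rows where columns 3 and 4 hold the same number of elements
same_len = df[3].map(len) == df[4].map(len)

print(same_len.all())                 # whether every row is even
print(df.index[~same_len].tolist())   # labels of the rows that need padding
```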