Pandas: split lists into multiple rows

I have a DataFrame like this:

pd.DataFrame([(1,'a','i',[1,2,3],['a','b','c']),(2,'b','i',[4,5],['d','e','f']),(3,'a','j',[7,8,9],['g','h'])])

Output:

    0   1   2   3           4
0   1   a   i   [1, 2, 3]   [a, b, c]
1   2   b   i   [4, 5]      [d, e, f]
2   3   a   j   [7, 8, 9]   [g, h]

I want to explode columns 3 and 4, matching their elements by position and preserving the rest of the columns, like this. I went through this question, but the answer there creates a new DataFrame and defines all the columns again, which is memory-inefficient (I have 18L rows and 19 columns).

    0  1  2  3    4
0   1  a  i  1    a
1   1  a  i  2    b
2   1  a  i  3    c
3   2  b  i  4    d
4   2  b  i  5    e
5   2  b  i  NaN  f
6   3  a  j  7    g
7   3  a  j  8    h
8   3  a  j  9    NaN

Update: I forgot to mention that when the two lists in a row differ in length, the missing positions in the shorter one should be NaN.

Another solution, for the case where the two lists in each row have the same length:

df_out = df.explode(3)
df_out[4] = df[4].explode()
print(df_out)

Prints:

   0  1  2  3  4
0  1  a  i  1  a
0  1  a  i  2  b
0  1  a  i  3  c
1  2  b  i  4  d
1  2  b  i  5  e
1  2  b  i  6  f
2  3  a  j  7  g
2  3  a  j  8  h

EDIT: To handle uneven cases (lists of different lengths in the same row):

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        (1, "a", "i", [1, 2, 3], ["a", "b", "c"]),
        (2, "b", "i", [4, 5], ["d", "e", "f"]),
        (3, "a", "j", [7, 8, 9], ["g", "h"]),
    ]
)


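def fn(x):
    # Pad the shorter of the two lists (columns 3 and 4) with NaN, in place,
    # so that both lists in the row end up with the same length.
    if len(x[3]) < len(x[4]):
        x[3].extend([np.nan] * (len(x[4]) - len(x[3])))
    elif len(x[3]) > len(x[4]):
        x[4].extend([np.nan] * (len(x[3]) - len(x[4])))
    return x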


# "even-out" the lists:
df = df.apply(fn, axis=1)

# explode them:
df_out = df.explode(3)
df_out[4] = df[4].explode()
print(df_out)

Prints:

   0  1  2    3    4
0  1  a  i    1    a
0  1  a  i    2    b
0  1  a  i    3    c
1  2  b  i    4    d
1  2  b  i    5    e
1  2  b  i  NaN    f
2  3  a  j    7    g
2  3  a  j    8    h
2  3  a  j    9  NaN
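
A possible variation on the padding idea, not taken from the answer above: build padded copies of the lists instead of extending them in place, then explode both columns in one call. This is only a sketch. It assumes pandas 1.3 or later (where DataFrame.explode accepts a list of columns), and the names col3, col4 and padded are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        (1, "a", "i", [1, 2, 3], ["a", "b", "c"]),
        (2, "b", "i", [4, 5], ["d", "e", "f"]),
        (3, "a", "j", [7, 8, 9], ["g", "h"]),
    ]
)

# Build new, NaN-padded lists so the lists stored in df are left untouched.
col3, col4 = [], []
for a, b in zip(df[3], df[4]):
    n = max(len(a), len(b))
    col3.append(list(a) + [np.nan] * (n - len(a)))
    col4.append(list(b) + [np.nan] * (n - len(b)))

padded = df.copy()
# Wrapping in pd.Series keeps each padded list as a single cell value.
padded[3] = pd.Series(col3, index=df.index)
padded[4] = pd.Series(col4, index=df.index)

# Multi-column explode (assumes pandas >= 1.3); it requires the lists in
# each row to have matching lengths, which the padding guarantees.
df_out = padded.explode([3, 4], ignore_index=True)
print(df_out)
```

With ignore_index=True the result gets a fresh 0..n-1 index, as in the expected output of the question. Drop that argument to keep the repeated original row labels instead.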

You can use pd.Series.explode:

df = df.apply(pd.Series.explode).reset_index(drop=True)

Output:

   0  1  2  3  4
0  1  a  i  1  a
1  1  a  i  2  b
2  1  a  i  3  c
3  2  b  i  4  d
4  2  b  i  5  e
5  2  b  i  6  f
6  3  a  j  7  g
7  3  a  j  8  h
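
The output above comes from input in which the two lists in every row have the same length (row 1 holds 4, 5, 6 and row 2 holds 7, 8). Before relying on the one-liner, it may be worth checking whether that holds for your data. A minimal sketch of such a check, using the uneven sample from the question; the names len3, len4 and uneven_rows are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [
        (1, "a", "i", [1, 2, 3], ["a", "b", "c"]),
        (2, "b", "i", [4, 5], ["d", "e", "f"]),
        (3, "a", "j", [7, 8, 9], ["g", "h"]),
    ]
)

# Series.str.len() also works on list elements and returns each list's length.
len3 = df[3].str.len()
len4 = df[4].str.len()

# Row labels where the two lists differ in length.
uneven_rows = df.index[len3 != len4].tolist()
print(uneven_rows)  # [1, 2]
```

If uneven_rows is not empty, the lists would need to be padded first, for example with NaN as in the other answer.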
