Manipulating a pandas DataFrame to display the desired output

Question

I have the following DataFrame structure:

profile_id  user   birthday
123, 124    test1  day1
131, 132    test2  day2

What I need to display is:

profile_id  user   birthday
123         test1  day1
124         test1  day1
131         test2  day2
132         test2  day2

In the profile_id column I have several ids separated by commas, and I need to loop through each id.
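For anyone who wants to reproduce the answers, the sample frame could be built roughly as follows. This is a sketch under the assumption that every column holds plain strings and that the ids are separated by a comma followed by a single space, as the table above suggests.

```python
import pandas as pd

# Sample data copied from the question; the string dtype and the
# ", " separator are assumptions based on how the table is printed.
df = pd.DataFrame({
    "profile_id": ["123, 124", "131, 132"],
    "user": ["test1", "test2"],
    "birthday": ["day1", "day2"],
})
print(df)
```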

Answer 1

Here's one way to do it:

In [1127]: dfs = (df.profile_id.str.split(', ', expand=True).stack()
                   .reset_index(name='profile_id'))

In [1128]: df.loc[dfs.level_0].assign(profile_id=dfs.profile_id.values)
Out[1128]:
  profile_id   user birthday
0        123  test1     day1
0        124  test1     day1
1        131  test2     day2
1        132  test2     day2
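One thing to watch with this approach is that `assign` aligns a Series on index labels rather than on position, so passing the stacked Series directly would pair each repeated row label with only one id. A minimal self-contained sketch, assuming the sample frame from the question, that hands the ids over positionally with `to_numpy()`:

```python
import pandas as pd

df = pd.DataFrame({
    "profile_id": ["123, 124", "131, 132"],
    "user": ["test1", "test2"],
    "birthday": ["day1", "day2"],
})

# One row per id; level_0 holds the label of the row each id came from.
dfs = (df["profile_id"].str.split(", ", expand=True)
         .stack()
         .reset_index(name="profile_id"))

# to_numpy() strips the index, so the ids are assigned by position
# instead of being aligned on the repeated labels 0, 0, 1, 1.
out = df.loc[dfs["level_0"]].assign(profile_id=dfs["profile_id"].to_numpy())
print(out)
```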

Answer 2

You can also do this with a combination of concat() and .melt():

>>> pd.concat((
...            df['profile_id'].str.split(', ', expand=True),
...            df.drop('profile_id', axis=1)), axis=1)\
...     .melt(id_vars=['user', 'birthday'], value_name='profile_id')\
...     .drop('variable', axis=1)
    user birthday profile_id
0  test1     day1        123
1  test2     day2        131
2  test1     day1        124
3  test2     day2        132
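Because `melt` stacks the split columns one after another, the rows come out grouped by position within each list rather than by original row. If the original order matters, one option is to keep the row labels through the reshape and sort on them. The sketch below assumes the question's sample frame; the `dropna` step is an added precaution for rows with fewer ids than others.

```python
import pandas as pd

df = pd.DataFrame({
    "profile_id": ["123, 124", "131, 132"],
    "user": ["test1", "test2"],
    "birthday": ["day1", "day2"],
})

# Split columns (named 0, 1, ...) side by side with the remaining columns.
wide = pd.concat(
    [df["profile_id"].str.split(", ", expand=True),
     df.drop(columns="profile_id")],
    axis=1,
)

out = (wide.melt(id_vars=["user", "birthday"], value_name="profile_id",
                 ignore_index=False)        # keep the original row labels
           .drop(columns="variable")
           .dropna(subset=["profile_id"])   # shorter lists are padded with missing values
           .sort_index(kind="mergesort")    # stable sort restores row order
           [df.columns])                    # original column order
print(out)
```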

Answer 3

One-liner

df.loc[df.index.repeat(df.profile_id.str.count(', ') + 1)].assign(
    profile_id=', '.join(df.profile_id).split(', '))

  profile_id   user birthday
0        123  test1     day1
0        124  test1     day1
1        131  test2     day2
1        132  test2     day2

Broken down

sep = ', '
idx = df.index.repeat(df.profile_id.str.count(sep) + 1)
new = sep.join(df.profile_id).split(sep)
df.loc[idx].assign(profile_id=new)

  profile_id   user birthday
0        123  test1     day1
0        124  test1     day1
1        131  test2     day2
1        132  test2     day2

NumPy slice instead of `loc`

This also gives a fresh index.

import numpy as np
import pandas as pd

sep = ', '
col = 'profile_id'
p = df[col]
# positional row numbers, each repeated once per id in that row
i = np.arange(len(df)).repeat(p.str.count(sep) + 1)
pd.DataFrame({
    col: sep.join(p).split(sep),
    **{c: df[c].values[i] for c in df if c != col}
}, columns=df.columns)

  profile_id   user birthday
0        123  test1     day1
1        124  test1     day1
2        131  test2     day2
3        132  test2     day2
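The steps above work because each row contributes `count(sep) + 1` values, so the repeated row positions and the flat list of ids have the same length. A sketch that wraps this in a reusable helper follows; the name `split_rows` is hypothetical, and it assumes the column has no missing values. Since `str.count` treats its argument as a regular expression, the helper escapes the separator first.

```python
import re

import numpy as np
import pandas as pd


def split_rows(df, col, sep=", "):
    """Return a new frame with one row per sep-delimited value in col."""
    # str.count takes a regex, so escape separators such as "|" or ".".
    counts = df[col].str.count(re.escape(sep)) + 1
    # Row positions, each repeated once per value found in that row.
    pos = np.arange(len(df)).repeat(counts.to_numpy())
    out = df.iloc[pos].reset_index(drop=True)
    # Joining and re-splitting yields the values in row order.
    out[col] = sep.join(df[col]).split(sep)
    return out


df = pd.DataFrame({
    "profile_id": ["123, 124", "131, 132"],
    "user": ["test1", "test2"],
    "birthday": ["day1", "day2"],
})
result = split_rows(df, "profile_id")
print(result)
```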

Answer 4

(df.profile_id.str.split(", ", expand=True)
   .set_index(df.user)
   .stack()
   .reset_index(level=1, drop=True)
   .reset_index()
   .rename(columns={0: "profile_id"}))
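This chain carries only `user` alongside the split ids, so `birthday` is not part of the result. A sketch of a variant, assuming the question's sample frame, that moves every other column into the index first so they all survive the reshape:

```python
import pandas as pd

df = pd.DataFrame({
    "profile_id": ["123, 124", "131, 132"],
    "user": ["test1", "test2"],
    "birthday": ["day1", "day2"],
})

# Every column except the one being split.
other = [c for c in df.columns if c != "profile_id"]

out = (df.set_index(other)["profile_id"]
         .str.split(", ", expand=True)
         .stack()                            # one entry per id
         .reset_index(level=-1, drop=True)   # drop the split-position level
         .rename("profile_id")
         .reset_index())                     # index levels back to columns
print(out)
```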

Answer 5

Using extractall and join:

df.join(
    df.pop('profile_id').str.extractall(r'(\d+)').reset_index(1, drop=True)
).rename(columns={0: 'profile_id'})

    user birthday profile_id
0  test1     day1        123
0  test1     day1        124
1  test2     day2        131
1  test2     day2        132
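Note that `df.pop` removes `profile_id` from `df` in place, so the snippet can only be run once on a given frame. A sketch that leaves `df` untouched, assuming the question's sample frame and ids made up purely of digits:

```python
import pandas as pd

df = pd.DataFrame({
    "profile_id": ["123, 124", "131, 132"],
    "user": ["test1", "test2"],
    "birthday": ["day1", "day2"],
})

# extractall returns one row per regex match, indexed by
# (original row label, match number); column 0 is the capture group.
ids = (df["profile_id"].str.extractall(r"(\d+)")[0]
         .reset_index(level=1, drop=True)   # drop the match-number level
         .rename("profile_id"))

# Joining on the index repeats each original row once per extracted id.
out = df.drop(columns="profile_id").join(ids)
print(out)
```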

5 answers

Answer 1: 3, 2018-09-05 15:03:21
Answer 2: 3, 2018-09-05 15:14:08
Answer 3: 2, accepted, 2018-09-05 15:03:34
Answer 4: 2, 2018-09-05 15:06:14
Answer 5: 2, 2018-09-05 15:19:03
