Repeat rows in a pandas DataFrame based on column value

Question

I have the following df:

code . role    . persons
123 .  Janitor . 3
123 .  Analyst . 2
321 .  Vallet  . 2
321 .  Auditor . 5

The first line means that I have 3 persons with the role Janitor. My problem is that I need one line for each person. My df should look like this:

df:

code . role    . persons
123 .  Janitor . 3
123 .  Janitor . 3
123 .  Janitor . 3
123 .  Analyst . 2
123 .  Analyst . 2
321 .  Vallet  . 2
321 .  Vallet  . 2
321 .  Auditor . 5
321 .  Auditor . 5
321 .  Auditor . 5
321 .  Auditor . 5
321 .  Auditor . 5

How could I do that using pandas?

Answer 1

reindex + repeat

df.reindex(df.index.repeat(df.persons))
Out[951]: 
   code  .     role ..1  persons
0   123  .  Janitor   .        3
0   123  .  Janitor   .        3
0   123  .  Janitor   .        3
1   123  .  Analyst   .        2
1   123  .  Analyst   .        2
2   321  .   Vallet   .        2
2   321  .   Vallet   .        2
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5

PS: you can add .reset_index(drop=True) to get a new index.
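
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the question's table is loaded with plain code / role / persons columns (without the "." separator columns that appear in the output above); the variable name expanded is just for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat each index label `persons` times, then select the rows in that order.
expanded = df.reindex(df.index.repeat(df["persons"]))

# Optional: replace the duplicated labels (0, 0, 0, 1, 1, ...) with 0..n-1.
expanded = expanded.reset_index(drop=True)

print(expanded)
```

Since every label requested from reindex already exists in df, no missing values are introduced and the integer columns should keep their dtype.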

Answer 2

Wen's solution is really nice and intuitive. Here's an alternative that calls repeat on df.values.

df

   code     role  persons
0   123  Janitor        3
1   123  Analyst        2
2   321   Vallet        2
3   321  Auditor        5


pd.DataFrame(df.values.repeat(df.persons, axis=0), columns=df.columns)

   code     role persons
0   123  Janitor       3
1   123  Janitor       3
2   123  Janitor       3
3   123  Analyst       2
4   123  Analyst       2
5   321   Vallet       2
6   321   Vallet       2
7   321  Auditor       5
8   321  Auditor       5
9   321  Auditor       5
10  321  Auditor       5
11  321  Auditor       5
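
One thing worth checking with this approach is the column dtypes. The sketch below assumes the question's data with integer code and persons columns; the name repeated is mine. Because df.values packs mixed int/str columns into a single object array, the numeric columns of the rebuilt frame are no longer integers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# df.values is one NumPy array; with mixed int/str columns its dtype is object.
repeated = pd.DataFrame(
    df.values.repeat(df["persons"], axis=0),
    columns=df.columns,
)

print(df.dtypes)        # code and persons are integer columns here
print(repeated.dtypes)  # code and persons come back as object
```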

Answer 3

I don't have enough reputation to comment, but building on @cs95's answer and @lmiguelvargasf's comment, you can preserve dtypes with:

pd.DataFrame(
    df.values.repeat(df.persons, axis=0),
    columns=df.columns,
).astype(df.dtypes)
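
A quick way to confirm the dtypes really survive, again as a sketch using the question's data (the name restored is only for illustration):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat the raw values, then cast each column back to its original dtype.
restored = pd.DataFrame(
    df.values.repeat(df["persons"], axis=0),
    columns=df.columns,
).astype(df.dtypes)

# Compare the per-column dtypes of the result with the original frame.
print(list(restored.dtypes) == list(df.dtypes))
```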

Answer 4

You can apply the Series method repeat:

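import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [2, 3],
                   'col2': ['a', 'b'],
                   'col3': [20, 30]})

# Repeat each column's values by the counts in col1.
df.apply(lambda x: x.repeat(df['col1']))
# Equivalent: df.apply(pd.Series.repeat, repeats=df['col1'])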

or the NumPy function repeat:

df.apply(np.repeat, repeats=df['col1'])

Output:

   col1 col2  col3
0     2    a    20
0     2    a    20
1     3    b    30
1     3    b    30
1     3    b    30

4 answers

Answer 1: 49, accepted, 2017-11-16 18:29:03
Answer 2: 8, 2017-11-16 18:34:02
Answer 3: 1, 2020-10-22 15:07:15
Answer 4: 0, 2022-10-04 18:38:47
