Repeat rows in a pandas DataFrame based on column value
I have the following df:
code  role     persons
123   Janitor  3
123   Analyst  2
321   Vallet   2
321   Auditor  5
The first line means that I have 3 persons with the role Janitor. My problem is that I need one row for each person. My df should look like this:
df:
code  role     persons
123   Janitor  3
123   Janitor  3
123   Janitor  3
123   Analyst  2
123   Analyst  2
321   Vallet   2
321   Vallet   2
321   Auditor  5
321   Auditor  5
321   Auditor  5
321   Auditor  5
321   Auditor  5
How could I do that using pandas?
reindex + repeat
df.reindex(df.index.repeat(df.persons))
Out[951]:
   code     role  persons
0   123  Janitor        3
0   123  Janitor        3
0   123  Janitor        3
1   123  Analyst        2
1   123  Analyst        2
2   321   Vallet        2
2   321   Vallet        2
3   321  Auditor        5
3   321  Auditor        5
3   321  Auditor        5
3   321  Auditor        5
3   321  Auditor        5
PS: you can add .reset_index(drop=True) to get a new index.
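For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame and applies reindex + repeat followed by reset_index(drop=True). The variable names `repeated` and `result` are only illustrative.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat each index label `persons` times, then select rows by those labels.
repeated = df.reindex(df.index.repeat(df["persons"]))

# Replace the duplicated labels (0, 0, 0, 1, 1, ...) with a fresh 0..n-1 index.
result = repeated.reset_index(drop=True)
print(result)
```

Note that reindex needs the original index to be unique; the default RangeIndex used here satisfies that.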
Wen's solution is really nice and intuitive. Here's an alternative, calling repeat on df.values.
df
code role persons
0 123 Janitor 3
1 123 Analyst 2
2 321 Vallet 2
3 321 Auditor 5
pd.DataFrame(df.values.repeat(df.persons, axis=0), columns=df.columns)
code role persons
0 123 Janitor 3
1 123 Janitor 3
2 123 Janitor 3
3 123 Analyst 2
4 123 Analyst 2
5 321 Vallet 2
6 321 Vallet 2
7 321 Auditor 5
8 321 Auditor 5
9 321 Auditor 5
10 321 Auditor 5
11 321 Auditor 5
Not enough reputation to comment, but building on @cs95's answer and @lmiguelvargasf's comment, one can preserve dtypes with:
pd.DataFrame(
    df.values.repeat(df.persons, axis=0),
    columns=df.columns,
).astype(df.dtypes)
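To see why the astype call matters, the sketch below compares dtypes before and after it, using the question's sample frame. The names `raw` and `restored` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# df.values is one object array here because the columns mix ints and strings,
# so the integer columns lose their int64 dtype in the rebuilt frame.
raw = pd.DataFrame(df.values.repeat(df["persons"], axis=0), columns=df.columns)
print(raw.dtypes)

# Passing the original dtypes (a Series mapping column -> dtype) casts them back.
restored = raw.astype(df.dtypes)
print(restored.dtypes)
```

If every column shares one dtype, df.values keeps that dtype and the cast changes nothing.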
You can apply the Series method repeat:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [2, 3],
                   'col2': ['a', 'b'],
                   'col3': [20, 30]})
# Repeat every column by the counts in col1.
df.apply(lambda x: x.repeat(df['col1']))
# df.apply(pd.Series.repeat, repeats=df['col1'])
or the numpy function repeat:
df.apply(np.repeat, repeats=df['col1'])
Output:
col1 col2 col3
0 2 a 20
0 2 a 20
1 3 b 30
1 3 b 30
1 3 b 30
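As a quick sanity check, this sketch compares the column-wise apply approach with the index-based reindex + repeat approach from the earlier answer on the same small frame. The names `by_apply` and `by_index` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 3], "col2": ["a", "b"], "col3": [20, 30]})

# Column-wise: each column is a Series, repeated by the counts in col1.
by_apply = df.apply(lambda col: col.repeat(df["col1"]))

# Index-based: repeat the index labels, then select rows by label.
by_index = df.reindex(df.index.repeat(df["col1"]))

# Both keep the duplicated original labels as the index.
print(by_apply.index.tolist())
print(by_apply.values.tolist() == by_index.values.tolist())
```

Both keep per-column dtypes, and either result can be followed by reset_index(drop=True) if a fresh index is wanted.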