
How to repeat pandas dataframe records based on column value

I'm trying to duplicate rows of a pandas DataFrame (pandas 0.23.4, Python 3.7.1) based on an integer value in one of the columns. I'm applying code from this question to do that, but I'm running into the following casting error: TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'. Basically, I don't understand why this code attempts to cast to int32.

Starting with this,

import pandas as pd

dummy_dict = {'c1': ['a', 'b', 'c'],
              'c2': [0, 1, 2],
              'c3': ['textA', 'textB', 'textC']}
dummy_df = pd.DataFrame(dummy_dict)

    c1  c2  c3
0   a   0   textA
1   b   1   textB
2   c   2   textC

I'm doing this

dummy_df_test = dummy_df.reindex(dummy_df.index.repeat(dummy_df['c2']))

I want this at the end. However, I'm getting the above error instead.

    c1  c2  c3
0   a   0   textA
1   b   1   textB
2   c   2   textC
3   c   2   textC

Just a workaround:

pd.concat([dummy_df[dummy_df.c2.eq(0)],dummy_df.loc[dummy_df.index.repeat(dummy_df.c2)]])

Another fantastic suggestion, courtesy of @Wen:

dummy_df.reindex(dummy_df.index.repeat(dummy_df['c2'].clip(lower=1)))

  c1  c2
0  a   0
1  b   1
2  c   2
2  c   2
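To see what the clip changes, here is a minimal sketch that compares the repeated rows with and without it. The names `plain` and `clipped` are only illustrative, and the frame is rebuilt from the question's two-column dict.

```python
import pandas as pd

dummy_df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': [0, 1, 2]})

# Without clip: a repeat count of 0 drops row 'a' entirely.
plain = dummy_df.loc[dummy_df.index.repeat(dummy_df['c2'])]

# With clip(lower=1): every row is kept at least once.
clipped = dummy_df.loc[dummy_df.index.repeat(dummy_df['c2'].clip(lower=1))]

print(plain['c1'].tolist())    # ['b', 'c', 'c']
print(clipped['c1'].tolist())  # ['a', 'b', 'c', 'c']
```

Raising the minimum repeat count to 1 is what keeps the row with c2 == 0 in the result, which the concat workaround achieves by adding that row back separately.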

I believe the answer as to why it's happening can be found here: https://github.com/numpy/numpy/issues/4384

Specifying the dtype as int32 should solve the problem as highlighted in the original comment.
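One way to read this suggestion is to cast the repeat counts to int32 before passing them to repeat. The sketch below assumes that is what was meant, and that the error comes from a platform whose indexing integer is 32-bit; whether it removes the error on a given setup is not confirmed here. The names `repeats` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

dummy_df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': [0, 1, 2]})

# Assumption: int64 counts fail the 'safe' cast on this platform,
# so hand repeat() int32 counts instead. clip keeps the zero-count row.
repeats = dummy_df['c2'].clip(lower=1).astype(np.int32)

result = dummy_df.loc[dummy_df.index.repeat(repeats)].reset_index(drop=True)
print(result)
```

On platforms where the indexing integer is 64-bit, the int32 counts are widened safely, so the same code should run there as well.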

This can also be done with the concat function. In the first attempt all rows are duplicated, and in the second attempt only the row with index 2 is duplicated.

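# df is the DataFrame from the question (called dummy_df there)
print(df)

# Duplicate every row once and renumber the index
df2 = pd.concat([df] * 2, ignore_index=True)
print(df2)

# Append a copy of the row at position 2 only
df3 = pd.concat([df, df.iloc[[2]]])
print(df3)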

  c1  c2     c3
0  a   0  textA
1  b   1  textB
2  c   2  textC
  c1  c2     c3
0  a   0  textA
1  b   1  textB
2  c   2  textC
3  a   0  textA
4  b   1  textB
5  c   2  textC
  c1  c2     c3
0  a   0  textA
1  b   1  textB
2  c   2  textC
2  c   2  textC

If you plan to reset the index at the end:

df3 = df3.reset_index(drop=True)
