How to repeat pandas DataFrame records based on a column value

Question

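I am trying to duplicate rows of a pandas DataFrame (v0.23.4, Python v3.7.1) based on an int value in one of its columns. I am applying the code from this question to do that, but I run into the following dtype conversion error: TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'. Basically, I don't understand why this code tries to cast to int32.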

Starting from this:

import pandas as pd

dummy_dict = {'c1': ['a','b','c'],
              'c2': [0,1,2],
              'c3': ['textA','textB','textC']}
dummy_df = pd.DataFrame(dummy_dict)

    c1  c2  c3
0   a   0   textA
1   b   1   textB
2   c   2   textC

I am doing this:

dummy_df_test = dummy_df.reindex(dummy_df.index.repeat(dummy_df['c2']))

This is what I want in the end. However, I get the error above.

    c1  c2  c3
0   a   0   textA
1   b   1   textB
2   c   2   textC
3   c   2   textC

Answer 1

Just a workaround:

pd.concat([dummy_df[dummy_df.c2.eq(0)],dummy_df.loc[dummy_df.index.repeat(dummy_df.c2)]])

Another great suggestion, from @Wen:

dummy_df.reindex(dummy_df.index.repeat(dummy_df['c2'].clip(lower=1)))
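
Both variants exist because a repeat count of 0 drops the row entirely, so row a would vanish from a plain repeat. The sketch below reuses the question's data (the c3 values are taken from the printed table) and compares the two workarounds, assuming a platform where the repeat call itself does not raise. The names plain, via_concat and via_clip are illustrative only.

```python
import pandas as pd

dummy_df = pd.DataFrame({
    "c1": ["a", "b", "c"],
    "c2": [0, 1, 2],
    "c3": ["textA", "textB", "textC"],
})

# A plain repeat drops row "a", because its count is 0.
plain = dummy_df.loc[dummy_df.index.repeat(dummy_df["c2"])]

# Workaround 1: put the zero-count rows back with concat.
via_concat = pd.concat([dummy_df[dummy_df["c2"].eq(0)], plain])

# Workaround 2: treat a count of 0 as 1 before repeating.
via_clip = dummy_df.reindex(dummy_df.index.repeat(dummy_df["c2"].clip(lower=1)))

# reset_index replaces the repeated labels with a fresh 0..3 index.
via_concat = via_concat.reset_index(drop=True)
via_clip = via_clip.reset_index(drop=True)
print(via_clip)
```

Note that the concat variant places all zero-count rows first, so the row order matches the original only when those rows already come first, as they do here.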

Answer 2

I believe the answer to why this happens can be found here: https://github.com/numpy/numpy/issues/4384

Specifying the dtype as int32 should resolve the problem highlighted in the original post.
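
As a minimal sketch of that suggestion, the repeat counts can be cast to int32 before calling repeat. This assumes the counts fit in 32 bits; whether it removes the TypeError depends on the platform and NumPy build, which the linked issue discusses. The names counts and repeated are illustrative only.

```python
import pandas as pd

dummy_df = pd.DataFrame({"c1": ["a", "b", "c"], "c2": [0, 1, 2]})

# Explicit cast, as suggested above (assumes the counts fit in int32).
counts = dummy_df["c2"].astype("int32")

# Same reindex/repeat call as in the question, now with int32 counts.
repeated = dummy_df.reindex(dummy_df.index.repeat(counts))
print(repeated)
```

Row a has a count of 0 and is therefore absent from the result; combine the cast with one of the workarounds from Answer 1 if zero-count rows must be kept.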

Answer 3

In the first attempt all rows are duplicated, while in the second attempt only the row with index 2 is. Both rely on the concat function.

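print(df)  # original frame (the question's dummy_df)

# First attempt: duplicate every row
df2 = pd.concat([df]*2, ignore_index=True)
print(df2)

# Second attempt: duplicate only the row at position 2
df3 = pd.concat([df, df.iloc[[2]]])
print(df3)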

  c1  c2     c3
0  a   0  textA
1  b   1  textB
2  c   2  textC
  c1  c2     c3
0  a   0  textA
1  b   1  textB
2  c   2  textC
3  a   0  textA
4  b   1  textB
5  c   2  textC
  c1  c2     c3
0  a   0  textA
1  b   1  textB
2  c   2  textC
2  c   2  textC

If you intend to reset the index at the end:

df3 = df3.reset_index(drop=True)


Answer 1
2 2019-05-13 18:03:17

Answer 2
0 2019-05-13 17:44:48

Answer 3
0 2019-05-13 17:51:24
