Create a pandas DataFrame with repeated values

Question

I am trying to create a pandas DataFrame that looks like:

   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

To do this, I am currently creating two DataFrames:

df1 = pd.DataFrame({'AAA' : [4] * 2 , 'BBB' : [10,20], 'CCC' : [100,50]})
df2 = pd.DataFrame({'AAA': [5]*2, 'BBB' : [30,40],'CCC' : [-30,-50]})

and then appending the rows of df2 to df1 to create the desired df.
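For reference, the append step described above could look like the following. This is only a sketch: it assumes `pd.concat` is used for the append (the question does not say which call was used), and `ignore_index=True` is there so the result is indexed 0 to 3 rather than 0, 1, 0, 1.

```python
import pandas as pd

df1 = pd.DataFrame({'AAA': [4] * 2, 'BBB': [10, 20], 'CCC': [100, 50]})
df2 = pd.DataFrame({'AAA': [5] * 2, 'BBB': [30, 40], 'CCC': [-30, -50]})

# Stack the rows of df2 under df1 and renumber the index from 0.
df = pd.concat([df1, df2], ignore_index=True)
print(df)
```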

I tried:

df = pd.DataFrame({'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' :
 [10,20,30,40],'CCC' : [100,50,-30,-50]}); df

But I get an error with the key message:

ValueError: arrays must all be same length

I can of course do:

df = pd.DataFrame({'AAA' : [4,4,5,5], 'BBB' : [10,20,30,40],'CCC' :
 [100,50,-30,-50]}); df

But is there a more elegant way to do this? This small example is easy to write out, but if I want to scale up to many rows, the input becomes very long.

Answer 1

I believe you need to join the lists with +:

df = pd.DataFrame({'AAA' : [4]*2 + [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print (df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

Or use np.repeat with np.concatenate:

import numpy as np
import pandas as pd

df = pd.DataFrame({'AAA' :  np.concatenate([np.repeat(4, 2), np.repeat(5, 2)]),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})

Alternatively, pass all values and their repeat counts to a single np.repeat call:

df = pd.DataFrame({'AAA' :  np.repeat((4,5), (2, 2)),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})

print (df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50
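As a quick check, here is a minimal sketch showing that the three ways of building the 'AAA' column give the same values, and that np.repeat also accepts unequal repeat counts. The variable names and the second example with counts (3, 1, 2) are only for illustration and are not part of the question.

```python
import numpy as np

# Three ways to build the same column.
by_list = [4] * 2 + [5] * 2
by_concat = np.concatenate([np.repeat(4, 2), np.repeat(5, 2)])
by_repeat = np.repeat((4, 5), (2, 2))

print(by_list == by_concat.tolist() == by_repeat.tolist())

# Repeat counts do not have to be equal: one count per value.
uneven = np.repeat([4, 5, 6], [3, 1, 2])
print(uneven.tolist())
```

With many distinct values, the single np.repeat call stays short because the values and the counts are each passed as one sequence.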

Answer 2

For a general solution you could do:

import pandas as pd

data = [(4, 2), (5, 2)]
df = pd.DataFrame({'AAA' : [value for value, reps in data for _ in range(reps)],
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})
print(df)

Here data is a list of (value, repetitions) tuples. For your particular example you have 4 with 2 repetitions and 5 with 2 repetitions, hence [(4, 2), (5, 2)].
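If the expansion is needed in several places, it could be wrapped in a small helper. The sketch below uses itertools instead of the nested comprehension; the function name expand_runs is hypothetical and chosen only for this example.

```python
from itertools import chain, repeat

import pandas as pd


def expand_runs(pairs):
    """Expand (value, repetitions) pairs into one flat list."""
    return list(chain.from_iterable(repeat(value, reps) for value, reps in pairs))


data = [(4, 2), (5, 2)]
df = pd.DataFrame({'AAA': expand_runs(data),
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})
print(df)
```

The total number of repetitions has to equal the length of the other columns, otherwise pandas raises the same length error as in the question.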

Answer 3

The error you get is quite clear. When you create a DataFrame from a dictionary, all of the arrays must be the same length. When you create a dictionary and give the same key multiple times, only the last value is kept. So

{'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

is the same as

{'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

When you try to create a DataFrame from that dictionary, you are asking for one column with 2 rows and two columns with 4 rows, hence the error. As @jezrael pointed out, you can create the desired column for 'AAA' by joining the lists and then creating the DataFrame from the result.
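The behaviour described above can be checked directly. This is a small sketch; the exact wording of the ValueError message depends on the pandas version, so the code only records that the error was raised.

```python
import pandas as pd

# The second 'AAA' entry silently replaces the first one.
d = {'AAA': [4] * 2, 'AAA': [5] * 2,
     'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}

print(d['AAA'])

# Column lengths no longer match: 2 for 'AAA', 4 for the others.
lengths = {key: len(value) for key, value in d.items()}
print(lengths)

raised = False
try:
    pd.DataFrame(d)
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```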

3 answers

Answer 1
4 accepted 2019-01-10 12:59:35

Answer 2
1 2019-01-10 13:01:51

Answer 3
1 2019-01-10 13:06:41
