
Create a pandas DataFrame with repeating values

I am trying to create a pandas df that looks like this:

   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

To do this, I am currently creating two dataframes:

df1 = pd.DataFrame({'AAA' : [4] * 2 , 'BBB' : [10,20], 'CCC' : [100,50]})
df2 = pd.DataFrame({'AAA': [5]*2, 'BBB' : [30,40],'CCC' : [-30,-50]})

and then appending the rows of df2 to df1 to create the desired df.

I tried:

df = pd.DataFrame({'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' :
 [10,20,30,40],'CCC' : [100,50,-30,-50]}); df

But I get an error with the key message:

ValueError('arrays must all be same length')

I can of course do:

df = pd.DataFrame({'AAA' : [4,4,5,5], 'BBB' : [10,20,30,40],'CCC' :
 [100,50,-30,-50]}); df

But is there not a more elegant way to do this? This small example is easy to write out, but if I want to scale up to many rows, the input becomes very long.

I believe you need to join the lists with +:

df = pd.DataFrame({'AAA' : [4]*2 + [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print (df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

Or use np.repeat with np.concatenate:

import numpy as np  # needed for np.repeat and np.concatenate

df = pd.DataFrame({'AAA' :  np.concatenate([np.repeat(4, 2), np.repeat(5, 2)]),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})

Alternatively, pass the values and their repeat counts to a single np.repeat call:

df = pd.DataFrame({'AAA' :  np.repeat((4,5), (2, 2)),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})

print (df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50
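
When there are many distinct values, the values and their repeat counts can be kept in two sequences and handed to np.repeat in one call. The following is only a minimal sketch: the `values` and `counts` lists and the filler 'BBB' column are hypothetical and not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: each value and how many times it should repeat.
values = [4, 5, 6]
counts = [2, 3, 1]

# np.repeat accepts one repeat count per element.
aaa = np.repeat(values, counts)

df = pd.DataFrame({
    'AAA': aaa,
    # Filler column (assumption) sized to match the length of 'AAA'.
    'BBB': np.arange(10, 10 * (len(aaa) + 1), 10),
})
print(df)
```

The length of the resulting column is the sum of the counts, so every other column has to supply that many rows.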

For a general solution you could do:

import pandas as pd

data = [(4, 2), (5, 2)]
df = pd.DataFrame({'AAA' : [value for value, reps in data for _ in range(reps)], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print(df)

Here data is a list of (value, repetitions) tuples. For your particular example you have 4 with 2 repetitions and 5 with 2 repetitions, hence [(4, 2), (5, 2)].
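
If the same expansion is needed for several columns, it may be convenient to wrap it in a small function. This is one possible sketch using itertools from the standard library; the helper name `expand` is hypothetical and not part of pandas.

```python
from itertools import chain, repeat

import pandas as pd


def expand(pairs):
    """Return a flat list in which each value appears `reps` times."""
    return list(chain.from_iterable(repeat(value, reps) for value, reps in pairs))


data = [(4, 2), (5, 2)]
df = pd.DataFrame({
    'AAA': expand(data),
    'BBB': [10, 20, 30, 40],
    'CCC': [100, 50, -30, -50],
})
print(df)
```

The list comprehension in the answer does the same thing; the function only gives the expansion a name so it can be reused.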

The error you get is quite clear. When you create a dataframe from a dictionary, all of the arrays must be the same length. When you create a dictionary and give the same key multiple times, the last one is used. So

{'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

is the same as

{'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

When you try to create a dataframe from that dictionary, you are asking for one column with 2 rows and two columns with 4 rows, hence the error. As @jezrael pointed out, you can build the desired 'AAA' column by joining the lists and then creating the dataframe from the result.
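
The behaviour described above can be checked directly. The sketch below builds the same dictionary, inspects the column lengths, and catches the exception; the variable names are illustrative, and the exact wording of the error message differs between pandas versions.

```python
import pandas as pd

# A dict literal with a repeated key keeps only the last value for that key.
columns = {'AAA': [4] * 2, 'AAA': [5] * 2,
           'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}

# 'AAA' ends up with 2 entries while the other columns have 4.
lengths = {name: len(values) for name, values in columns.items()}
print(lengths)

try:
    pd.DataFrame(columns)
    raised = False
except ValueError as exc:
    # Mismatched column lengths are rejected with a ValueError.
    raised = True
    print(type(exc).__name__, exc)
```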
