Create pandas dataframe with repeating values
I am trying to create a pandas df that looks like:
AAA BBB CCC
0 4 10 100
1 4 20 50
2 5 30 -30
3 5 40 -50
To implement this, I am currently creating two dataframes:
df1 = pd.DataFrame({'AAA' : [4] * 2 , 'BBB' : [10,20], 'CCC' : [100,50]})
df2 = pd.DataFrame({'AAA': [5]*2, 'BBB' : [30,40],'CCC' : [-30,-50]})
and then appending the rows of df2 to df1 to create the desired df.
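For reference, the two-frame approach described above could be sketched as follows. The post does not say which call performs the append, so the use of pd.concat with ignore_index=True here is an assumption.

```python
import pandas as pd

# The two partial frames from the question
df1 = pd.DataFrame({'AAA': [4] * 2, 'BBB': [10, 20], 'CCC': [100, 50]})
df2 = pd.DataFrame({'AAA': [5] * 2, 'BBB': [30, 40], 'CCC': [-30, -50]})

# Stack the rows of df2 under df1; ignore_index=True renumbers the rows 0..3
df = pd.concat([df1, df2], ignore_index=True)
print(df)
```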
I tried to do:
df = pd.DataFrame({'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' :
[10,20,30,40],'CCC' : [100,50,-30,-50]}); df
But I get an error with the key message:
ValueError('arrays must all be same length')
I can of course do:
df = pd.DataFrame({'AAA' : [4,4,5,5], 'BBB' : [10,20,30,40],'CCC' :
[100,50,-30,-50]}); df
But is there not a more elegant way to do this? This small example is easy to implement, but if I want to scale up to many rows, the input becomes very long.
I believe you need to join the lists with +:
df = pd.DataFrame({'AAA' : [4]*2 + [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print (df)
AAA BBB CCC
0 4 10 100
1 4 20 50
2 5 30 -30
3 5 40 -50
Or use repeat with concatenate:
import numpy as np
df = pd.DataFrame({'AAA' : np.concatenate([np.repeat(4, 2), np.repeat(5, 2)]),
'BBB' : [10,20,30,40],
'CCC' : [100,50,-30,-50]})
Alternative:
df = pd.DataFrame({'AAA' : np.repeat((4,5), (2, 2)),
'BBB' : [10,20,30,40],
'CCC' : [100,50,-30,-50]})
print (df)
AAA BBB CCC
0 4 10 100
1 4 20 50
2 5 30 -30
3 5 40 -50
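To scale the np.repeat approach up, the values and their repeat counts can be kept in two parallel sequences. The sketch below is only an illustration: the names values and counts and the three-value data are made up for the example, not taken from the question.

```python
import numpy as np

values = [4, 5, 6]   # hypothetical distinct values for the column
counts = [3, 1, 2]   # hypothetical number of repetitions of each value

# Per-element counts: each value is repeated by the count at the same position
uneven = np.repeat(values, counts)
print(uneven.tolist())

# A single integer count repeats every element the same number of times
even = np.repeat([4, 5], 2)
print(even.tolist())
```

The resulting array must have the same length as the other columns passed to pd.DataFrame.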
For a general solution you could do:
import pandas as pd
data = [(4, 2), (5, 2)]
df = pd.DataFrame({'AAA' : [value for value, reps in data for _ in range(reps)], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print(df)
Where data is a list of (value, repetitions) tuples. So for your particular example you have 4 with 2 repetitions and 5 with 2 repetitions, hence [(4, 2), (5, 2)].
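If the same expansion is needed for several columns, the comprehension can be wrapped in a small helper. This is a minimal sketch using itertools instead of the nested comprehension; the function name expand is hypothetical and not part of pandas.

```python
from itertools import chain, repeat

import pandas as pd


def expand(pairs):
    """Flatten [(value, reps), ...] into a list with each value repeated reps times."""
    return list(chain.from_iterable(repeat(value, reps) for value, reps in pairs))


data = [(4, 2), (5, 2)]
df = pd.DataFrame({'AAA': expand(data),
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})
print(df)
```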
The error you get is quite clear. When you create a dataframe from a dictionary, all of the arrays must be the same length. When you create a dictionary, if you give the same key multiple times, the last one is used. So
{'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}
is the same as
{'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}
When you try to create a dataframe from that dictionary, you are asking for one column with 2 rows and two columns with 4 rows, hence the error. As @jezrael pointed out, you can create the desired column for 'AAA' by joining the lists and then creating the dataframe from the joined list.
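The duplicate-key behaviour can be checked directly. The snippet below is a small illustration; the exact wording of the ValueError differs between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# The second 'AAA' entry silently replaces the first one
d = {'AAA': [4] * 2, 'AAA': [5] * 2, 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}
print(d['AAA'])  # [5, 5]

# Column lengths pandas actually receives: one of length 2, two of length 4
lengths = {key: len(col) for key, col in d.items()}
print(lengths)

error = None
try:
    pd.DataFrame(d)
except ValueError as exc:
    error = exc
    print(exc)
```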