Pandas/Numpy: Fastest way to create a ladder?
I have a pandas dataframe like:
color cost temp
0 blue 12.0 80.4
1 red 8.1 81.2
2 pink 24.5 83.5
and I want to create a "ladder" or "range" of costs for every row in 50-cent increments, from $0.50 below the current cost to $0.50 above it. My current code is similar to the following:
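import numpy
import pandas

incremented_prices = []
df['original_idx'] = df.index  # remember each row's original label
for _, row in df.iterrows():  # iterrows() yields (label, row) pairs
    current_price = row['cost']
    # a stop of +0.51 keeps the +0.50 endpoint inside the half-open range
    more_costs = numpy.arange(current_price - 0.5, current_price + 0.51, 0.5)
    for cost in more_costs:
        row_c = row.copy()
        row_c['cost'] = cost
        incremented_prices.append(row_c)
# each list element is a Series holding one row, so build the frame from the list
df_incremented = pandas.DataFrame(incremented_prices).reset_index(drop=True)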
This code produces a DataFrame like:
color cost temp original_idx
0 blue 11.5 80.4 0
1 blue 12.0 80.4 0
2 blue 12.5 80.4 0
3 red 7.6 81.2 1
4 red 8.1 81.2 1
5 red 8.6 81.2 1
6 pink 24.0 83.5 2
7 pink 24.5 83.5 2
8 pink 25.0 83.5 2
In the real problem, I will make ranges from -$50.00 to $50.00, and this loop is really slow. Is there a faster, vectorized way?
You can try recreating the data frame with numpy.repeat:
import numpy as np
import pandas as pd

# pd.np was removed from recent pandas releases, so NumPy is imported directly
cost_steps = np.arange(-0.5, 0.51, 0.5)
repeats = cost_steps.size
pd.DataFrame(dict(
    color = np.repeat(df.color.values, repeats),
    # vectorized: broadcasting adds every step to every cost in one operation
    cost = (df.cost.values[:, None] + cost_steps).ravel(),
    temp = np.repeat(df.temp.values, repeats),
    original_idx = np.repeat(df.index.values, repeats)
))
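To see why the repeated columns and the broadcast costs line up, here is a minimal sketch of just the array step, using the three costs from the question. The variable names `ladder`, `flat` and `idx` are illustrative only and do not appear in the answer.

```python
import numpy as np

cost = np.array([12.0, 8.1, 24.5])       # one cost per original row
cost_steps = np.array([-0.5, 0.0, 0.5])  # offsets applied to every row

# cost[:, None] has shape (3, 1); adding a (3,) array broadcasts to (3, 3),
# giving one row of laddered costs per original row
ladder = cost[:, None] + cost_steps

# ravel() flattens row by row, so each original row's ladder stays contiguous
flat = ladder.ravel()

# np.repeat duplicates each element in place, which matches that row-major order
idx = np.repeat(np.arange(cost.size), cost_steps.size)
```

Because `ravel()` and `np.repeat` both keep the values for one original row next to each other, the repeated columns and the flattened costs can be placed side by side without any further alignment.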
Update for more columns:
df1 = df.rename_axis("original_idx").reset_index()
cost_steps = np.arange(-0.5, 0.51, 0.5)
repeats = cost_steps.size
# repeat the non-cost columns row-wise, then append the broadcast costs as the last column
pd.DataFrame(np.hstack((np.repeat(df1.drop(columns="cost").values, repeats, axis=0),
                        (df1.cost.values[:, None] + cost_steps).reshape(-1, 1))),
             columns=df1.columns.drop("cost").tolist() + ["cost"])
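Stacking mixed columns with hstack produces an object array, so the numeric columns in the result lose their float dtype. A possible alternative, which is not from the answer above, is to repeat the rows through the index and cycle the steps with np.tile. This is only a sketch and assumes the frame has a unique index; the sample frame is rebuilt from the question so the block runs on its own.

```python
import numpy as np
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "color": ["blue", "red", "pink"],
    "cost": [12.0, 8.1, 24.5],
    "temp": [80.4, 81.2, 83.5],
})

cost_steps = np.array([-0.5, 0.0, 0.5])

# repeat every row once per step; the repeated index carries the original labels
out = (df.loc[df.index.repeat(cost_steps.size)]
         .rename_axis("original_idx")
         .reset_index())

# np.tile cycles through the steps once per original row
out["cost"] = out["cost"] + np.tile(cost_steps, len(df))
```

Here every column keeps its original dtype, and any extra columns are carried along without being listed by name.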
Here's a NumPy initialization based approach -
increments = 0.5*np.arange(-1,2) # Edit the increments here
names = np.append(df.columns, 'original_idx')
M,N = df.shape
vals = df.values
cost_col_idx = (names == 'cost').argmax()
n = len(increments)
shp = (M,n,N+1)
b = np.empty(shp,dtype=object)
b[...,:-1] = vals[:,None]
b[...,-1] = np.arange(M)[:,None]
b[...,cost_col_idx] = vals[:,cost_col_idx].astype(float)[:,None] + increments
b.shape = (-1,N+1)
df_out = pd.DataFrame(b, columns=names)
To make the increments go from -50 to +50 in steps of 0.5, use:
increments = 0.5*np.arange(-100,101)
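As a quick sanity check on that line, the sketch below confirms the number of steps and the endpoints. Scaling an integer range this way includes both endpoints exactly, which a floating-point step passed straight to np.arange does not always guarantee.

```python
import numpy as np

# integers -100..100 scaled by 0.5 give -50.0, -49.5, ..., 49.5, 50.0
increments = 0.5 * np.arange(-100, 101)

n_steps = increments.size  # number of output rows produced per original row
lowest, highest = increments[0], increments[-1]
```

With this setting each original row expands to 201 rows, so the output frame is 201 times as long as the input.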
Sample run -
In [200]: df
Out[200]:
color cost temp newcol
0 blue 12.0 80.4 mango
1 red 8.1 81.2 banana
2 pink 24.5 83.5 apple
In [201]: df_out
Out[201]:
color cost temp newcol original_idx
0 blue 11.5 80.4 mango 0
1 blue 12 80.4 mango 0
2 blue 12.5 80.4 mango 0
3 red 7.6 81.2 banana 1
4 red 8.1 81.2 banana 1
5 red 8.6 81.2 banana 1
6 pink 24 83.5 apple 2
7 pink 24.5 83.5 apple 2
8 pink 25 83.5 apple 2