Creating a column in one table based on multiple columns from another table [python]
I am creating a csv table with information about all of my orders. Now I want to sell those items, but I want to add an extra surcharge depending on the price of the item.
I created a new table with the surcharges. It has columns called 'from' and 'to', against which I have to compare the item price and then include the right surcharge in the sale price.
But I am not able to do this. I tried different approaches, but none of them seem to work.
Any help would be nice :)
My table looks like this:
      OrderNo  NetPerPiece  costsDividedPerOrder  HandlingPerPiece
0  7027514279        44.24              0.008007          0.354232
1  7027514279        15.93              0.008007          0.127552
2  7027514279        15.93              0.008007          0.127552
3  7027514279        15.93              0.008007          0.127552
4  7027514279        15.93              0.008007          0.127552
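import numpy
from pandas import DataFrame
# productsPerOrder is the orders table shown above, read from the csv file
surcharges = {'surcharge': [0.35, 0.25, 0.2, 0.15, 0.12, 0.1],
              'from': [0, 20, 200, 500, 1500, 5000],
              'to': [20, 200, 500, 1500, 5000, 1000000000]}
surchargeTable = DataFrame(surcharges, columns=['surcharge', 'from', 'to'])
# Fails: comparing two Series requires identical index labels, so tables of different
# lengths raise "Can only compare identically-labeled Series objects" (and numpy.where
# with a condition also needs both x and y)
productsPerOrder['NetPerPieceSale'] = numpy.where(((productsPerOrder['NetPerPiece'] >= surchargeTable['from']) & (productsPerOrder['NetPerPiece'] < surchargeTable['to'])), surchargeTable['surcharge'])
# I also tried this (fails too: & binds tighter than >= and <, and comparing a value
# with a whole column gives a Series, which an if statement cannot evaluate):
for index, row in productsPerOrder.iterrows():
    if row['NetPerPiece'] >= surchargeTable['from'] & row['NetPerPiece'] < surchargeTable['to']:
        productsPerOrder.loc[index, 'NetPerPieceSale'] = surchargeTable.loc[row, 'NetPerPieceSale'].values(0)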
I want it to look like this:
      OrderNo  NetPerPiece  costsDividedPerOrder  HandlingPerPiece  NetPerPieceSale
0  7027514279        44.24              0.008007          0.354232             0.25
1  7027514279        15.93              0.008007          0.127552             0.35
2  7027514279        15.93              0.008007          0.127552             0.35
3  7027514279        15.93              0.008007          0.127552             0.35
4  7027514279        15.93              0.008007          0.127552             0.35
Just to note: the file with items is much bigger; I only showed the head of the csv list. So the two tables have different lengths.
surchargeTable looks like this:
   surcharge  from          to
0       0.35     0          20
1       0.25    20         200
2       0.20   200         500
3       0.15   500        1500
4       0.12  1500        5000
5       0.10  5000  1000000000
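The numpy.where attempt in the question fails because comparing two pandas Series element-wise requires both to have identical index labels, and the orders table and the surcharge table have different lengths. The same 'from'/'to' comparison can still work if it is broadcast, so that every price is compared with every bracket at once. A minimal sketch, assuming the sample values from the tables above and that every price falls into exactly one bracket; the names products, price, in_bracket and bracket are illustrative:

```python
import pandas as pd

# Sample rows copied from the tables in the question
products = pd.DataFrame({'NetPerPiece': [44.24, 15.93, 15.93, 15.93, 15.93]})
surchargeTable = pd.DataFrame({'surcharge': [0.35, 0.25, 0.2, 0.15, 0.12, 0.1],
                               'from': [0, 20, 200, 500, 1500, 5000],
                               'to': [20, 200, 500, 1500, 5000, 1000000000]})

# Comparing two Series requires identical index labels (5 rows vs 6 rows here)
try:
    products['NetPerPiece'] >= surchargeTable['from']
except ValueError as err:
    print(err)

# Broadcast instead: shape (n_products, 1) against (n_brackets,) -> (n_products, n_brackets)
price = products['NetPerPiece'].to_numpy()[:, None]
in_bracket = (price >= surchargeTable['from'].to_numpy()) & (price < surchargeTable['to'].to_numpy())
# argmax returns the first True column per row; assumes every price hits a bracket
bracket = in_bracket.argmax(axis=1)
products['NetPerPieceSale'] = surchargeTable['surcharge'].to_numpy()[bracket]
print(products)
```

The intermediate boolean matrix has one row per product and one column per bracket, so memory grows with both table sizes; with only six brackets that stays small even for a large orders file.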
Another way to do this is to use pd.IntervalIndex and map:
import pandas as pd
# Create an IntervalIndex on the surchargeTable DataFrame; closed='left' makes each
# interval [from, to), i.e. from <= price < to, as in the question
surchargeTable = surchargeTable.set_index(pd.IntervalIndex.from_arrays(surchargeTable['from'],
                                                                        surchargeTable['to'],
                                                                        closed='left'))
# Use map with the Series formed by the IntervalIndex and the surcharge column
productsPerOrder['NetPerPieceSale'] = productsPerOrder['NetPerPiece'].map(surchargeTable['surcharge'])
productsPerOrder
Output:
      OrderNo  NetPerPiece  costsDividedPerOrder  HandlingPerPiece  NetPerPieceSale
0  7027514279        44.24              0.008007          0.354232             0.25
1  7027514279        15.93              0.008007          0.127552             0.35
2  7027514279        15.93              0.008007          0.127552             0.35
3  7027514279        15.93              0.008007          0.127552             0.35
4  7027514279        15.93              0.008007          0.127552             0.35
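pd.IntervalIndex.from_arrays builds right-closed intervals (from, to] by default, so with the default a price of exactly 20 would get 0.35 and a price of exactly 0 would match no bracket. Passing closed='left' gives [from, to) intervals, which matches the question's from <= price < to rule. A small check on a few hand-picked boundary prices (the names intervals, brackets, prices and sale are illustrative):

```python
import pandas as pd

surchargeTable = pd.DataFrame({'surcharge': [0.35, 0.25, 0.2, 0.15, 0.12, 0.1],
                               'from': [0, 20, 200, 500, 1500, 5000],
                               'to': [20, 200, 500, 1500, 5000, 1000000000]})

# closed='left' builds [from, to) intervals, i.e. from <= price < to
intervals = pd.IntervalIndex.from_arrays(surchargeTable['from'], surchargeTable['to'],
                                         closed='left')
brackets = pd.Series(surchargeTable['surcharge'].to_numpy(), index=intervals)

# Boundary prices: 0, 20 and 5000 sit exactly on a 'from' value
prices = pd.Series([0, 15.93, 20, 44.24, 5000])
sale = prices.map(brackets)  # prices outside every interval would map to NaN
print(sale.tolist())
```

Because map looks each price up in the interval index, the intervals must not overlap; adjacent left-closed intervals such as [0, 20) and [20, 200) satisfy that.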
Create a function to calculate the surcharge, then use .apply to apply it to the 'NetPerPiece' column:
import pandas as pd
df = pd.read_csv('something.csv')
def get_surcharges(x):
    # Each bracket covers lower <= price < upper, as in the question's surcharge table
    lower = [0, 20, 200, 500, 1500, 5000]
    upper = [20, 200, 500, 1500, 5000, 1000000000]
    surcharges = [0.35, 0.25, 0.2, 0.15, 0.12, 0.1]
    brackets = list(zip(lower, upper, surcharges))
    # Keep the surcharge of the bracket that contains the price
    price = [s for lo, hi, s in brackets if lo <= x < hi]
    return price[0]
df['NetPerPieceSale'] = df['NetPerPiece'].apply(get_surcharges)
print(df)
This outputs:
      OrderNo  NetPerPiece  costsDividedPerOrder  HandlingPerPiece  NetPerPieceSale
0  7027514279        44.24              0.008007          0.354232             0.25
1  7027514279        15.93              0.008007          0.127552             0.35
2  7027514279        15.93              0.008007          0.127552             0.35
3  7027514279        15.93              0.008007          0.127552             0.35
4  7027514279        15.93              0.008007          0.127552             0.35
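Because the brackets are sorted and contiguous, the linear scan inside get_surcharges could be swapped for a binary search over the lower bounds using the standard-library bisect module. A sketch, assuming the bracket values from the question; LOWER_BOUNDS, SURCHARGES and get_surcharge are illustrative names, and unlike the question's table this version has no upper cap, so prices at or above 1000000000 also get 0.1:

```python
import bisect

import pandas as pd

# Lower bounds ('from') of each bracket, sorted ascending, and the matching surcharges
LOWER_BOUNDS = [0, 20, 200, 500, 1500, 5000]
SURCHARGES = [0.35, 0.25, 0.2, 0.15, 0.12, 0.1]

def get_surcharge(price):
    # bisect_right counts the lower bounds <= price, so subtracting 1
    # gives the bracket with from <= price < next from
    i = bisect.bisect_right(LOWER_BOUNDS, price) - 1
    if i < 0:
        raise ValueError(f'price {price} is below the lowest bracket')
    return SURCHARGES[i]

df = pd.DataFrame({'NetPerPiece': [44.24, 15.93, 20, 0]})
df['NetPerPieceSale'] = df['NetPerPiece'].apply(get_surcharge)
print(df)
```

With six brackets the speed difference is negligible; the main gain is that a price exactly on a bracket boundary (such as 20) is handled by bisect_right rather than by hand-written comparisons.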
Option without the for loop (kind of verbose):
def get_surcharges(x):
    # Lower bounds are inclusive, matching from <= price < to
    if x >= 0:
        if x >= 20:
            if x >= 200:
                if x >= 500:
                    if x >= 1500:
                        if x >= 5000:
                            return 0.1
                        else:
                            return 0.12
                    else:
                        return 0.15
                else:
                    return 0.2
            else:
                return 0.25
        else:
            return 0.35
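The nested ifs test one value at a time. The same decision chain can be evaluated for a whole column at once with numpy.select, which takes a list of boolean conditions and, for each row, picks the choice belonging to the first condition that is True. This is a swapped-in vectorized technique rather than part of the answer above; df and its sample prices are assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'NetPerPiece': [44.24, 15.93, 20, 1500, 7000]})
price = df['NetPerPiece']

# Same order as the nested ifs: highest bracket first, first True condition wins
conditions = [price >= 5000, price >= 1500, price >= 500,
              price >= 200, price >= 20, price >= 0]
choices = [0.1, 0.12, 0.15, 0.2, 0.25, 0.35]
# default=np.nan covers prices that match no condition (negative values)
df['NetPerPieceSale'] = np.select(conditions, choices, default=np.nan)
print(df)
```

The order of the conditions matters: since a price of 7000 also satisfies price >= 20, the highest bracket has to come first, just as in the nested version.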
Simply add a column to the existing DataFrame with the above calculations of NetPerPieceSale,
or you can save the calculations to a DataFrame like this:
net = pd.DataFrame(NetPerPieceSale, columns=['NetPerPieceSale'])
and simply concat this to the existing DataFrame; you will have everything in one table.
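A sketch of the concat route, assuming a small stand-in for the orders table and a hypothetical helper surcharge_for that plays the role of whichever surcharge calculation you use; only the pd.DataFrame and pd.concat calls reflect the answer itself:

```python
import pandas as pd

# Stand-in for the orders table (OrderNo and NetPerPiece values taken from the question)
df = pd.DataFrame({'OrderNo': [7027514279] * 3, 'NetPerPiece': [44.24, 15.93, 15.93]})

def surcharge_for(x):
    # Hypothetical helper: brackets checked from the highest 'from' value down
    for lower, surcharge in [(5000, 0.1), (1500, 0.12), (500, 0.15),
                             (200, 0.2), (20, 0.25), (0, 0.35)]:
        if x >= lower:
            return surcharge
    return None

# Put the calculated surcharges in their own DataFrame, reusing df's index
net = pd.DataFrame({'NetPerPieceSale': [surcharge_for(x) for x in df['NetPerPiece']]},
                   index=df.index)
# axis=1 adds the new column next to the existing ones, matching rows by index
df = pd.concat([df, net], axis=1)
print(df)
```

pd.concat with axis=1 aligns rows on the index, which is why net reuses df.index here: if df had been filtered or re-indexed and net got a fresh 0..n-1 index, the rows would not line up and NaN values would appear.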