Creating a column in one table based on multiple columns from another table [python]

I am creating a CSV table that holds information about all of my orders. Now I want to sell those items, but I want to add a surcharge that depends on the price of each item. I created a new table with the surcharges; it has columns called 'from' and 'to', against which I have to compare the item price and then include the right surcharge in the sale price.

But I am not able to do this. I tried different approaches, but none of them seem to work. Any help would be nice :)

My table looks like this:

      OrderNo  NetPerPiece  costsDividedPerOrder  HandlingPerPiece
0  7027514279        44.24              0.008007          0.354232
1  7027514279        15.93              0.008007          0.127552
2  7027514279        15.93              0.008007          0.127552
3  7027514279        15.93              0.008007          0.127552
4  7027514279        15.93              0.008007          0.127552

surcharges = {'surcharge': [0.35, 0.25, 0.2, 0.15, 0.12, 0.1],
              'from': [0, 20, 200, 500, 1500, 5000],
              'to': [20, 200, 500, 1500, 5000, 1000000000]}
surchargeTable = DataFrame(surcharges, columns=['surcharge', 'from', 'to'])


productsPerOrder['NetPerpieceSale'] = numpy.where(((productsPerOrder['NetPerPiece'] >= surchargeTable['from']) & (productsPerOrder['NetPerPiece'] < surchargeTable['to'])), surchargeTable['surcharge'])


#I also tried this:

for index, row in productsPerOrder.iterrows():
        if row['NetPerPiece'] >= surchargeTable['from'] & row['NetPerPiece'] < surchargeTable['to']:
                productsPerOrder.loc[index,'NerPerPieceSale'] = surchargeTable.loc[row,'NetPerPieceSale'].values(0)

I want it to look like this:

      OrderNo  NetPerPiece  costsDividedPerOrder  HandlingPerPiece  NetPerPieceSale
0  7027514279        44.24              0.008007          0.354232             0.25
1  7027514279        15.93              0.008007          0.127552             0.35
2  7027514279        15.93              0.008007          0.127552             0.35
3  7027514279        15.93              0.008007          0.127552             0.35
4  7027514279        15.93              0.008007          0.127552             0.35

Just a reminder: the file with the items is much bigger; I only showed the head of the CSV. So the two tables have different lengths.

surchargeTable looks like this:

 surcharge  from          to
0       0.35     0          20
1       0.25    20         200
2       0.20   200         500
3       0.15   500        1500
4       0.12  1500        5000
5       0.10  5000  1000000000

Another way to do this is to use pd.IntervalIndex and map:

import pandas as pd

# Create an IntervalIndex on the surchargeTable dataframe; closed='left' makes each
# interval [from, to), matching the ">= from and < to" comparison in the question.
surchargeTable = surchargeTable.set_index(pd.IntervalIndex.from_arrays(
    surchargeTable['from'], surchargeTable['to'], closed='left'))

# Map each price through the surcharge Series, which is indexed by those intervals.
productsPerOrder['NetPerPieceSale'] = productsPerOrder['NetPerPiece'].map(surchargeTable['surcharge'])

productsPerOrder

Output:

      OrderNo  NetPerPiece  costsDividedPerOrder  HandlingPerPiece  NetPerPieceSale
0  7027514279        44.24              0.008007          0.354232             0.25
1  7027514279        15.93              0.008007          0.127552             0.35
2  7027514279        15.93              0.008007          0.127552             0.35
3  7027514279        15.93              0.008007          0.127552             0.35
4  7027514279        15.93              0.008007          0.127552             0.35
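
The same interval lookup could also be sketched with pd.cut, which bins the prices directly without building an index first. This is only an illustration, not part of the answer above: the small productsPerOrder frame is made-up sample data, and the edges and rates lists are copied from the surcharge table in the question.

```python
import pandas as pd

# Made-up sample prices for illustration; the real data comes from the CSV.
productsPerOrder = pd.DataFrame({"NetPerPiece": [44.24, 15.93, 20.0, 5000.0]})

# Bin edges and surcharge rates copied from the surcharge table in the question.
edges = [0, 20, 200, 500, 1500, 5000, 1000000000]
rates = [0.35, 0.25, 0.2, 0.15, 0.12, 0.1]

# right=False makes each bin [from, to), i.e. ">= from and < to".
binned = pd.cut(productsPerOrder["NetPerPiece"], bins=edges, labels=rates, right=False)

# pd.cut returns a categorical column; convert it to float for later arithmetic.
productsPerOrder["NetPerPieceSale"] = binned.astype(float)

print(productsPerOrder)
```

Prices that fall outside every bin come back as NaN, which makes unexpected values easy to spot.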

Create a function to calculate the surcharge, then use .apply to apply it to the 'NetPerPiece' column.

import pandas as pd
df = pd.read_csv('something.csv')

def get_surcharges(x):
    # Lower bound, upper bound and surcharge of each price bracket.
    fr = [0, 20, 200, 500, 1500, 5000]
    to = [20, 200, 500, 1500, 5000, 1000000000]
    surcharges = [0.35, 0.25, 0.2, 0.15, 0.12, 0.1]
    rr = list(zip(fr, to, surcharges))
    # Brackets are [from, to), so a price exactly on a boundary still matches one bracket.
    price = [r[2] for r in rr if r[0] <= x < r[1]]
    return price[0]

df['NetPerpieceSale'] = df['NetPerPiece'].apply(get_surcharges)

print(df)

This outputs:

      OrderNo  NetPerPiece  costsDividedPerOrder  HandlingPerPiece  NetPerpieceSale
0  7027514279        44.24              0.008007          0.354232             0.25
1  7027514279        15.93              0.008007          0.127552             0.35
2  7027514279        15.93              0.008007          0.127552             0.35
3  7027514279        15.93              0.008007          0.127552             0.35
4  7027514279        15.93              0.008007          0.127552             0.35
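
Since the question mentions that the real file is much bigger, a vectorized lookup may be worth considering instead of calling a Python function once per row. The following is a rough sketch using numpy.searchsorted; the sample frame is made up for illustration, and the code assumes that no price is negative (a negative price would produce index -1 and silently pick the last rate).

```python
import numpy as np
import pandas as pd

# Made-up sample prices for illustration; the real data comes from the CSV.
df = pd.DataFrame({"NetPerPiece": [44.24, 15.93, 200.0, 7500.0]})

# Lower bound and surcharge rate of each bracket, from the question's table.
lower_bounds = np.array([0, 20, 200, 500, 1500, 5000])
rates = np.array([0.35, 0.25, 0.2, 0.15, 0.12, 0.1])

# side="right" counts how many lower bounds are <= the price;
# subtracting 1 turns that count into the index of the matching bracket.
idx = np.searchsorted(lower_bounds, df["NetPerPiece"].to_numpy(), side="right") - 1

# Assumption: all prices are >= 0, so idx is never -1.
df["NetPerpieceSale"] = rates[idx]

print(df)
```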

An option without the for loop (kind of verbose):

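def get_surcharges(x):
    # Each bracket includes its lower bound, matching ">= from and < to" in the question.
    # Negative prices fall through every branch and return None.
    if x >= 0:
        if x >= 20:
            if x >= 200:
                if x >= 500:
                    if x >= 1500:
                        if x >= 5000:
                            return 0.1
                        else:
                            return 0.12
                    else:
                        return 0.15
                else:
                    return 0.2
            else:
                return 0.25
        else:
            return 0.35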
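
If the nesting feels hard to maintain, the same threshold lookup can be sketched with the standard library's bisect module, so that the brackets live in two lists instead of in the control flow. The function name get_surcharge_bisect and the two constants are hypothetical names chosen for this illustration; the values are taken from the question's surcharge table.

```python
from bisect import bisect_right

# Lower bound and surcharge rate of each bracket, from the question's table.
LOWER_BOUNDS = [0, 20, 200, 500, 1500, 5000]
RATES = [0.35, 0.25, 0.2, 0.15, 0.12, 0.1]

def get_surcharge_bisect(x):
    """Return the surcharge for price x, or None if x is below the first bracket."""
    if x < LOWER_BOUNDS[0]:
        return None
    # bisect_right counts the lower bounds that are <= x;
    # subtracting 1 gives the index of the bracket that contains x.
    return RATES[bisect_right(LOWER_BOUNDS, x) - 1]

print(get_surcharge_bisect(44.24))
print(get_surcharge_bisect(15.93))
```

It can be passed to .apply in the same way as the nested version, for example df['NetPerPiece'].apply(get_surcharge_bisect).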

Simply add a column with the NetPerPieceScale values calculated above to the existing dataframe, or save the calculations to a dataframe like this:
net = pd.DataFrame(NetPerPieceScale, columns=['NetPerPieceScale'])

and concat this to the existing dataframe, so that you have everything in one table.
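
A minimal sketch of that concat step might look as follows. The two-row frame and the net_per_piece_sale list are made-up placeholders for values computed elsewhere, and the result column is named NetPerPieceSale here to match the question. Reusing the existing index is the important detail, because pd.concat with axis=1 aligns rows by index label rather than by position.

```python
import pandas as pd

# Made-up sample of the existing table.
productsPerOrder = pd.DataFrame({
    "OrderNo": [7027514279, 7027514279],
    "NetPerPiece": [44.24, 15.93],
})

# Hypothetical precomputed surcharges, one per row and in the same row order.
net_per_piece_sale = [0.25, 0.35]

# Reuse the existing index so that the rows line up when concatenating.
net = pd.DataFrame(net_per_piece_sale, columns=["NetPerPieceSale"],
                   index=productsPerOrder.index)

# axis=1 places the new column next to the existing ones.
combined = pd.concat([productsPerOrder, net], axis=1)

print(combined)
```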
