How to create a new column based on Date Values & Condition in Pandas dataframe
Table 1:
   Item Type      Order Date  Ship Date   Purchase Cost
0  Example        2014-08-10  2014-08-10  850.7544
1  Snacks         2014-08-10  2014-08-10  NaN
2  Cosmetics      2/22/2015   2/22/2015   NaN
3  Fruits         2015-09-12  2015-09-12  NaN
4  Personal Care  9/17/2014   9/17/2014   NaN
5  Household      2010-04-02  2010-04-02  NaN
6  Clothes        2/20/2013   2/20/2013   NaN
Table 2:
   Item Type  Purchase Start Date  Purchase End Date  Cost Per Unit
0  Baby Food  2010-01-01           2010-05-01         158.2736
1  Beverages  2010-01-01           2010-05-01         36.0620
2  Cereal     2010-01-01           2010-05-01         160.4460
3  Clothes    2010-01-01           2010-05-01         66.6608
4  Cosmetics  2010-01-01           2010-05-01         266.6920
5  Fruits     2010-01-01           2010-05-01         5.5980
6  Household  2010-01-01           2010-05-01         467.7890
7  Meat       2010-01-01           2010-05-01         274.2285
I need to fill the Purchase Cost column in Table 1 using the date ranges and the Cost Per Unit column in Table 2.
For example, the Household row in Table 1 has Order Date and Ship Date values of (2010-04-02, 2010-04-02), and the Household row in Table 2 covers the range (2010-01-01, 2010-05-01). Since both Order Date and Ship Date fall between Purchase Start Date and Purchase End Date, Purchase Cost for that row should be filled with 467.789. How can I fill the Purchase Cost column this way?
I assume that all "date" columns have already been converted to datetime type. If not, start by converting them.
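As a minimal sketch of that conversion step: the question's Table 1 mixes ISO dates (2014-08-10) with month/day/year dates (2/22/2015), so the example below parses each value individually instead of assuming one format for the whole column. The frame name table1 and the three sample rows are only illustrative.

```python
import pandas as pd

# Illustrative subset of Table 1 with the two date styles seen in the question
table1 = pd.DataFrame({
    "Item Type": ["Snacks", "Cosmetics", "Personal Care"],
    "Order Date": ["2014-08-10", "2/22/2015", "9/17/2014"],
    "Ship Date": ["2014-08-10", "2/22/2015", "9/17/2014"],
})

for col in ["Order Date", "Ship Date"]:
    # Parse every string on its own so both date styles can coexist in one column
    table1[col] = table1[col].apply(pd.to_datetime)

print(table1.dtypes)
```

Parsing element by element is slower than a single vectorized call, but it avoids having to pick one format for a column that contains several.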
Generate an auxiliary Series:
wrk = (pricing.assign(year=pricing['Start Date'].dt.year)
       .drop_duplicates(subset=['Item', 'year'])   # keep the first row per product and year
       .set_index(['Item', 'year'])['(USD)dollar']
       .rename('price'))
wrk
It contains the first price for each product and year. The product name (Item) and the year are the levels of the MultiIndex, and the price is the value.
For your sample data, supplemented with one row for Cosmetics for the year 2014, the result is:
Item       year
Snacks     2010    68
           2011    72
Cosmetics  2014    50
Name: price, dtype: int64
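To try this step in isolation, here is a self-contained sketch. The pricing frame is hypothetical: its rows were chosen only so that the output matches the Series printed above, and the extra Snacks row for mid-2010 is an assumption added to show what drop_duplicates discards.

```python
import pandas as pd

# Hypothetical input, reconstructed to reproduce the printed result
pricing = pd.DataFrame({
    "Item": ["Snacks", "Snacks", "Snacks", "Cosmetics"],
    "Start Date": pd.to_datetime(
        ["2010-01-01", "2010-06-01", "2011-01-01", "2014-01-01"]
    ),
    "(USD)dollar": [68, 70, 72, 50],
})

wrk = (
    pricing.assign(year=pricing["Start Date"].dt.year)
    .drop_duplicates(subset=["Item", "year"])  # keeps the first row per (Item, year)
    .set_index(["Item", "year"])["(USD)dollar"]
    .rename("price")
)
print(wrk)
```

Note that drop_duplicates keeps the first row in the frame's current order, so "first price in the year" only holds if pricing is already sorted by Start Date; sorting beforehand with sort_values would make that explicit.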
Then, to fill the price column, run:
# reindex (rather than wrk[...]) yields NaN for (product, year) pairs with no price
product['price'] = wrk.reindex(product.set_index(
    ['Product', product['Date (USD)'].dt.year]).index).tolist()
The result is:
Product Date (USD) price
0 Snacks 2010-02-03 68.0
1 Snacks 2010-02-06 68.0
2 Snacks 2014-02-03 NaN
3 Snacks 2012-02-03 NaN
4 Cosmetics 2012-02-03 NaN
5 Cosmetics 2013-02-03 NaN
6 Cosmetics 2013-02-08 NaN
7 Cosmetics 2014-02-06 50.0
8 Cosmetics 2014-02-09 50.0
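The lookup above matches on product and year. For the tables in the question, where a cost applies when both Order Date and Ship Date fall inside a Purchase Start Date / Purchase End Date range, one possible sketch is a left merge on Item Type followed by a range check. This is a swapped-in technique, not the method shown in the answer; the frame names table1 and table2 are assumptions, only a few rows of Table 2 are included, and the dates are written in ISO form for brevity.

```python
import pandas as pd

nan = float("nan")
table1 = pd.DataFrame({
    "Item Type": ["Example", "Snacks", "Cosmetics", "Fruits",
                  "Personal Care", "Household", "Clothes"],
    "Order Date": pd.to_datetime(["2014-08-10", "2014-08-10", "2015-02-22",
                                  "2015-09-12", "2014-09-17", "2010-04-02",
                                  "2013-02-20"]),
    "Purchase Cost": [850.7544, nan, nan, nan, nan, nan, nan],
})
table1["Ship Date"] = table1["Order Date"]  # identical in the question's sample

table2 = pd.DataFrame({
    "Item Type": ["Clothes", "Cosmetics", "Fruits", "Household"],
    "Purchase Start Date": pd.to_datetime(["2010-01-01"] * 4),
    "Purchase End Date": pd.to_datetime(["2010-05-01"] * 4),
    "Cost Per Unit": [66.6608, 266.6920, 5.5980, 467.7890],
})

# Pair every Table 1 row with all Table 2 rows of the same item type
merged = table1.reset_index().merge(table2, on="Item Type", how="left")

# True where both dates lie inside the purchase window (bounds inclusive)
start, end = merged["Purchase Start Date"], merged["Purchase End Date"]
in_range = (merged["Order Date"].between(start, end)
            & merged["Ship Date"].between(start, end))

# First matching cost per original row, indexed by Table 1's row label
matched = (merged.loc[in_range]
           .drop_duplicates("index")
           .set_index("index")["Cost Per Unit"])

# Fill only the missing costs; existing values such as row 0 are kept
table1["Purchase Cost"] = table1["Purchase Cost"].fillna(matched)
print(table1)
```

With this sample, only the Household row receives a value (467.789), because it is the only row whose dates fall inside a 2010 window. If Table 2 held several windows per item type, the merge would produce one candidate row per window and the first matching one would be used.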