Comparing two columns using Pandas (or numpy) and calculating the percentage difference
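Disclaimer: I am learning to develop in Python, and I know this way of coding may look like garbage, but I plan to keep improving as I build programs.
I am trying to build a scraper that uses Selenium to check specific flight prices every day, and that part of the code is already done. The origin, destination, departure date, return date and price are saved each day. I save this data to a file and then check whether any price has changed.
My goal is to determine whether a price has changed by more than X percent, and then have the script print a message for each flight compared.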
import pandas as pd
import os.path
import numpy as np
# These are just sample data before integrating Selenium values
price = 230
departuredate = '20/02/2020'
returndate = '20/02/2020'
fromm = 'BOS'
to = 'JFK'
price2 = 630
departuredate2 = '20/02/2020'
returndate2 = '20/02/2020'
fromm2= 'CDG'
to2= 'JFK'
#End of sample data
flightdata = {'From': [fromm, fromm2], 'To': [to,to2], 'Departure date': [departuredate,departuredate2], 'Return date': [returndate,returndate2], 'Price': [price,price2]}
df = pd.DataFrame(flightdata, columns= ['From', 'To', 'Departure date', 'Return date', 'Price'])
#Check if the script is running for the first time
if os.path.exists('flightstoday.xls') == True:
    os.remove("flightsyesterday.xls")
    os.rename('flightstoday.xls', 'flightsyesterday.xls') #Rename the flights scraped yesterday
    df.to_csv('flightstoday.xls', mode='a', header=True, sep='\t')
else:
    df.to_csv('flightstoday.xls', mode='w', header=True, sep='\t')
#Work with two dataframes
flightsyesterday = pd.read_csv("flightsyesterday.xls",sep='\t')
flightstoday = pd.read_csv("flightstoday.xls",sep='\t')
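One caveat about the rotation step above: `os.remove("flightsyesterday.xls")` raises an error when that file does not exist yet (for example on the second run), and on the very first run there is no yesterday file to read back at all. A minimal sketch of a more forgiving rotation follows. The helper name `rotate_and_save`, its return value, and the use of `index=False` are assumptions made for illustration, not part of the original script.

```python
import os

import pandas as pd


def rotate_and_save(df, today_path="flightstoday.xls",
                    yesterday_path="flightsyesterday.xls"):
    """Move the previous snapshot aside, then write today's data.

    Hypothetical helper. Returns True when a previous snapshot exists
    to compare against, False on the very first run.
    """
    if os.path.exists(today_path):
        # os.replace overwrites yesterday_path if it already exists,
        # so no separate os.remove call is needed.
        os.replace(today_path, yesterday_path)
    # Tab-separated text, as in the original script; index=False avoids
    # writing an extra unnamed index column (an assumption, adjust as needed).
    df.to_csv(today_path, sep="\t", index=False)
    return os.path.exists(yesterday_path)
```

With this shape, the comparison would only run when the function returns True, so the first run simply stores the data.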
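What I am missing is how to compare the 'Price' columns and print a message saying that, for the row with a given 'From', 'To', 'Departure date' and 'Return date', the flight price has changed by X percent.
I have tried the code below, but it only adds a column to the flightstoday data. The column is not a percentage, and of course nothing is printed about any price change.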
flightstoday['PriceDiff'] = np.where(flightstoday['Price'] == flightsyesterday['Price'], 0, flightstoday['Price'] - flightsyesterday['Price'])
Any help for this newbie would be much appreciated. Thanks!
From what I gather, I think this is what you intend to do.
import pandas as pd
import os.path
import numpy as np
# These are just sample data before integrating Selenium values
price = 230
departuredate = '20/02/2020'
returndate = '20/02/2020'
fromm = 'BOS'
to = 'JFK'
price2 = 630
departuredate2 = '20/02/2020'
returndate2 = '20/02/2020'
fromm2 = 'CDG'
to2 = 'JFK'
# Create second set of prices
price3 = 250
price4 = 600
# Generate data to construct DataFrames
today_flightdata = {
    'From': [fromm, fromm2], 'To': [to, to2],
    'Departure date': [departuredate, departuredate2],
    'Return date': [returndate, returndate2], 'Price': [price, price2]}
yesterday_flightdata = {
    'From': [fromm, fromm2], 'To': [to, to2],
    'Departure date': [departuredate, departuredate2],
    'Return date': [returndate, returndate2], 'Price': [price3, price4]}
# Create dataframes for yesterday and today
columns = ['From', 'To', 'Departure date', 'Return date', 'Price']
today = pd.DataFrame(today_flightdata, columns=columns)
yesterday = pd.DataFrame(yesterday_flightdata, columns=columns)
# Determine changes
today['price_change'] = (
    today['Price'] - yesterday['Price']) / yesterday['Price'] * 100.
# Determine indices of all rows where the absolute price change (in percent) is at least the threshold
threshold = 1.0
today['exceeds_threshold'] = abs(today['price_change']) >= threshold
exceed_indices = today['exceeds_threshold'][today['exceeds_threshold']].index
# Print out those entries that exceed threshold
for idx in exceed_indices:
    # exceed_indices holds index labels, so look the row up with .loc
    row = today.loc[idx]
    print('Flight from {} to {} leaving on {} and returning on {} has changed by {}%'.format(
        row['From'], row['To'], row['Departure date'], row['Return date'], row['price_change']))
Output:
Flight from BOS to JFK leaving on 20/02/2020 and returning on 20/02/2020 has changed by -8.0%
Flight from CDG to JFK leaving on 20/02/2020 and returning on 20/02/2020 has changed by 5.0%
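The subtraction in the answer pairs rows by index, so it relies on both DataFrames listing the same flights in the same order. If the scraper ever returns flights in a different order, or adds or drops a route, one option is to join the two days on the identifying columns first. The following is only a sketch under that assumption; the `keys` list, the suffix names, and the reordered sample rows are chosen here for illustration.

```python
import pandas as pd

# Columns that together identify a flight (an assumption for this sketch)
keys = ["From", "To", "Departure date", "Return date"]

yesterday = pd.DataFrame({
    "From": ["BOS", "CDG"],
    "To": ["JFK", "JFK"],
    "Departure date": ["20/02/2020", "20/02/2020"],
    "Return date": ["20/02/2020", "20/02/2020"],
    "Price": [250, 600],
})
# Same flights as yesterday, deliberately listed in a different order
today = pd.DataFrame({
    "From": ["CDG", "BOS"],
    "To": ["JFK", "JFK"],
    "Departure date": ["20/02/2020", "20/02/2020"],
    "Return date": ["20/02/2020", "20/02/2020"],
    "Price": [630, 230],
})

# Inner join keeps only flights present on both days and lines up their prices
merged = today.merge(yesterday, on=keys, how="inner",
                     suffixes=("_today", "_yesterday"))
merged["price_change"] = (
    (merged["Price_today"] - merged["Price_yesterday"])
    / merged["Price_yesterday"] * 100
)

threshold = 1.0  # percent
changed = merged[merged["price_change"].abs() >= threshold]

messages = [
    "Flight from {} to {} leaving on {} and returning on {} "
    "has changed by {:.1f}%".format(
        row["From"], row["To"], row["Departure date"],
        row["Return date"], row["price_change"])
    for _, row in changed.iterrows()
]
for message in messages:
    print(message)
```

Flights that appear on only one of the two days are dropped by the inner join; `how="outer"` would keep them with missing prices if they need to be reported as well.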
I learned the syntax for computing exceed_indices from this post.
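For readers unfamiliar with that syntax, it is boolean indexing: a Series of True/False values selects the entries where it is True, and `.index` then returns their labels. A small sketch with made-up values and a non-default index, to show that the result holds labels rather than integer positions:

```python
import pandas as pd

# Made-up percentage changes with string labels as the index
today = pd.DataFrame({"price_change": [-8.0, 0.5, 5.0]},
                     index=["a", "b", "c"])
threshold = 1.0

# Boolean Series: True where the absolute change meets the threshold
mask = today["price_change"].abs() >= threshold

# Two-step form as in the answer: filter the mask by itself, take the labels
exceed_indices = mask[mask].index

# Equivalent form: filter the DataFrame, then take the labels
same_indices = today[mask].index

# The labels pair with .loc (label lookup), not .iloc (integer positions)
for idx in exceed_indices:
    print(idx, today.loc[idx, "price_change"])
```

With the default integer index used in the answer, labels and positions happen to coincide, which is why either lookup gives the same rows there.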