How do I transfer values of a CSV file between certain dates to another CSV file, based on the dates in the rows of that file?
Long question: I have two CSV files, one called SF1 which has quarterly data (only 4 times a year) with a datekey column, and one called DAILY which gives data every day. This is financial data, so there are ticker columns.
I need to grab the quarterly data from SF1 and write it to the DAILY csv file for all the days in between, until the next quarterly data comes in.
For example, AAPL has quarterly data released in SF1 on 2010-01-01 and its next earnings report is going to be on 2010-03-04. I then need every row in the DAILY file with ticker AAPL between the dates 2010-01-01 and 2010-03-04 to have the same information as that one row on that date in the SF1 file.
So far, I have made a Python dictionary that goes through the SF1 file and adds the dates to a list, which is the value of the ticker keys in the dictionary. I thought about getting rid of the previous string and just referencing the string in the dictionary to go and search for the data to write to the DAILY file.
Some of the columns that need to transfer from the SF1 file to the DAILY file are:
['accoci', 'assets', 'assetsavg', 'assetsc', 'assetsnc', 'assetturnover', 'bvps', 'capex', 'cashneq', 'cashnequsd', 'cor', 'consolinc', 'currentratio', 'de', 'debt', 'debtc', 'debtnc', 'debtusd', 'deferredrev', 'depamor', 'deposits', 'divyield', 'dps', 'ebit']
Code so far:
for ind, row in sf1.iterrows():
    sf1_date = row['datekey']
    sf1_ticker = row['ticker']
    company_date.setdefault(sf1_ticker, []).append(sf1_date)
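For reference, on a toy sf1 frame (tickers and dates made up for illustration), the loop above builds a dict mapping each ticker to its list of datekeys:

```python
import pandas as pd

# Toy SF1 frame with just the two columns the loop uses.
sf1 = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT'],
    'datekey': ['2010-01-01', '2010-03-04', '2010-01-01'],
})

company_date = {}
for ind, row in sf1.iterrows():
    # setdefault creates the list on first sight of a ticker, then appends.
    company_date.setdefault(row['ticker'], []).append(row['datekey'])

print(company_date)
# {'AAPL': ['2010-01-01', '2010-03-04'], 'MSFT': ['2010-01-01']}
```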
What would be the best way to solve this problem?
SF1 csv:
ticker,dimension,calendardate,datekey,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
A,ARQ,2020-09-14,2020-09-14,2020-09-14,2020-09-14,53000000,7107000000,,4982000000,2125000000,,10.219,-30000000,1368000000,1368000000,1160000000,131000000,2.41,0.584,665000000,111000000,554000000,665000000,281000000,96000000,0,0.0,0.0,202000000,298000000,0.133,298000000,202000000,202000000,0.3,0.3,0.3,4486000000,,4486000000,50960600000,,,354000000,0.806,1.0,1086000000,0.484,0,0,4337000000,,1567000000,42000000,42000000,0,2621000000,2067000000,554000000,51663600000,1368000000,-160000000,2068000000,111000000,0,1192000000,-208000000,-42000000,384000000,0,131000000,131000000,131000000,0,0,0.058,915000000,171000000,635000000,0.0,11.517,,,1408000000,0,114.3,,,1445000000,131000000,2246000000,2246000000,290000000,,,,,0,625000000,1.0,452000000,439000000,440000000,5.116,7107000000,0,71000000,113000000,16.189,2915000000
Daily csv:
ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps
A,2020-09-14,2020-09-14,31617.1,36.3,26.8,30652.1,6.2,44.4,5.9
Ideal csv after the code runs (with all the numbers for the assets under them):
ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
The solution is merge_asof: it merges on date columns, matching each row of the first dataframe to the closest row immediately before (or after) it in the second dataframe.
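As a minimal sketch of that behavior (the AAPL numbers here are made up for illustration): each daily row picks up the most recent quarterly row at or before its date, which is merge_asof's default direction='backward':

```python
import pandas as pd

# Toy frames standing in for DAILY and SF1 from the question.
daily = pd.DataFrame({
    'ticker': ['AAPL'] * 4,
    'date': pd.to_datetime(['2010-01-01', '2010-01-04',
                            '2010-03-04', '2010-03-05']),
    'price': [10.0, 10.5, 11.0, 11.2],
})
sf1 = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL'],
    'datekey': pd.to_datetime(['2010-01-01', '2010-03-04']),
    'assets': [100, 120],
})

# Each daily row gets the latest quarterly row at or before its date.
out = pd.merge_asof(daily, sf1, by='ticker',
                    left_on='date', right_on='datekey')
print(out[['date', 'price', 'assets']])
```

The two January rows carry the 2010-01-01 quarterly values, and the two March rows carry the 2010-03-04 values, which is exactly the fill-forward-until-next-report behavior the question asks for.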
As it is not explicit, I will assume here that daily.date and sf1.datekey are both true date columns, meaning that their dtype is datetime64[ns]. merge_asof cannot use string columns with an object dtype.
I will also assume that you do not want the ev, evebit, evebitda, marketcap, pb, pe and ps columns from the sf1 dataframe, because their names conflict with columns from daily (more on that later).
The code could be:
df = pd.merge_asof(daily, sf1.drop(columns=['dimension', 'calendardate',
'reportperiod','lastupdated',
'ev', 'evebit', 'evebitda',
'marketcap', 'pb', 'pe', 'ps']),
by = 'ticker', left_on='date',
right_on='datekey')
You get the following list of columns, with their relevant values: ticker, date, lastupdated, ev, evebit, evebitda, marketcap, pb, pe, ps, datekey, accoci, assets, assetsavg, assetsc, assetsnc, assetturnover, bvps, capex, cashneq, cashnequsd, cor, consolinc, currentratio, de, debt, debtc, debtnc, debtusd, deferredrev, depamor, deposits, divyield, dps, ebit, ebitda, ebitdamargin, ebitdausd, ebitusd, ebt, eps, epsdil, epsusd, equity, equityavg, equityusd, fcf, fcfps, fxusd, gp, grossmargin, intangibles, intexp, invcap, invcapavg, inventory, investments, investmentsc, investmentsnc, liabilities, liabilitiesc, liabilitiesnc, ncf, ncfbus, ncfcommon, ncfdebt, ncfdiv, ncff, ncfi, ncfinv, ncfo, ncfx, netinc, netinccmn, netinccmnusd, netincdis, netincnci, netmargin, opex, opinc, payables, payoutratio, pe1, ppnenet, prefdivis, price, ps1, receivables, retearn, revenue, revenueusd, rnd, roa, roe, roic, ros, sbcomp, sgna, sharefactor, sharesbas, shareswa, shareswadil, sps, tangibles, taxassets, taxexp, taxliabilities, tbvps, workingcapital
If you want to keep the columns that exist in both dataframes, you will have to rename them. Here is a possible code, adding _d to the names of the columns coming from daily:
df2 = pd.merge_asof(daily, sf1.drop(columns=['dimension', 'calendardate',
'reportperiod','lastupdated']),
by = 'ticker', left_on='date',
right_on='datekey', suffixes=('_d', ''))
The list of columns is now: ticker, date, lastupdated, ev_d, evebit_d, evebitda_d, marketcap_d, pb_d, pe_d, ps_d, datekey, accoci, ...
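The suffix behavior can be checked on toy frames (the numbers are made up, and only one overlapping column is shown for brevity):

```python
import pandas as pd

daily = pd.DataFrame({
    'ticker': ['A'],
    'date': pd.to_datetime(['2020-09-14']),
    'ev': [31617.1],
})
sf1 = pd.DataFrame({
    'ticker': ['A'],
    'datekey': pd.to_datetime(['2020-09-14']),
    'ev': [50960600000.0],
})

# The overlapping 'ev' column from daily gets the '_d' suffix;
# sf1's 'ev' keeps its bare name because the right suffix is ''.
df2 = pd.merge_asof(daily, sf1, by='ticker', left_on='date',
                    right_on='datekey', suffixes=('_d', ''))
print(list(df2.columns))
```

Only columns present in both frames are renamed; the merge keys (ticker, date, datekey) and all non-conflicting columns keep their original names.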