
How do I transfer values of a CSV file between certain dates to another CSV file based on the dates in the rows in that file?

Long question: I have two CSV files: one called SF1, which has quarterly data (only four rows per year) with a datekey column, and one called DAILY, which has data for every day. This is financial data, so both files have a ticker column.

I need to take each quarterly row from SF1 and write it into every row of the DAILY csv file whose date falls between that quarter's datekey and the next one.

For example, AAPL has quarterly data released in SF1 on 2010-01-01, and its next earnings report is on 2010-03-04. Every row in the DAILY file with ticker AAPL between 2010-01-01 and 2010-03-04 should then carry the same information as the 2010-01-01 row in the SF1 file.

So far, I have built a Python dictionary that iterates over the SF1 file and appends each date to a list keyed by ticker. I thought about dropping the previous string and just using the dates stored in the dictionary to look up the data to write to the DAILY file.

Some of the columns that need to be transferred from the SF1 file to the DAILY file are:

['accoci', 'assets', 'assetsavg', 'assetsc', 'assetsnc', 'assetturnover', 'bvps', 'capex', 'cashneq', 'cashnequsd', 'cor', 'consolinc', 'currentratio', 'de', 'debt', 'debtc', 'debtnc', 'debtusd', 'deferredrev', 'depamor', 'deposits', 'divyield', 'dps', 'ebit']

Code so far:

import pandas as pd

# sf1 is the DataFrame loaded from the SF1 csv
company_date = {}  # ticker -> list of quarterly datekeys

for ind, row in sf1.iterrows():
    sf1_date = row['datekey']
    sf1_ticker = row['ticker']
    company_date.setdefault(sf1_ticker, []).append(sf1_date)
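For reference, a minimal sketch of how the per-ticker date lists built above could be used to find the quarterly row that applies to a given daily date: `bisect_right` locates the most recent datekey at or before the daily date (the helper name and the hard-coded sample dates are illustrative, not from the real files):

```python
from bisect import bisect_right

# Illustrative per-ticker sorted lists of quarterly datekeys, as built above.
company_date = {'AAPL': ['2010-01-01', '2010-03-04']}

def applicable_datekey(ticker, daily_date):
    """Return the most recent quarterly datekey at or before daily_date."""
    dates = company_date[ticker]       # assumed sorted ascending
    i = bisect_right(dates, daily_date)
    return dates[i - 1] if i else None

print(applicable_datekey('AAPL', '2010-02-15'))  # -> 2010-01-01
```

ISO-formatted date strings compare correctly as plain strings, so no datetime parsing is needed for this lookup.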

What would be the best way to solve this problem?

SF1 csv:

ticker,dimension,calendardate,datekey,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
A,ARQ,2020-09-14,2020-09-14,2020-09-14,2020-09-14,53000000,7107000000,,4982000000,2125000000,,10.219,-30000000,1368000000,1368000000,1160000000,131000000,2.41,0.584,665000000,111000000,554000000,665000000,281000000,96000000,0,0.0,0.0,202000000,298000000,0.133,298000000,202000000,202000000,0.3,0.3,0.3,4486000000,,4486000000,50960600000,,,354000000,0.806,1.0,1086000000,0.484,0,0,4337000000,,1567000000,42000000,42000000,0,2621000000,2067000000,554000000,51663600000,1368000000,-160000000,2068000000,111000000,0,1192000000,-208000000,-42000000,384000000,0,131000000,131000000,131000000,0,0,0.058,915000000,171000000,635000000,0.0,11.517,,,1408000000,0,114.3,,,1445000000,131000000,2246000000,2246000000,290000000,,,,,0,625000000,1.0,452000000,439000000,440000000,5.116,7107000000,0,71000000,113000000,16.189,2915000000

Daily csv:

ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps
A,2020-09-14,2020-09-14,31617.1,36.3,26.8,30652.1,6.2,44.4,5.9

Ideal csv after the code runs (with all the numbers filled in under the added columns):

ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital

The solution is merge_asof: it merges on a date column by matching each row to the closest date immediately before (or after) it in the second dataframe.

As it is not explicit, I will assume here that daily.date and sf1.datekey are both true date columns, meaning that their dtype is datetime64[ns]. merge_asof cannot use string columns with an object dtype.
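If the columns come out of the CSV reader as strings instead, a quick conversion with pd.to_datetime fixes the dtype first (tiny inline frames stand in for the real files here):

```python
import pandas as pd

# Stand-ins for the real daily and sf1 frames; read from CSV,
# these columns would have object dtype until parsed.
daily = pd.DataFrame({'date': ['2020-09-14']})
sf1 = pd.DataFrame({'datekey': ['2020-09-14']})

# Parse the date columns so merge_asof can compare them.
daily['date'] = pd.to_datetime(daily['date'])
sf1['datekey'] = pd.to_datetime(sf1['datekey'])

print(daily['date'].dtype)  # datetime64[ns]
```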

I will also assume that you do not want the ev, evebit, evebitda, marketcap, pb, pe and ps columns from the sf1 dataframe, because their names conflict with columns from daily (more on that later):

Code could be:

# Both frames must already be sorted by their date column:
# merge_asof requires sorted keys.
df = pd.merge_asof(daily,
                   sf1.drop(columns=['dimension', 'calendardate',
                                     'reportperiod', 'lastupdated',
                                     'ev', 'evebit', 'evebitda',
                                     'marketcap', 'pb', 'pe', 'ps']),
                   by='ticker', left_on='date', right_on='datekey')
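A self-contained toy run of the same merge, with invented numbers, shows the forward-fill behaviour the question asks for: every daily row picks up the most recent quarterly row at or before its date.

```python
import pandas as pd

# Toy quarterly data: two SF1 rows for AAPL (values are invented).
sf1 = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL'],
    'datekey': pd.to_datetime(['2010-01-01', '2010-03-04']),
    'assets': [100, 120],
})

# Toy daily data straddling the two quarterly dates.
daily = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'AAPL'],
    'date': pd.to_datetime(['2010-01-05', '2010-02-15', '2010-03-05']),
    'pe': [44.4, 45.1, 43.9],
})

# Both frames must be sorted by their date column before merge_asof.
df = pd.merge_asof(daily.sort_values('date'), sf1.sort_values('datekey'),
                   by='ticker', left_on='date', right_on='datekey')
print(df[['date', 'pe', 'assets']])
```

The first two daily rows receive assets=100 from the 2010-01-01 quarter, and the 2010-03-05 row receives assets=120 from the 2010-03-04 quarter, exactly the fill pattern described in the question.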

You get the following list of columns, with their relevant values: ticker, date, lastupdated, ev, evebit, evebitda, marketcap, pb, pe, ps, datekey, accoci, assets, assetsavg, assetsc, assetsnc, assetturnover, bvps, capex, cashneq, cashnequsd, cor, consolinc, currentratio, de, debt, debtc, debtnc, debtusd, deferredrev, depamor, deposits, divyield, dps, ebit, ebitda, ebitdamargin, ebitdausd, ebitusd, ebt, eps, epsdil, epsusd, equity, equityavg, equityusd, fcf, fcfps, fxusd, gp, grossmargin, intangibles, intexp, invcap, invcapavg, inventory, investments, investmentsc, investmentsnc, liabilities, liabilitiesc, liabilitiesnc, ncf, ncfbus, ncfcommon, ncfdebt, ncfdiv, ncff, ncfi, ncfinv, ncfo, ncfx, netinc, netinccmn, netinccmnusd, netincdis, netincnci, netmargin, opex, opinc, payables, payoutratio, pe1, ppnenet, prefdivis, price, ps1, receivables, retearn, revenue, revenueusd, rnd, roa, roe, roic, ros, sbcomp, sgna, sharefactor, sharesbas, shareswa, shareswadil, sps, tangibles, taxassets, taxexp, taxliabilities, tbvps, workingcapital


If you want to keep the columns that exist in both dataframes, you will have to rename them. Here is possible code adding _d to the names of the conflicting columns from daily:

df2 = pd.merge_asof(daily,
                    sf1.drop(columns=['dimension', 'calendardate',
                                      'reportperiod', 'lastupdated']),
                    by='ticker', left_on='date', right_on='datekey',
                    suffixes=('_d', ''))

The list of columns is now: ticker, date, lastupdated, ev_d, evebit_d, evebitda_d, marketcap_d, pb_d, pe_d, ps_d, datekey, accoci, ...
