How do I keep the column names in a data frame when I am trying to drop all of the rows that don't start with specific names?
I need to drop the majority of the companies in a historical stock market data CSV. The only companies I want to keep are 'GOOG', 'AAPL', 'AMZN', 'NFLX'. Note that there are over 20,000 companies listed in the CSV. I also want to keep only certain columns from the CSV while filtering. The columns are: 'ticker', 'datekey', 'assets', 'eps', 'pe', 'price', 'revenue'.
The code to filter for these companies is:
tickers = ['GOOG', 'AAPL', 'AMZN', 'NFLX']
for ticker in tickers:
    df1 = df[df.ticker == ticker]
    df1.to_csv("20CompanyAnalysisData1.csv", mode='a', header=False)
This code successfully writes only the columns I want to the new CSV and lists only the data for the companies I want.
The problem: the new CSV is written without the column names, which makes it really confusing to have to list them manually, especially since I will be adding more data columns.
Example of the CSV I'm reading from (with data columns):
ticker,dimension,calendardate,datekey,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
A,ARQ,1999-12-31,2000-03-15,2000-01-31,2020-09-01,53000000,7107000000,,4982000000,2125000000,,10.219,-30000000,1368000000,1368000000,1160000000,131000000,2.41,0.584,665000000,111000000,554000000,665000000,281000000,96000000,0,0.0,0.0,202000000,298000000,0.133,298000000,202000000,202000000,0.3,0.3,0.3,4486000000,,4486000000,50960600000,,,354000000,0.806,1.0,1086000000,0.484,0,0,4337000000,,1567000000,42000000,42000000,0,2621000000,2067000000,554000000,51663600000,1368000000,-160000000,2068000000,111000000,0,1192000000,-208000000,-42000000,384000000,0,131000000,131000000,131000000,0,0,0.058,915000000,171000000,635000000,0.0,11.517,,,1408000000,0,114.3,,,1445000000,131000000,2246000000,2246000000,290000000,,,,,0,625000000,1.0,452000000,439000000,440000000,5.116,7107000000,0,71000000,113000000,16.189,2915000000
A,ARQ,2000-03-31,2000-06-12,2000-04-30,2020-09-01,-4000000,7321000000,,5057000000,2264000000,,10.27,-95000000,978000000,978000000,1261000000,166000000,2.313,0.577,98000000,98000000,0,98000000,329000000,103000000,0,0.0,0.0,256000000,359000000,0.144,359000000,256000000,256000000,0.37,0.36,0.37,4642000000,,4642000000,28969949822,,,-133000000,-0.294,1.0,1224000000,0.493,0,0,4255000000,,1622000000,0,0,0,2679000000,2186000000,493000000,29849949822,-390000000,-326000000,2000000,-13000000,0,-11000000,-341000000,95000000,-38000000,0,166000000,166000000,166000000,0,0,0.067,1010000000,214000000,572000000,0.0,6.43,,,1453000000,0,66.0,,,1826000000,297000000,2485000000,2485000000,296000000,,,,,0,714000000,1.0,452271967,452000000,457000000,5.498,7321000000,0,90000000,192000000,16.197,2871000000
The data is then written to the new CSV like this:
4290,AAPL,1998-02-09,4126000000.0,0.003,,0.171,1578000000.0
4291,AAPL,1998-05-11,3963000000.0,0.004,,0.276,1405000000.0
4292,AAPL,1998-08-10,4041000000.0,0.006999999999999999,,0.33899999999999997,1402000000.0
I then need to go in and manually add the column titles so that the final CSV (edited by me) looks like:
index,ticker,datekey,assets,eps,pe,price,revenue
4289,AAPL,1997-12-05,4233000000.0,,-1.9380000000000002,0.141,
4290,AAPL,1998-02-09,4126000000.0,0.003,,0.171,1578000000.0
4291,AAPL,1998-05-11,3963000000.0,0.004,,0.276,1405000000.0
How can I make this work when I am using hundreds of data columns and can't enter them manually?
tickers = ['GOOG', 'AAPL', 'AMZN', 'NFLX']
first = True
for ticker in tickers:
    df1 = df[df.ticker == ticker]
    if first:
        # Write the header row only with the first chunk
        df1.to_csv("20CompanyAnalysisData1.csv", mode='a', header=True)
        first = False
    else:
        df1.to_csv("20CompanyAnalysisData1.csv", mode='a', header=False)
Or, more compactly:
tickers = ['GOOG', 'AAPL', 'AMZN', 'NFLX']
needheader = True
for ticker in tickers:
    df1 = df[df.ticker == ticker]
    # header is True for the first chunk only
    df1.to_csv("20CompanyAnalysisData1.csv", mode='a', header=needheader)
    needheader = False
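One caveat with mode='a': running the script a second time appends a second header row to the existing file. A minimal sketch of one way around that, assuming a hypothetical helper `append_with_header`, a made-up two-column frame, and a made-up output file name; it writes the header only when the file is missing or empty:

```python
import os
import pandas as pd

def append_with_header(frame, path):
    """Append frame to path, writing the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # index_label names the index column in the header row
    frame.to_csv(path, mode='a', header=needs_header, index_label='index')

# Tiny made-up frame standing in for the real stock data
df = pd.DataFrame({
    'ticker': ['AAPL', 'GOOG', 'MSFT', 'AAPL'],
    'price': [0.171, 50.0, 30.0, 0.276],
})

out_path = 'demo_filtered.csv'  # hypothetical output file name
if os.path.exists(out_path):
    os.remove(out_path)  # start clean so the demo is repeatable

for ticker in ['GOOG', 'AAPL']:
    append_with_header(df[df.ticker == ticker], out_path)
```

Checking the file instead of tracking a flag means the header logic no longer depends on which ticker happens to come first in the loop.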
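Assign the column names before saving the file (and remove header=False), as below. The index is not one of the data frame's columns, so it is labelled through index_label instead of being listed in columns:
tickers = ['GOOG', 'AAPL', 'AMZN', 'NFLX']
for ticker in tickers:
    df1 = df[df.ticker == ticker]
    # Seven names for the seven data columns; the index is handled separately
    df1.columns = ['ticker', 'datekey', 'assets', 'eps', 'pe', 'price', 'revenue']
    # header defaults to True, so in append mode the header row is written once per ticker
    df1.to_csv("20CompanyAnalysisData1.csv", mode='a', index_label='index')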
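If the per-ticker loop is not needed, the filter and the write can probably be done in one step, which sidesteps the header question entirely. The following is only a sketch under assumptions: the in-memory `source` text is made-up sample data standing in for the real CSV, and the column list is shortened for the demo. It uses `usecols` to read only the wanted columns and `isin` to keep the wanted tickers.

```python
import io
import pandas as pd

# Made-up sample standing in for the large source CSV
source = io.StringIO(
    "ticker,dimension,datekey,assets,price\n"
    "A,ARQ,2000-03-15,7107000000,114.3\n"
    "AAPL,ARQ,1998-02-09,4126000000,0.171\n"
    "GOOG,ARQ,2005-01-01,3000000000,95.0\n"
    "AAPL,ARQ,1998-05-11,3963000000,0.276\n"
)

wanted_tickers = ['GOOG', 'AAPL', 'AMZN', 'NFLX']
wanted_columns = ['ticker', 'datekey', 'assets', 'price']

# usecols keeps only the listed columns while reading
df = pd.read_csv(source, usecols=wanted_columns)

# isin() builds one boolean mask covering all wanted tickers, so no loop is needed
filtered = df[df['ticker'].isin(wanted_tickers)]

# With no path, to_csv returns the CSV text; the header is emitted exactly once
result = filtered.to_csv(index_label='index')
print(result)
```

Passing a file name to `to_csv` (with the default mode='w') would write the same content to disk. Note that the rows keep their original file order instead of being grouped by ticker; `filtered.sort_values('ticker')` could be applied first if grouping matters.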