
How do I keep the column names in a data frame when I am trying to drop all of the rows that don't start with specific names?

I need to drop the majority of the companies in a historical stock market data CSV. The only companies I want to keep are 'GOOG', 'AAPL', 'AMZN', and 'NFLX'. Note that there are over 20,000 companies listed in the CSV. While filtering, I also want to keep only certain columns from the CSV: 'ticker', 'datekey', 'assets', 'eps', 'pe', 'price', 'revenue'.

The code I use to keep only these companies is:

list = ['GOOG', 'AAPL', 'AMZN', 'NFLX']

for tickers in list:
    df1 = df[df.ticker == tickers]
    df1.to_csv("20CompanyAnalysisData1.csv", mode='a', header=False)

    continue

This code successfully writes only the columns I want to the new CSV, and only the rows for the companies I want.

The problem: the new CSV is written without the column names, so I have to add them by hand. That is confusing, and it will get worse as I add more data columns.

Example of the CSV I'm reading from (with the column names in the first row):

ticker,dimension,calendardate,datekey,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
A,ARQ,1999-12-31,2000-03-15,2000-01-31,2020-09-01,53000000,7107000000,,4982000000,2125000000,,10.219,-30000000,1368000000,1368000000,1160000000,131000000,2.41,0.584,665000000,111000000,554000000,665000000,281000000,96000000,0,0.0,0.0,202000000,298000000,0.133,298000000,202000000,202000000,0.3,0.3,0.3,4486000000,,4486000000,50960600000,,,354000000,0.806,1.0,1086000000,0.484,0,0,4337000000,,1567000000,42000000,42000000,0,2621000000,2067000000,554000000,51663600000,1368000000,-160000000,2068000000,111000000,0,1192000000,-208000000,-42000000,384000000,0,131000000,131000000,131000000,0,0,0.058,915000000,171000000,635000000,0.0,11.517,,,1408000000,0,114.3,,,1445000000,131000000,2246000000,2246000000,290000000,,,,,0,625000000,1.0,452000000,439000000,440000000,5.116,7107000000,0,71000000,113000000,16.189,2915000000
A,ARQ,2000-03-31,2000-06-12,2000-04-30,2020-09-01,-4000000,7321000000,,5057000000,2264000000,,10.27,-95000000,978000000,978000000,1261000000,166000000,2.313,0.577,98000000,98000000,0,98000000,329000000,103000000,0,0.0,0.0,256000000,359000000,0.144,359000000,256000000,256000000,0.37,0.36,0.37,4642000000,,4642000000,28969949822,,,-133000000,-0.294,1.0,1224000000,0.493,0,0,4255000000,,1622000000,0,0,0,2679000000,2186000000,493000000,29849949822,-390000000,-326000000,2000000,-13000000,0,-11000000,-341000000,95000000,-38000000,0,166000000,166000000,166000000,0,0,0.067,1010000000,214000000,572000000,0.0,6.43,,,1453000000,0,66.0,,,1826000000,297000000,2485000000,2485000000,296000000,,,,,0,714000000,1.0,452271967,452000000,457000000,5.498,7321000000,0,90000000,192000000,16.197,2871000000

The data is then written to the new CSV like this:

4290,AAPL,1998-02-09,4126000000.0,0.003,,0.171,1578000000.0
4291,AAPL,1998-05-11,3963000000.0,0.004,,0.276,1405000000.0
4292,AAPL,1998-08-10,4041000000.0,0.006999999999999999,,0.33899999999999997,1402000000.0

I then need to go in and manually add the column titles so that the final CSV (edited by me) looks like:

index,ticker,datekey,assets,eps,pe,price,revenue
4289,AAPL,1997-12-05,4233000000.0,,-1.9380000000000002,0.141,
4290,AAPL,1998-02-09,4126000000.0,0.003,,0.171,1578000000.0
4291,AAPL,1998-05-11,3963000000.0,0.004,,0.276,1405000000.0

How can I make this work when I am using hundreds of data columns and can't enter their names manually?

Write the header only for the first ticker:

companies = ['GOOG', 'AAPL', 'AMZN', 'NFLX']
first = True

for ticker in companies:
    df1 = df[df.ticker == ticker]
    if first:
        # First write: include the column names
        df1.to_csv("20CompanyAnalysisData1.csv", mode='a', header=True)
        first = False
    else:
        # Later writes: append the rows only
        df1.to_csv("20CompanyAnalysisData1.csv", mode='a', header=False)

Or, more compactly:

companies = ['GOOG', 'AAPL', 'AMZN', 'NFLX']
needheader = True

for ticker in companies:
    df1 = df[df.ticker == ticker]
    df1.to_csv("20CompanyAnalysisData1.csv", mode='a', header=needheader)
    needheader = False  # only the first write gets the header
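A variation on the same idea is to decide the header from whether the output file already exists, so no flag has to be carried through the loop. The following is a minimal sketch, not part of the original answer: the helper name `append_rows`, the sample data, and the temporary output path are all hypothetical.

```python
import os
import tempfile

import pandas as pd


def append_rows(frame, path):
    """Append frame to the CSV at path; write the header only if the file is new."""
    write_header = not os.path.exists(path)
    frame.to_csv(path, mode="a", header=write_header, index_label="index")


# Tiny made-up stand-in for the real data set
df = pd.DataFrame(
    {
        "ticker": ["A", "AAPL", "GOOG", "AAPL"],
        "price": [114.3, 0.171, 54.1, 0.276],
    }
)

# Hypothetical output location in a fresh temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "filtered.csv")

for ticker in ["GOOG", "AAPL"]:
    append_rows(df[df.ticker == ticker], out_path)

with open(out_path) as handle:
    lines = handle.read().splitlines()

print(lines[0])  # the single header row
```

Note that append mode still adds the rows again if the script is rerun against the same file; only the duplicate header is avoided.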

Assign the column names before saving the file (and remove header=False), as below:

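companies = ['GOOG', 'AAPL', 'AMZN', 'NFLX']

for ticker in companies:
    df1 = df[df.ticker == ticker]
    # Name the seven data columns; the index is not a column, so it is labelled via index_label
    df1.columns = ['ticker', 'datekey', 'assets', 'eps', 'pe', 'price', 'revenue']
    # Note: with mode='a' and the default header=True, the header row is repeated for every ticker
    df1.to_csv("20CompanyAnalysisData1.csv", mode='a', index_label='index')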
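Another option, not taken from the answers above, is to skip the loop entirely: select every wanted ticker at once with `Series.isin` and write the result with a single `to_csv` call, so the header is emitted exactly once. This is a minimal sketch under stated assumptions: the source file is loaded with `pd.read_csv` and `usecols`, and the inline sample data (including the GOOG row) is made up to stand in for the real CSV.

```python
import io

import pandas as pd

# Made-up sample standing in for the large source CSV
raw_csv = io.StringIO(
    "ticker,dimension,datekey,price\n"
    "A,ARQ,2000-03-15,114.3\n"
    "AAPL,ARQ,1998-02-09,0.171\n"
    "GOOG,ARQ,2005-01-03,101.25\n"
    "AAPL,ARQ,1998-05-11,0.276\n"
)

wanted_columns = ["ticker", "datekey", "price"]
wanted_tickers = ["GOOG", "AAPL", "AMZN", "NFLX"]

# usecols keeps only the listed columns while parsing
df = pd.read_csv(raw_csv, usecols=wanted_columns)

# isin builds one boolean mask covering every wanted ticker
filtered = df[df["ticker"].isin(wanted_tickers)]

# A stable sort groups each company's rows together and keeps their original order
filtered = filtered.sort_values("ticker", kind="mergesort")

# With no path argument, to_csv returns the CSV text; the header appears once
csv_text = filtered.to_csv(index_label="index")
print(csv_text)
```

Passing a file name as the first argument to `to_csv` writes to disk instead. The default mode overwrites the file, so rerunning the script would not pile up duplicate rows the way `mode='a'` does.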


