Write to CSV the last non-null row from pandas dataframe?

This writes all records, including those with a null PbRatio. I would like to write only the last non-null record. When I add df[df.asOfDate == df.asOfDate.max()].to_csv, it gets the last record, which is always null.

import pandas as pd
from yahooquery import Ticker

symbols = ['AAPL', 'GOOG', 'MSFT', 'NVDA']
header = ["asOfDate", "PbRatio"]

for tick in symbols:
    faang = Ticker(tick)
    df = faang.valuation_measures
    try:
        for column_name in header:
            if column_name not in df.columns:
                df.loc[:, column_name] = None
        df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)
    except AttributeError:
        continue

Current output: [image not preserved]

Desired output: [image not preserved]

This should work. First drop the rows where PbRatio is NaN, then keep only the row with the latest asOfDate among the rows that remain.

for tick in symbols:
    faang = Ticker(tick)
    df = faang.valuation_measures
    try:
        # make sure both output columns exist
        for column_name in header:
            if column_name not in df.columns:
                df.loc[:, column_name] = None
    except AttributeError:
        # df has no .columns (it is not a DataFrame), so skip this symbol
        continue

    # keep only rows where PbRatio is not NaN
    df = df[df['PbRatio'].notna()]

    # of those, keep the row with the latest asOfDate
    df = df[df['asOfDate'] == df['asOfDate'].max()]
    df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)
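
The order of the two filters matters: taking the max date first selects the null row, while dropping nulls first leaves the latest populated row. A minimal offline sketch of that idea follows. The `sample` frame and its values are made up for illustration, and `last_non_null_row` is a hypothetical helper name, not part of pandas or yahooquery.

```python
import pandas as pd

# Hypothetical stand-in for one ticker's valuation_measures frame;
# the dates and ratios are invented for illustration only.
sample = pd.DataFrame(
    {
        "asOfDate": pd.to_datetime(["2023-03-31", "2023-06-30", "2023-09-30"]),
        "PbRatio": [40.5, 45.1, None],
    },
    index=["AAPL", "AAPL", "AAPL"],
)


def last_non_null_row(df, value_col="PbRatio", date_col="asOfDate"):
    """Return the row(s) with the latest date among rows where value_col is set."""
    valid = df[df[value_col].notna()]
    return valid[valid[date_col] == valid[date_col].max()]


# Filtering on the max date alone picks the newest row, which is null here.
newest = sample[sample["asOfDate"] == sample["asOfDate"].max()]

# Dropping nulls first picks the newest row that actually has a value.
latest = last_non_null_row(sample)
print(latest)
```

If several rows share the latest date, the helper returns all of them, so the caller may still want `.tail(1)` when exactly one row per symbol is required.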

I created some dummy data to work with here; it would have been nice if you had provided sample data.

df = pd.DataFrame([['A',12,123],['A',13,125],['A',2,None],['B',16,133],
                   ['B',16,None],['B',14,139]], columns=['Name','id','score'])
  Name  id  score
0    A  12  123.0
1    A  13  125.0
2    A   2    NaN
3    B  16  133.0
4    B  16    NaN
5    B  14  139.0

Then drop the rows with missing values:

df = df.dropna(how = 'any')

The result looks like this:

  Name  id  score
0    A  12  123.0
1    A  13  125.0
3    B  16  133.0
5    B  14  139.0
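
One caveat with `how='any'`: it drops a row when any column is missing, which can remove too much if the real frame has other columns with gaps. The sketch below restricts the check to a single column with `subset`. The extra `note` column is a made-up addition to the dummy data, used only to show the difference.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 12, 123], ["A", 13, 125], ["A", 2, None],
     ["B", 16, 133], ["B", 16, None], ["B", 14, 139]],
    columns=["Name", "id", "score"],
)

# Hypothetical extra column with a gap in a row whose score is present.
df["note"] = [None, "x", "x", "x", "x", "x"]

# Drops every row that has a missing value in any column.
any_col = df.dropna(how="any")

# Drops only the rows where 'score' itself is missing.
by_score = df.dropna(subset=["score"])

print(any_col.index.tolist())
print(by_score.index.tolist())
```

For the original question, the equivalent call would be `dropna(subset=['PbRatio'])`, which behaves like the `notna()` mask on that column.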

Next, get the set of unique names. In your data this corresponds to whichever column holds the symbols ('AAPL', 'NVDA', and so on).

names = set(df['Name'])

Then create a new dataframe that holds only the last row for each unique name. In my example the names are 'A' and 'B'; in yours they would be 'AAPL', 'NVDA', and so on.

new_df = pd.DataFrame(columns=df.columns)
for n in names:
    new_df.loc[new_df.shape[0]] = df.loc[df.query(f"Name== '{n}'").index[-1]]

The result should look like this (the row order may differ, because a set is unordered):

new_df
>>>

  Name  id  score
0    B  14  139.0
1    A  13  125.0
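
The loop over names can also be written without an explicit loop. The following is a sketch of an alternative, not the approach used in the answer above: it drops the missing scores and then takes the last remaining row of each group with `groupby(...).tail(1)`, which keeps the rows in their original order.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 12, 123], ["A", 13, 125], ["A", 2, None],
     ["B", 16, 133], ["B", 16, None], ["B", 14, 139]],
    columns=["Name", "id", "score"],
)

last_rows = (
    df.dropna(subset=["score"])   # keep rows that have a score
      .groupby("Name")            # one group per name
      .tail(1)                    # last remaining row of each group
      .reset_index(drop=True)
)
print(last_rows)
```

Because `tail` preserves the original row order, the result here is deterministic, unlike iterating over a set.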
