Write to CSV the last non-null row from pandas dataframe?
This writes all records, including null PbRatios. I would like to write only the last non-null record. When I add df[df.asOfDate == df.asOfDate.max()].to_csv, it gets the last record, which is always null.
import pandas as pd
from yahooquery import Ticker

symbols = ['AAPL','GOOG','MSFT','NVDA']
header = ["asOfDate","PbRatio"]

for tick in symbols:
    faang = Ticker(tick)
    faang.valuation_measures
    df = faang.valuation_measures
    try:
        for column_name in header :
            if column_name not in df.columns:
                df.loc[:,column_name ] = None
        df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)
    except AttributeError:
        continue
Current output:
Desired output:
This should work. Just keep the rows where PbRatio is not NaN, then filter for the max asOfDate.
for tick in symbols:
    faang = Ticker(tick)
    faang.valuation_measures
    df = faang.valuation_measures
    try:
        for column_name in header :
            if column_name not in df.columns:
                df.loc[:,column_name ] = None
    except AttributeError:
        continue
    # filter for notna
    df = df[df['PbRatio'].notna()]
    # filter for max date
    df = df[df['asOfDate'] == df['asOfDate'].max()]
    df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)
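To check the two filters without calling Yahoo, here is a minimal sketch on a hand-made frame. The values and the helper name last_non_null_row are made up for illustration; only the column names asOfDate and PbRatio come from the question.

```python
import pandas as pd

# Hypothetical data: the newest row has a missing PbRatio, as in the question.
sample = pd.DataFrame({
    "asOfDate": pd.to_datetime(["2022-03-31", "2022-06-30", "2022-09-30"]),
    "PbRatio": [40.1, 38.5, None],
})

def last_non_null_row(frame, value_col="PbRatio", date_col="asOfDate"):
    """Return the row(s) with the latest date among rows where value_col is set."""
    # Drop the rows with a missing value first, so the max date is taken
    # over valid rows only.
    valid = frame[frame[value_col].notna()]
    return valid[valid[date_col] == valid[date_col].max()]

result = last_non_null_row(sample)
print(result)
```

The one-row frame can then be passed to to_csv with the same arguments as in the loop above. The order matters: taking the max date before removing nulls is what returned the null row in the question.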
Here I created some dummy data to work with; it would have been nice if you had provided data.
df = pd.DataFrame([['A',12,123],['A',13,125],['A',2,None],['B',16,133],
['B',16,None],['B',14,139]], columns=['Name','id','score'])
  Name  id  score
0    A  12  123.0
1    A  13  125.0
2    A   2    NaN
3    B  16  133.0
4    B  16    NaN
5    B  14  139.0
Then you drop the rows with missing values:
df = df.dropna(how = 'any')
The result looks like this:
  Name  id  score
0    A  12  123.0
1    A  13  125.0
3    B  16  133.0
5    B  14  139.0
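Note that dropna(how='any') removes a row if any of its columns is missing. If the real frame has other columns that may be empty, a narrower variant is to pass subset so that only the column of interest is checked. The sketch below assumes a made-up extra column called note, which is not in the original data, just to show the difference.

```python
import pandas as pd

# 'note' is a hypothetical extra column added only for this illustration.
demo = pd.DataFrame(
    [['A', 12, 123, None], ['A', 13, 125, 'x'], ['A', 2, None, 'y']],
    columns=['Name', 'id', 'score', 'note'],
)

# Drops every row that has a missing value in any column.
any_null = demo.dropna(how='any')

# Drops only the rows where 'score' itself is missing.
score_only = demo.dropna(subset=['score'])

print(any_null)
print(score_only)
```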
I get the set of unique names; this corresponds to whatever column holds 'AAPL'/'NVDA' in your data.
names = set(df['Name'])
Create a new dataframe that holds only the last row for each unique name. In my example the names are 'A' and 'B'; in yours they should be 'AAPL'/'NVDA'.
new_df = pd.DataFrame(columns=df.columns)
for n in names:
    new_df.loc[new_df.shape[0]] = df.loc[df.query(f"Name == '{n}'").index[-1]]
And this should look like the following (the row order may vary, since a set is unordered):
>>> new_df
  Name  id  score
0    B  14  139.0
1    A  13  125.0
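The loop over names can also be written without building new_df row by row. As a sketch of an alternative (not what this answer itself uses), groupby(...).tail(1) keeps the last row of each group, in the original row order and with the original index:

```python
import pandas as pd

# Same dummy data as above.
df = pd.DataFrame([['A',12,123],['A',13,125],['A',2,None],['B',16,133],
                   ['B',16,None],['B',14,139]], columns=['Name','id','score'])

# Drop rows with missing values, then keep the last remaining row per name.
last_rows = df.dropna(how='any').groupby('Name').tail(1)
print(last_rows)
```

Here "last" means last by position, so the frame should already be sorted the way you want (for example by asOfDate) before calling tail.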