Checking start and end date of different columns in a Python DataFrame
I have a dataframe that has a date column and 4 other columns that contain numerical values. But each of these other 4 columns starts and ends at different times. Is there a way in Python to check the start and end date for each column? Here's an example of my dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [1930, 1931, 1932, 1933, 1934],
    'File1': [np.nan, 72, 58, 280, 958],
    'File2': [np.nan, np.nan, np.nan, 13, 89],
    'File3': [np.nan, 55, 68, 18, np.nan],
    'File4': [45, 552, 177, np.nan, np.nan]
})
So, for example, I want to extract/know the start and end date for File3 (in this case it should return 1931 and 1933).
If there's a way I can know the start and end date for all files, that would be even better.
Thank you in advance.
You can try something like this:
column_search = 'File2'
# Keep only the rows where the chosen column has a value
df_search = df[df[column_search].notnull()]
print(f"start date: {df_search['Date'].min()}")
print(f"end date: {df_search['Date'].max()}")
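A possible alternative for a single column, sketched with pandas' Series.first_valid_index() and Series.last_valid_index(): these return the index labels of the first and last non-NaN values, which can then be looked up in the Date column. This sketch assumes the rows are sorted by Date (as in the example data); the min()/max() approach above does not need that assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [1930, 1931, 1932, 1933, 1934],
    'File1': [np.nan, 72, 58, 280, 958],
    'File2': [np.nan, np.nan, np.nan, 13, 89],
    'File3': [np.nan, 55, 68, 18, np.nan],
    'File4': [45, 552, 177, np.nan, np.nan],
})

column_search = 'File3'
# Row labels (0..4 here) of the first and last non-NaN values in the column
first_row = df[column_search].first_valid_index()
last_row = df[column_search].last_valid_index()

# Translate those row labels into years via the Date column
start_date = df.loc[first_row, 'Date']
end_date = df.loc[last_row, 'Date']
print(f"{column_search}: start date {start_date}, end date {end_date}")
```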
Following up on your comment, to iterate through the columns:
# Skip the Date column itself and check every other column
for column in df.columns.drop('Date'):
    df_search = df[df[column].notnull()]
    print(f"{column} start date: {df_search['Date'].min()}")
    print(f"{column} end date: {df_search['Date'].max()}")
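To collect the result for every column in one small table instead of printing it, one option (a sketch, not the only way) is to make Date the index and apply first_valid_index/last_valid_index to each column. The names data and summary are illustrative only. A column containing no values at all would show a missing value in both fields.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [1930, 1931, 1932, 1933, 1934],
    'File1': [np.nan, 72, 58, 280, 958],
    'File2': [np.nan, np.nan, np.nan, 13, 89],
    'File3': [np.nan, 55, 68, 18, np.nan],
    'File4': [45, 552, 177, np.nan, np.nan],
})

# With Date as the index, the valid-index lookups return years directly
data = df.set_index('Date')

# One row per File column: first and last year holding a non-NaN value
summary = pd.DataFrame({
    'start': data.apply(lambda s: s.first_valid_index()),
    'end': data.apply(lambda s: s.last_valid_index()),
})
print(summary)
```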
If the Date column is the index of the df:
for column in df.columns:
    # Index labels (dates) of the rows where this column has a value
    idx_list = df.index[df[column].notnull()].tolist()
    print(f"{column} start date: {min(idx_list)}")
    print(f"{column} end date: {max(idx_list)}")
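When Date is the index, a loop-free variant is to build a boolean notna() mask and call idxmax(), which returns the label of the first True in each column; reversing the rows first gives the last one. This is a sketch that assumes the index is sorted by date. One caveat: for a column with no values at all, idxmax() returns the first date instead of signalling that nothing was found.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [1930, 1931, 1932, 1933, 1934],
    'File1': [np.nan, 72, 58, 280, 958],
    'File2': [np.nan, np.nan, np.nan, 13, 89],
    'File3': [np.nan, 55, 68, 18, np.nan],
    'File4': [45, 552, 177, np.nan, np.nan],
})
data = df.set_index('Date')

# True wherever a column has a value
mask = data.notna()

# idxmax() on a boolean column gives the label of its first True
start = mask.idxmax()
# Reverse the rows so the first True found is the last present value
end = mask.iloc[::-1].idxmax()

print(pd.DataFrame({'start': start, 'end': end}))
```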
There's no need for explicit loops over the columns; you can just use "apply".
This gives you a dictionary where the key is the file name and the value is a list holding the start and end date:
df = df.set_index('Date')
result_dict = {}

def check_date(column):
    # Boolean mask of the rows where this column has a value
    x = column.notnull()
    # Index labels (dates) of the first and last non-null rows
    result_dict[column.name] = [column[x].head(1).index[0],
                                column[x].tail(1).index[0]]

df.apply(check_date)
print(result_dict)
I get this result:
{'File1': [1931, 1934], 'File2': [1933, 1934], 'File3': [1931, 1933], 'File4': [1930, 1932]}
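The same apply idea can also be written without mutating a global dictionary: if the applied function returns a pd.Series, apply assembles those into a DataFrame with one row per returned key. A minimal sketch (the helper name start_end is hypothetical) that produces the same dictionary shape via to_dict('list'):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [1930, 1931, 1932, 1933, 1934],
    'File1': [np.nan, 72, 58, 280, 958],
    'File2': [np.nan, np.nan, np.nan, 13, 89],
    'File3': [np.nan, 55, 68, 18, np.nan],
    'File4': [45, 552, 177, np.nan, np.nan],
})
data = df.set_index('Date')

def start_end(column):
    """Return the first and last index labels (years) holding a value."""
    return pd.Series({'start': column.first_valid_index(),
                      'end': column.last_valid_index()})

# apply() calls start_end once per column; each returned Series becomes
# a column of the result, with rows 'start' and 'end'
result = data.apply(start_end)
result_dict = result.to_dict('list')
print(result_dict)
```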
Hope this helps.