Check the start and end dates of different columns in a Python dataframe

Question

I have a dataframe with a date column and 4 other columns that contain numerical values. Each of these 4 columns starts and ends at a different time. Is there a way in Python to check the start and end date for each column? Here's an example of my dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [1930, 1931, 1932, 1933, 1934],
    'File1': [np.nan, 72, 58, 280, 958],
    'File2': [np.nan, np.nan, np.nan, 13, 89],
    'File3': [np.nan, 55, 68, 18, np.nan],
    'File4': [45, 552, 177, np.nan, np.nan]
})

For example, I want to extract the start and end date for File3 (in this case it should return 1931 and 1933).

If there's a way to get the start and end dates for all files at once, that would be even better.

Thank you in advance.

Answer 1

You can try something like this:

column_search = 'File2'
# Keep only the rows where the chosen column has a value
df_search = df[df[column_search].notnull()]
print(f"start date: {df_search['Date'].min()}")
print(f"end date: {df_search['Date'].max()}")

Following your comment, to iterate through the columns:

for column in df.columns.drop('Date'):  # skip the Date column itself
    df_search = df[df[column].notnull()]
    print(f"{column} start date: {df_search['Date'].min()}")
    print(f"{column} end date: {df_search['Date'].max()}")

If the Date column is the index of the df:

for column in df.columns:
    # Index labels (dates) of the rows where this column has a value
    idx_list = df.index[df[column].notnull()].tolist()
    print(f"{column} start date: {min(idx_list)}")
    print(f"{column} end date: {max(idx_list)}")
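
As a possible loop-free variant of the same min/max idea, here is a minimal sketch that assumes Date is still a regular column, as in the question's example. The names long_form and ranges are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [1930, 1931, 1932, 1933, 1934],
    'File1': [np.nan, 72, 58, 280, 958],
    'File2': [np.nan, np.nan, np.nan, 13, 89],
    'File3': [np.nan, 55, 68, 18, np.nan],
    'File4': [45, 552, 177, np.nan, np.nan],
})

# Reshape to one row per (Date, file) pair, then drop the missing values.
# melt() names the new columns 'variable' and 'value' by default.
long_form = df.melt(id_vars='Date').dropna(subset=['value'])

# Smallest and largest Date that has a value, per file
ranges = long_form.groupby('variable')['Date'].agg(start='min', end='max')
print(ranges)
```

Like the loops above, this takes the minimum and maximum date, so it does not depend on the rows being sorted by Date.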

Answer 2

There is no need for explicit loops over the columns; you can just use "apply".

This gives you a dictionary where each key is a file name and each value is a list holding the start and end date:

df = df.set_index('Date')
result_dict = {}

def check_date(column):
    # Boolean mask of the rows where this column has a value
    x = column.notnull()
    # First and last index label (date) among the non-null rows
    result_dict[column.name] = [column[x].head(1).index[0],
                                column[x].tail(1).index[0]]

df.apply(check_date)
print(result_dict)

I get this result:

{'File1': [1931, 1934], 'File2': [1933, 1934], 'File3': [1931, 1933], 'File4': [1930, 1932]}
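
A related sketch, not part of the original answer: pandas also provides Series.first_valid_index and Series.last_valid_index, which could replace the head/tail lookups. The names indexed and date_ranges are only illustrative, and the example assumes Date is set as the index and the rows are already in date order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [1930, 1931, 1932, 1933, 1934],
    'File1': [np.nan, 72, 58, 280, 958],
    'File2': [np.nan, np.nan, np.nan, 13, 89],
    'File3': [np.nan, 55, 68, 18, np.nan],
    'File4': [45, 552, 177, np.nan, np.nan],
})

indexed = df.set_index('Date')

# For each column, look up the index label of its first and last non-null row
date_ranges = pd.DataFrame({
    'start': indexed.apply(pd.Series.first_valid_index),
    'end': indexed.apply(pd.Series.last_valid_index),
})
print(date_ranges)
```

One difference worth noting: for a column with no valid values at all, first_valid_index and last_valid_index return None, whereas the head(1).index[0] lookup would raise an IndexError.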

Hope this helps.

Answer 1
0 Accepted 2020-04-18 20:53:00

Answer 2
0 2020-04-18 21:45:38
