Pandas Dataframe Min/Max Range

Thank you in advance for helping! (Code below) / Data Here: Link

I am trying to add two more columns to my dataframe that represent the range of data for the Topsoil column, just as mean['maxx20']=maxx['20 cm'] and mean['minn20']=minn['20 cm'] do for the 20 cm column.

I tried doing that by adding:

mean['topsoilMax']=maxx['Topsoil']
mean['topsoilMin']=minn['Topsoil']

Instead of adding the columns as I had hoped, this raised KeyError: 'Topsoil', even though Topsoil was already a column in the dataframe, just as 20 cm was when I added the ranges for it.

Why am I getting this error, and what would be the proper way to add these columns?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

#Importing data, creating a copy, and assigning it to a variable
raw_data = pd.read_csv('all-deep-soil-temperatures.csv', index_col=1, parse_dates=True)
df_all_stations = raw_data.copy()

#Filtering the data down to the station of the user's choice
selected_soil_station = 'Minot'
df_selected_station = df_all_stations[df_all_stations['Station'] == selected_soil_station].copy()
#Forward-filling missing values (ffill() replaces the deprecated fillna(method='ffill'))
df_selected_station = df_selected_station.ffill()

# Indexes the data by day and creates a column that keeps track of the day
df_selected_station_D=df_selected_station.resample(rule='D').mean()
df_selected_station_D['Day'] = df_selected_station_D.index.dayofyear


#Assigning variable so that mean represents df_selected_station_D but indexed by day
mean=df_selected_station_D.groupby(by='Day').mean()
mean['Day']=mean.index

#This inserts a new column named 'Topsoil' at the end that represents the average between 5 cm, 10 cm, and 20 cm
mean['Topsoil']=mean[['5 cm', '10 cm','20 cm']].mean(axis=1)


#Creating the range in which the line graph will fill in 
maxx=df_selected_station_D.groupby(by='Day').max()
minn=df_selected_station_D.groupby(by='Day').min()

mean['maxx20']=maxx['20 cm']
mean['minn20']=minn['20 cm']


If I understand your problem correctly, my solution is:

topsoil = [-2.971686,-2.599278,-2.264897,-2.083117,-1.946969]

max_number = max(topsoil)
min_number = min(topsoil)
print(max_number) #The maximum value in the topsoil list
print(min_number) #The minimum value in the topsoil list
print(max_number - min_number) #The range (max minus min) of the topsoil list

Here is the solution:

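The KeyError occurs because Topsoil was added only to mean; maxx and minn are built from df_selected_station_D, which has no such column. You probably need to add a "Topsoil" column to the maxx and minn dataframes as well: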

maxx['Topsoil']=maxx[['5 cm', '10 cm','20 cm']].max(axis=1)
minn['Topsoil']=minn[['5 cm', '10 cm','20 cm']].min(axis=1)

After that, the assignment works:

mean['topsoilMax']=maxx['Topsoil']
mean['topsoilMin']=minn['Topsoil']
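
To make the cause concrete, here is a minimal sketch that reproduces the KeyError and then applies the fix above. It does not use the original CSV: the `daily` dataframe, its random values, and the names `depths`, `alt_max`, and `alt_min` are assumptions made up for illustration; only the column names '5 cm', '10 cm', '20 cm', 'Day', and 'Topsoil' come from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_selected_station_D (values are made up).
rng = np.random.default_rng(0)
index = pd.date_range("2019-01-01", "2020-12-31", freq="D")
depths = ["5 cm", "10 cm", "20 cm"]
daily = pd.DataFrame(rng.normal(size=(len(index), 3)), index=index, columns=depths)
daily["Day"] = daily.index.dayofyear

mean = daily.groupby("Day").mean()
maxx = daily.groupby("Day").max()
minn = daily.groupby("Day").min()

# 'Topsoil' is created on `mean` only, so `maxx` and `minn` never receive it.
mean["Topsoil"] = mean[depths].mean(axis=1)
print("Topsoil" in mean.columns)  # True
print("Topsoil" in maxx.columns)  # False

try:
    maxx["Topsoil"]
    raised = False
except KeyError:
    raised = True  # same error as in the question

# Fix from the answer: build the column on maxx and minn as well.
maxx["Topsoil"] = maxx[depths].max(axis=1)
minn["Topsoil"] = minn[depths].min(axis=1)
mean["topsoilMax"] = maxx["Topsoil"]
mean["topsoilMin"] = minn["Topsoil"]

# Possible alternative: define Topsoil per day first, then group it.
# This gives the range of the daily three-depth average instead.
daily["Topsoil"] = daily[depths].mean(axis=1)
alt_max = daily.groupby("Day")["Topsoil"].max()
alt_min = daily.groupby("Day")["Topsoil"].min()
```

The two variants answer slightly different questions. The fix above takes the most extreme reading across all three depths for each day of the year, while the alternative takes the extremes of the daily three-depth average, so its band is never wider than the first one. Which one is appropriate depends on what the shaded range in the plot is meant to show.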
