I don't understand this line of code
minimum.append(min(j[1]['Data_Value']))
...specifically
j[1]['Data_Value']
I know the full code returns the minimum value and stores it in a list called minimum, but what does the j[1] do there? I've tried using other numbers to figure it out but get an error. Is it selecting the index or something?
Full code below. Thanks!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
df1 = pd.read_csv('./data/C2A2_data/BinnedCsvs_d400/ed157460d30113a689e487b88dcbef1f5d64cbd8bb7825f5f485013d.csv')
minimum = []
maximum = []
month = []
df1 = df1[~(df1['Date'].str.endswith(r'02-29'))]
times1 = pd.DatetimeIndex(df1['Date'])
df = df1[times1.year != 2015]
times = pd.DatetimeIndex(df['Date'])
for j in df.groupby([times.month, times.day]):
    minimum.append(min(j[1]['Data_Value']))
    maximum.append(max(j[1]['Data_Value']))
Explanation
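Iterating over the object returned by df.groupby(...) yields tuples of the form (key, dataframe). The key is the value (or tuple of values) that identifies the group, and the dataframe holds the rows belonging to that group. See the example below.
Looping with j means looping over these tuples, so j[0] is the key and j[1] is the group's dataframe. Any other index raises an error because each tuple has only two elements.
j[1]['Data_Value'] takes the Data_Value column of that dataframe. Example: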
df = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 4, 6]})
df_grouped = df.groupby('a')
for j in df_grouped:
    print(f"Groupby key (col a): {j[0]}")
    print("dataframe:")
    print(j[1])
Yields:
Groupby key (col a): 1
dataframe:
   a  b
0  1  2
1  1  4
Groupby key (col a): 2
dataframe:
   a  b
2  2  6
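Because each item is a two-element tuple, it can also be unpacked directly in the for statement, which tends to read better than j[0] and j[1]. A minimal sketch on the same toy dataframe (the names key, group, minimum_b and first_item are only illustrative), which also shows why an index other than 0 or 1 fails:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 4, 6]})

minimum_b = []
for key, group in df.groupby('a'):
    # key corresponds to j[0], group corresponds to j[1]
    minimum_b.append(group['b'].min())

print(minimum_b)

# Each item has exactly two elements, so index 2 does not exist
first_item = next(iter(df.groupby('a')))
try:
    first_item[2]
except IndexError as err:
    error_message = str(err)
    print(error_message)  # tuple index out of range
```

Here minimum_b ends up holding 2 and 6, the smallest b for a == 1 and a == 2.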
More readable solution
Another, more convenient way to get the min/max of Data_Value
for every month-day combination is this:
data_value_summary = df \
    .groupby([times.month, times.day]) \
    .agg({'Data_Value': ['min', 'max']}) \
    ['Data_Value']  # selects the top-level column, which drops the 2nd header level
# Note: these are pandas Series indexed by (month, day), not plain lists
minimum = data_value_summary['min']
maximum = data_value_summary['max']
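To see what this kind of aggregation produces, here is a small self-contained sketch on made-up data (the four rows are hypothetical and not taken from the CSV in the question). As a variation, it selects the column before aggregating, so no two-level header is created, and it calls .tolist() to get plain Python lists like the ones the original loop builds:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question
df = pd.DataFrame({
    'Date': ['2005-01-01', '2006-01-01', '2005-01-02', '2006-01-02'],
    'Data_Value': [10, -5, 20, 15],
})
times = pd.DatetimeIndex(df['Date'])

# One row per (month, day) group, with a 'min' and a 'max' column
summary = (
    df.groupby([times.month, times.day])['Data_Value']
      .agg(['min', 'max'])
)

minimum = summary['min'].tolist()
maximum = summary['max'].tolist()
print(minimum)  # [-5, 15]
print(maximum)  # [10, 20]
```

The groups come out sorted by month and then day, so the lists line up with the calendar order of the days.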