Pandas: selecting columns in a DataFrame question - e.g. row[1]['Column']

Question

I don't understand this line of code:

minimum.append(min(j[1]['Data_Value']))

...specifically

j[1]['Data_Value']

I know the full code returns the minimum value and stores it in a list called minimum, but what does the j[1] do there? I've tried using other numbers to figure it out but get an error. Is it selecting the index or something?

Full code below. Thanks!

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook

df1 = pd.read_csv('./data/C2A2_data/BinnedCsvs_d400/ed157460d30113a689e487b88dcbef1f5d64cbd8bb7825f5f485013d.csv')

minimum = []
maximum = []
month = []
df1 = df1[~(df1['Date'].str.endswith(r'02-29'))]
times1 = pd.DatetimeIndex(df1['Date'])


df = df1[times1.year != 2015]
times = pd.DatetimeIndex(df['Date'])
for j in df.groupby([times.month, times.day]):
    minimum.append(min(j[1]['Data_Value']))
    maximum.append(max(j[1]['Data_Value']))

Answer 1

Explanation

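df.groupby(...) returns a DataFrameGroupBy object, not a list. Iterating over that object yields tuples of the form (key, dataframe), where key is the value (or combination of values) that identifies the group and dataframe holds the rows belonging to that group. See below for an example.

Each j in the loop is therefore one of these tuples.

j[0] is the group key; in your code it is a (month, day) pair.
j[1] is the DataFrame component of that tuple, and ['Data_Value'] selects one column of that DataFrame. Because the tuple has only two elements, j[2] and above raise an IndexError, which is most likely the error you saw when trying other numbers.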

Example

df = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 4, 6]})
df_grouped = df.groupby('a')

for j in df_grouped:
    print(f"Groupby key (col a): {j[0]}")
    print("dataframe:")
    print(j[1])

Yields:

Groupby key (col a): 1
dataframe:
   a  b
0  1  2
1  1  4
Groupby key (col a): 2
dataframe:
   a  b
2  2  6
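
As a small sketch of the same idea, the tuple can be unpacked directly in the for statement, which avoids j[0] and j[1] altogether. The names key, group, first_item and index_error_raised are only illustrative, and the data is the toy frame from the example above, not the question's CSV.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 4, 6]})

# Each item produced by iterating a groupby is a 2-tuple: (key, sub-DataFrame).
first_item = next(iter(df.groupby('a')))
item_length = len(first_item)

# Unpacking the tuple in the for statement replaces j[0] / j[1] with names.
minimum = []
for key, group in df.groupby('a'):
    minimum.append(group['b'].min())

# The tuple has only two elements, so index 2 is out of range.
try:
    first_item[2]
    index_error_raised = False
except IndexError:
    index_error_raised = True
```

Applied to the question's loop, this would read for key, group in df.groupby([times.month, times.day]): followed by minimum.append(group['Data_Value'].min()).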

More readable solution

A more convenient way to get the min/max of Data_Value for every month-day combination is to let pandas do the aggregation, without an explicit loop:

data_value_summary = (
    df.groupby([times.month, times.day])
    .agg({'Data_Value': ['min', 'max']})  # strings select pandas' own min/max aggregations
    ['Data_Value']  # drop the outer 'Data_Value' level from the two-level column header
)

# Both results are Series indexed by (month, day), not plain lists.
minimum = data_value_summary['min']
maximum = data_value_summary['max']
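
To try this without the original CSV, here is a minimal self-contained sketch. The four rows are made-up sample data (only the column names Date and Data_Value come from the question). It selects the column before aggregating, a slight variation that avoids the two-level header in the first place.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    'Date': ['2005-01-01', '2006-01-01', '2005-01-02', '2006-01-02'],
    'Data_Value': [10, -5, 20, 7],
})
times = pd.DatetimeIndex(df['Date'])

# Group by (month, day), pick the column, then compute both aggregates.
summary = (
    df.groupby([times.month, times.day])['Data_Value']
    .agg(['min', 'max'])
)

# .tolist() converts the resulting Series to plain Python lists,
# matching the lists built by the loop in the question.
minimum = summary['min'].tolist()
maximum = summary['max'].tolist()
```

With this sample, the two rows for January 1 give a minimum of -5 and a maximum of 10, and the two rows for January 2 give 7 and 20.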

solution1
2 ACCEPTED 2019-08-14 10:34:07
