Pandas: selecting columns in a DataFrame question - e.g. row[1]['Column']
I don't understand this line of code:
minimum.append(min(j[1]['Data_Value']))
...specifically:
j[1]['Data_Value']
I know the full code returns the minimum value and stores it in a list called minimum, but what does j[1] do there? I've tried using other numbers to figure it out but get an error. Is it selecting the index or something?
Full code below. Thanks!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
df1 = pd.read_csv('./data/C2A2_data/BinnedCsvs_d400/ed157460d30113a689e487b88dcbef1f5d64cbd8bb7825f5f485013d.csv')
minimum = []
maximum = []
month = []
df1 = df1[~(df1['Date'].str.endswith(r'02-29'))]
times1 = pd.DatetimeIndex(df1['Date'])
df = df1[times1.year != 2015]
times = pd.DatetimeIndex(df['Date'])
for j in df.groupby([times.month, times.day]):
    minimum.append(min(j[1]['Data_Value']))
    maximum.append(max(j[1]['Data_Value']))
Explanation
Iterating over the result of DataFrame.groupby yields tuples of (key, dataframe). The key is the groupby key, i.e. the value that identifies that group, and the dataframe holds the rows that belong to it. See below for an example.
Looping over these j's means looping over these tuples: j[0] is the groupby key and j[1] is the dataframe for that group. ['Data_Value'] then takes a column of that dataframe.
Example
df = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 4, 6]})
df_grouped = df.groupby('a')
for j in df_grouped:
    print(f"Groupby key (col a): {j[0]}")
    print("dataframe:")
    print(j[1])
Yields:
Groupby key (col a): 1
dataframe:
   a  b
0  1  2
1  1  4
Groupby key (col a): 2
dataframe:
   a  b
2  2  6
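As a small sketch of why other indices fail: each item produced by the groupby is a 2-tuple, so only positions 0 and 1 exist. Unpacking the tuple into two names (key and group here are just illustrative names, not anything from the original code) avoids the j[0] / j[1] indexing altogether.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 4, 6]})

# Each item yielded by the groupby is a 2-tuple: (key, sub-dataframe).
items = list(df.groupby('a'))
first = items[0]
print(type(first).__name__, len(first))

# Unpacking gives each part a readable name instead of j[0] / j[1].
minimum = []
for key, group in df.groupby('a'):
    minimum.append(group['b'].min())

# Any index other than 0 or 1 fails, because the tuple has only two elements.
try:
    first[2]
    raised = False
except IndexError:
    raised = True
print("first[2] raised IndexError:", raised)
```

This is most likely the error seen when trying other numbers in place of the 1 in j[1].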
More readable solution
Another, more convenient way to get the min/max of Data_Value for every month-day combination is this:
data_value_summary = df \
    .groupby([times.month, times.day]) \
    .agg({'Data_Value': ['min', 'max']}) \
    ['Data_Value']  # < this removes the top header level from the newly created dataframe
minimum = data_value_summary['min']
maximum = data_value_summary['max']
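To try the aggregation approach without the original CSV, here is a minimal sketch on made-up data. The sample values are invented for illustration, only the column names Date and Data_Value mirror the question, and the grouping keys are renamed to month and day (an assumption of this sketch) so the two index levels get distinct names.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    'Date': ['2005-01-01', '2006-01-01', '2005-01-02', '2006-01-02'],
    'Data_Value': [10, -5, 20, 30],
})
dates = pd.to_datetime(df['Date'])

# Group by (month, day) across all years, then take min and max per group.
summary = (
    df.groupby([dates.dt.month.rename('month'), dates.dt.day.rename('day')])
      ['Data_Value']
      .agg(['min', 'max'])
)
print(summary)

minimum = summary['min'].tolist()
maximum = summary['max'].tolist()
```

Selecting the Data_Value column before calling agg yields a flat min/max header directly, so no second header level needs to be removed afterwards.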