Pandas: selecting columns in a DataFrame question - e.g. row[1]['Column']

I don't understand this line of code:

minimum.append(min(j[1]['Data_Value']))

...specifically:

j[1]['Data_Value']

I know the full code returns the minimum value and stores it in a list called minimum, but what does the j[1] do there? I've tried using other numbers to figure it out but get an error. Is it selecting the index or something?

Full code below. Thanks!

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook

df1 = pd.read_csv('./data/C2A2_data/BinnedCsvs_d400/ed157460d30113a689e487b88dcbef1f5d64cbd8bb7825f5f485013d.csv')

minimum = []
maximum = []
month = []
df1 = df1[~(df1['Date'].str.endswith(r'02-29'))]
times1 = pd.DatetimeIndex(df1['Date'])


df = df1[times1.year != 2015]
times = pd.DatetimeIndex(df['Date'])
for j in df.groupby([times.month, times.day]):
    minimum.append(min(j[1]['Data_Value']))
    maximum.append(max(j[1]['Data_Value']))

Explanation

DataFrame.groupby returns a GroupBy object rather than an actual list, but iterating over it yields tuples of (key, dataframe). The key is the value of the groupby key for that group, and the dataframe holds the rows that belong to it. See below for an example.

Looping with j therefore means looping over these tuples.

  • j[0] refers to the group "key".
  • j[1] is the dataframe component of that tuple. ['Data_Value'] then selects a column of that dataframe.

Example

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 4, 6]})
df_grouped = df.groupby('a')

# Each j is a (key, dataframe) tuple
for j in df_grouped:
    print(f"Groupby key (col a): {j[0]}")
    print("dataframe:")
    print(j[1])

Yields:

Groupby key (col a): 1
dataframe:
   a  b
0  1  2
1  1  4
Groupby key (col a): 2
dataframe:
   a  b
2  2  6
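
Because each item is a two-element tuple, it can also be unpacked directly in the loop header, which avoids the j[0] / j[1] indexing. That is also the likely reason other numbers raise an error: a tuple of length two has no index 2. A minimal sketch, reusing the small example frame above (the names key, group and first_item are chosen here purely for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 4, 6]})

minimum = []
# Each iteration yields a (key, sub-DataFrame) pair; unpack it directly.
for key, group in df.groupby('a'):
    minimum.append(group['b'].min())

print(minimum)

# Only indices 0 and 1 exist on the pair, so any other index fails.
first_item = next(iter(df.groupby('a')))
try:
    first_item[2]
except IndexError as err:
    print(f"IndexError: {err}")
```

Unpacking into named variables tends to read more clearly than positional indexing, and it makes the structure of each item explicit.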

More readable solution

Another, more convenient way to get the min/max of Data_Value for every month-day combination is this:

data_value_summary = df \
    .groupby([times.month, times.day]) \
    .agg({'Data_Value': ['min', 'max']}) \
    ['Data_Value']  # selecting 'Data_Value' drops the outer level of the new two-level column header

minimum = data_value_summary['min']
maximum = data_value_summary['max']
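
To check that the loop and the aggregated version agree, here is a small sketch on made-up data. The dates and values below are invented for illustration and are not taken from the original CSV; the names loop_min, loop_max and summary are likewise hypothetical. It groups by month and day via a DatetimeIndex, as in the question, and compares both results.

```python
import pandas as pd

# Hypothetical sample data: two years of readings for the same two calendar days.
df = pd.DataFrame({
    'Date': ['2005-01-01', '2005-01-02', '2006-01-01', '2006-01-02'],
    'Data_Value': [10, -5, 3, 7],
})
times = pd.DatetimeIndex(df['Date'])

# Loop version: collect min/max per (month, day) group.
loop_min, loop_max = [], []
for key, group in df.groupby([times.month, times.day]):
    loop_min.append(group['Data_Value'].min())
    loop_max.append(group['Data_Value'].max())

# Aggregated version: one call, then select 'Data_Value' to drop the outer column level.
summary = (
    df.groupby([times.month, times.day])
      .agg({'Data_Value': ['min', 'max']})['Data_Value']
)

# Both approaches should produce the same values in the same group order.
assert list(summary['min']) == loop_min
assert list(summary['max']) == loop_max
print(summary)
```

One difference worth noting: the aggregated result keeps the (month, day) keys as its index, whereas the plain lists from the loop discard them.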


