How do I add calculated results to a dataframe using a dictionary?

Question

I have the following problem: I get an error when I try to add "Time" and "y_corrected" to a new dataframe.

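I need to calculate a variable "y_corrected" and add it to a new dataframe. To calculate this variable, I use groupby to loop over the dataset by two criteria: File name and Treatment. The final dataframe should contain File name, Treatment, Time and y_corrected.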

file = pd.read_excel(r'C:.....xlsx')
grouped = file.groupby(['File name', 'Treatment'])

########################################  output dataframe #####################################
new = pd.DataFrame(columns=['File name','Treatment', 'Time', 'y_corrected'])
new.columns = ['File name', 'Treatment', 'Time', 'y_corrected']

######################################## correction ########################################
for key, g in grouped:
  a = g['y'].max()
  b = g['y'].min()

  y_corrected = (g['y'] - b) / a

  row = {'File name': key[0], 'Treatment': key[1],  'Time': time[2], 'y_corrected': y_corrected[3]}
  new = new.append(row, ignore_index=True)

print(new)

This is the error: result = self.index.get_value(self, key)
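A minimal reproduction of the lookup problem, using a hypothetical toy frame (the column names follow the question, the data is made up). Each group produced by groupby keeps the original row labels, so y_corrected[3] is a label lookup that only succeeds in the one group that happens to contain row 3; on recent pandas versions the failure is a KeyError, while the get_value frame in the traceback above comes from older pandas internals. Positional access with .iloc avoids the label problem, although it still picks only one value per group.

```python
import pandas as pd

# Hypothetical toy data using the question's column names
df = pd.DataFrame({
    'File name': ['A', 'A', 'B', 'B'],
    'Treatment': ['Z', 'Z', 'Z', 'Z'],
    'y': [1.0, 0.5, 1.5, 0.5],
})

label_3_found = {}
first_value = {}
for key, g in df.groupby(['File name', 'Treatment']):
    y_corrected = (g['y'] - g['y'].min()) / g['y'].max()
    # The group keeps the original row labels: [0, 1] for ('A', 'Z'), [2, 3] for ('B', 'Z')
    print(key, list(y_corrected.index))
    try:
        y_corrected[3]  # label-based lookup, not "the fourth value"
        label_3_found[key] = True
    except KeyError:
        label_3_found[key] = False
    # .iloc is positional, so it works in every group (here: the first value)
    first_value[key] = y_corrected.iloc[0]

print(label_3_found)
print(first_value)
```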

Answer 1

For a quick fix, you can try passing the values as lists in the "row" object; I ran into a similar problem before when using dataframe.append(). For example:

row = {'File name': [key[0]], 'Treatment': [key[1]]....
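Wrapping each value in a list makes the dict a valid input for pd.DataFrame, which treats each list as a one-element column (a dict of only scalars raises a ValueError unless an index is passed). DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions the same idea is usually written by collecting one-row frames and concatenating once with pd.concat. A sketch under those assumptions, with hypothetical values standing in for key and the corrected value:

```python
import pandas as pd

# Hypothetical (key, value) pairs standing in for the question's loop variables
rows = []
for key, value in [(('A', 'Z'), 0.5), (('A', 'Y'), 0.0)]:
    # Lists turn each dict entry into a one-element column, so pd.DataFrame accepts it
    row = {'File name': [key[0]], 'Treatment': [key[1]], 'y_corrected': [value]}
    rows.append(pd.DataFrame(row))

# Concatenate once at the end instead of calling DataFrame.append inside the loop
new = pd.concat(rows, ignore_index=True)
print(new)
```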

Answer 2

Because g['y'] is a Series, y_corrected is also a Series, with the same length as the group.

y_corrected = (g['y'] - b) / a

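y_corrected[3] does not work: it looks up the label 3 in the group's index, and that label only exists in the group that contains row 3. You should loop over the values again to get one row per value (I left out "Time", since it does not seem relevant to the problem).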

import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.0],
                     ['A', 'Z', 0.5],
                     ['A', 'Y', 1.5],
                     ['A', 'Y', 0.5],
                     ['B', 'Z', 1.0],
                     ['B', 'Z', 0.5],
                     ['B', 'Y', 1.5],
                     ['B', 'Y', 0.5],
                     ],
                    columns=['File name', 'Treatment', 'y']
                    )
grouped = df.groupby(['File name', 'Treatment'])

########################################  output dataframe #####################################
rows = []  # one dict per output row; DataFrame.append was removed in pandas 2.0

######################################## correction ########################################
for key, g in grouped:
    a = g['y'].max()
    b = g['y'].min()

    y_corrected = (g['y'] - b) / a

    # y_corrected is a Series, so emit one row per value in the group
    for _, y_corrected_ in y_corrected.items():
        rows.append({'File name': key[0], 'Treatment': key[1], 'y_corrected': y_corrected_})

new = pd.DataFrame(rows, columns=['File name', 'Treatment', 'y_corrected'])

A simpler way to solve this is to run the calculation directly on your groups.

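def correction(s):
    # s is the 'y' Series of one group
    return (s - s.min()) / s.max()

# transform returns a Series aligned with df's index, so it can be assigned as a column
df['y_corrected'] = grouped['y'].transform(correction)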

print(df)

Output:

  File name Treatment    y  y_corrected
0         A         Z  1.0     0.500000
1         A         Z  0.5     0.000000
2         A         Y  1.5     0.666667
3         A         Y  0.5     0.000000
4         B         Z  1.0     0.500000
5         B         Z  0.5     0.000000
6         B         Y  1.5     0.666667
7         B         Y  0.5     0.000000
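The same result can also be computed without a Python-level function by asking transform for the per-group minimum and maximum directly: transform('min') and transform('max') return Series aligned with df's index, so ordinary column arithmetic does the rest. A sketch reusing the toy data from this answer:

```python
import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.0], ['A', 'Z', 0.5], ['A', 'Y', 1.5], ['A', 'Y', 0.5],
                   ['B', 'Z', 1.0], ['B', 'Z', 0.5], ['B', 'Y', 1.5], ['B', 'Y', 0.5]],
                  columns=['File name', 'Treatment', 'y'])

y_by_group = df.groupby(['File name', 'Treatment'])['y']
# Each transform call broadcasts the group statistic back to every row of that group
group_min = y_by_group.transform('min')
group_max = y_by_group.transform('max')
df['y_corrected'] = (df['y'] - group_min) / group_max
print(df)
```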

Answer 3

You don't need to loop over the groups at all. You can let pandas do the work on the whole dataframe:

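import pandas as pd

file = pd.read_excel(r'C:.....xlsx')

# transform keeps the original row index, so the result lines up with file
file['y_corrected'] = file.groupby(['File name', 'Treatment'])['y'].transform(lambda x: (x - x.min()) / x.max())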
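The question also asks for a Time column in the final result. Because transform keeps every original row, the other columns stay in place and the requested frame is just a column selection. A sketch assuming the spreadsheet has a column literally named "Time" (an assumption, since the question never shows where time comes from), with made-up data standing in for read_excel:

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; the 'Time' column name is assumed
file = pd.DataFrame({
    'File name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Treatment': ['Z', 'Z', 'Z', 'Z', 'Z', 'Z'],
    'Time': [0, 1, 2, 0, 1, 2],
    'y': [2.0, 1.0, 4.0, 3.0, 6.0, 3.0],
})

file['y_corrected'] = file.groupby(['File name', 'Treatment'])['y'].transform(
    lambda x: (x - x.min()) / x.max()
)

# Final frame with exactly the columns the question asks for
new = file[['File name', 'Treatment', 'Time', 'y_corrected']]
print(new)
```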


3 answers

Answer 1: score 0, 2021-04-26 15:24:45

Answer 2: score 0, 2021-04-26 15:28:34

Answer 3: score 0, accepted, 2021-04-26 15:35:40
