Classify and restore data in Python

Question

I have a dataset that resides in a 13 by 506 matrix; let's call the dataset data_1. I am interested in the data in one of the columns; let's call that column data_c1. data_c1 is numeric, so its 50th percentile can be calculated with the NumPy library.
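A minimal sketch of the percentile step, assuming data_1 is a NumPy array laid out as 506 rows by 13 columns and that data_c1 is its first column. The random values and the column index are hypothetical stand-ins for the real data. The point is to pass only the column to np.percentile: passing the whole matrix (the default axis=None) computes the percentile over all values, flattened.

```python
import numpy as np

# Hypothetical stand-in for data_1: 506 rows (samples) by 13 columns (features)
rng = np.random.default_rng(0)
data_1 = rng.normal(size=(506, 13))

# Assumption: data_c1 is column 0; replace 0 with the real column index
data_c1 = data_1[:, 0]

# 50th percentile of this one column only (i.e. the median of data_c1).
# np.percentile(data_1, 50) would instead use all 506 * 13 values, flattened.
t50 = np.percentile(data_c1, 50)
print(t50)
```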

My goal is to go through data_c1, do a binary classification of whether each value is above or below the 50th percentile (y=1 for above, y=0 for below), and store that information in a new matrix with the corresponding tag (y=1 or y=0).
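No loop is strictly needed for this: comparing the whole column with t50 produces a boolean array that converts directly to 0/1 tags. A minimal sketch with made-up values for data_c1, assuming that values equal to t50 are tagged 1 (as the code comment in the question suggests):

```python
import numpy as np

# Hypothetical values standing in for data_c1
data_c1 = np.array([2.0, 7.5, 3.1, 9.4, 5.0, 1.2])
t50 = np.percentile(data_c1, 50)  # median of the six values, about 4.05

# Element-wise comparison gives True/False; astype(int) turns it into 1/0
y = (data_c1 >= t50).astype(int)

# New matrix: column 0 is the original value, column 1 is its tag
labeled = np.column_stack((data_c1, y))
print(labeled)
```

Because column_stack combines a float column with an integer column, the tags in labeled are stored as 0.0 and 1.0; keep y separately if you need integer labels.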

I figured out how to load the data and calculate t50 (see below). Can someone show me how to complete the reclassification? I think I would need to use a while loop, but I can't get it to store the data into a new matrix.
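If you do want an explicit loop, a for loop over the values is simpler than a while loop because it manages the index for you. The key step for getting the results into a new matrix is to create the output array before the loop and fill it in place. A minimal sketch with hypothetical values, again tagging values equal to t50 as 1:

```python
import numpy as np

data_c1 = np.array([2.0, 7.5, 3.1, 9.4, 5.0, 1.2])  # hypothetical values
t50 = np.percentile(data_c1, 50)

# Pre-allocate one row per value: column 0 = value, column 1 = tag
labeled = np.zeros((len(data_c1), 2))

for i, value in enumerate(data_c1):
    labeled[i, 0] = value
    labeled[i, 1] = 1 if value >= t50 else 0

print(labeled)
```

An equivalent while loop would need a counter (i = 0) that you increment yourself with i += 1 on every pass; forgetting the increment is a common cause of infinite loops.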

Here is my code so far:

#import libraries
import numpy as np
import pandas as pd

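#import data set
from datasoure import data_file   # placeholder module and loader for the data
data_1 = data_file()
data_c1 = data_1['data_c1']

#calculate the 50th percentile of data_c1 (not of the whole data_1 matrix) using numpy
t50 = np.percentile(data_c1, 50)

#classify target data as y=1 for data_c1 >= t50 and y=0 for data_c1 < t50
#while loop????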

Answer 1

You can apply a function to each row like this:

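def classifier(row):
    # t50 is defined somewhere else (module level); reading a global
    # variable does not require the `global` statement
    if row["data_c1"] > t50:
        return 1
    else:
        return 0

# df is the DataFrame holding the data (data_1 in the question)
new_col = df.apply(classifier, axis=1)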

Then you can do whatever you want with new_col, for example store it as a new column with df["y"] = new_col.
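A self-contained sketch of this approach, using a small made-up DataFrame in place of the real data. It also shows a vectorized comparison that gives the same column without apply, and keeps each value together with its tag in a new DataFrame. Note that the classifier uses >, so a value exactly equal to t50 gets 0, whereas the comment in the question puts it in y=1; pick whichever matches your definition.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's DataFrame
df = pd.DataFrame({"data_c1": [2.0, 7.5, 3.1, 9.4, 5.0, 1.2],
                   "other": [10, 20, 30, 40, 50, 60]})
t50 = np.percentile(df["data_c1"], 50)

def classifier(row):
    # Same rule as above: 1 if strictly above the 50th percentile, else 0
    return 1 if row["data_c1"] > t50 else 0

new_col = df.apply(classifier, axis=1)

# Vectorized equivalent: compare the whole column at once
vectorized = (df["data_c1"] > t50).astype(int)
print((new_col == vectorized).all())

# Keep each value and its tag together in a new DataFrame
tagged = pd.DataFrame({"data_c1": df["data_c1"], "y": new_col})
print(tagged)
```

The vectorized form avoids calling a Python function once per row, which matters more as the number of rows grows.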

(Answer 1: accepted, score 0, 2020-10-01 16:12:06)
