
Improving accuracy of neural network through preprocessing

Reading https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607, which states that to debug a neural network one should check the following:

  1. Is the relationship between input and output too random? Maybe the non-random part of the relationship between the input and output is too small compared to the random part (one could argue that stock prices are like this). I.e. the input is not sufficiently related to the output. There isn't a universal way to detect this, as it depends on the nature of the data.

To check this, I wrote the code below.

My dataframe:

import numpy as np
import pandas as pd

columns = ['A', 'B']
data = np.array([[1, 2], [1, 5], [2, 3], [2, 3]])
df = pd.DataFrame(data, columns=columns)
df
    A   B
0   1   2
1   1   5
2   2   3
3   2   3

Here A is the input variable and B is the target variable.

Code which measures the predictive power for label 1:

df_sub1 = df[df['A'] == 1] 
len(df_sub1['A'].unique()) / len(df_sub1['B'].unique())

The value returned is 0.5, as for label 1 there are two different target values.

Code which measures the predictive power for label 2:

df_sub2 = df[df['A'] == 2]
len(df_sub2['A'].unique()) / len(df_sub2['B'].unique())

The value returned is 1, as for label 2 both target values are the same.

From this, can one reason that the value A=2 is a better predictor of B than A=1? I came up with this after reading the "Is the relationship ..." point above. Does this calculation have an established name, and is it a good measure of predictability?

To improve the accuracy of the neural network through data pre-processing, could one try removing rows from the training set where the predictive power is below a pre-defined threshold, the predictive power being the result of the calculations above? A sketch of what I mean follows.
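A minimal sketch of the filtering I have in mind, where the score is the ratio computed above (equivalently, 1 over the number of distinct target values per label) and the threshold of 0.8 is purely illustrative:

score = 1 / df.groupby('A')['B'].nunique()   # A=1 -> 0.5, A=2 -> 1.0

threshold = 0.8                              # illustrative cut-off, not tuned
keep = score[score >= threshold].index
df_filtered = df[df['A'].isin(keep)]         # only the A == 2 rows remain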

I do not understand the quote the same way you do, so let's distinguish the two interpretations.

  1. According to you, the random part of the model is a subset of the predictor values (A) that leads to random outputs (B), and should therefore be removed.

  2. In my opinion, the quote should be read as being about the general relationship between the predictors (A) and the target variable (B).

These are two different things.

Interpretation 1

If you remove the set {A=1} from your training set, you have to remove it from your prediction set as well. Basically, you will train your neural network to predict B only when A is not 1. As the outcome of B is uncertain when A = 1, your model performance is likely to increase, but what happens when you have to cast a prediction for a case where A = 1?

Indeed, you have increased accuracy, but you have reduced your prediction scope to {A != 1}, and the operation is only worth it if you find another model that beats your neural network on {A = 1}, so that your overall accuracy is higher. Besides, given the neural network's non-linear structure, it should theoretically be capable of making the distinction between the two cases by itself, so I have doubts about the pertinence of such an approach.
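To make the cost concrete, here is a sketch of the routing such a split forces on you (the trained model object and its scikit-learn-style predict interface are assumptions for illustration): the network covers {A != 1}, and a fallback, here simply the majority class, covers {A = 1}:

# the network would be trained only on the rows where A != 1
train = df[df['A'] != 1]

# fallback for the excluded region: the most frequent B observed when A == 1
fallback_value = df.loc[df['A'] == 1, 'B'].mode().iloc[0]

def predict_routed(a, model):
    """Route each query: the model handles A != 1, the fallback handles A == 1."""
    if a == 1:
        return fallback_value           # a second model or heuristic is needed here
    return model.predict([[a]])[0]      # assumes a scikit-learn-style interface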

Regarding your attempt at measuring predictive power, you must be aware that there is no predictive power without a predictive method or model. By using the unique-count method, you make a strong assumption about the equiprobability of your outputs. How would your predictive power react to the following data?

data = np.array([[1, 2], [1, 5], [2, 3], [2, 3], [2, 3], [2, 4]])
df1 = pd.DataFrame(data[:-2, :], columns=columns)  # your data
df2 = pd.DataFrame(data, columns=columns)          # my data

# your method applied to my data
print(1 / df2.groupby('A')['B'].nunique())

Prints

A
1    0.5
2    0.5
Name: B, dtype: float64

Both values of A lead to the same predictive power, but in the case {A=1} the outcomes are equiprobable, while for {A=2}, in terms of maximum likelihood, the prediction should be 3.
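A quick way to see what the unique-count ratio misses is to compare the empirical hit rate of the most frequent value per group, i.e. the accuracy of the maximum-likelihood guess, on df2:

# accuracy of always predicting the most frequent B within each group of A
ml_accuracy = df2.groupby('A')['B'].apply(lambda b: b.value_counts(normalize=True).max())
print(ml_accuracy)
# A
# 1    0.50   <- both outcomes equally likely, the guess is a coin flip
# 2    0.75   <- predicting 3 is right three times out of four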

The main problem is that you have one model in mind to represent the predictive power, and it is different from the model you intend to use, i.e. the neural network. So, if you want to measure the predictive power of your variable (in general or under some conditional constraint), why not simply use the model itself?
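For instance, here is a sketch of that idea with a small stand-in model (a decision tree, purely for illustration; fitting your actual neural network works the same way, and scikit-learn is assumed to be available): fit the model, then score each subset of A by the model's own accuracy on it:

from sklearn.tree import DecisionTreeClassifier

X = df2[['A']].to_numpy()
y = df2['B'].to_numpy()
clf = DecisionTreeClassifier().fit(X, y)

# in-sample accuracy per value of A: how predictable each subset is
# for this particular model
acc = pd.Series(clf.predict(X) == y).groupby(df2['A']).mean()
print(acc)   # A=1 -> 0.5, A=2 -> 0.75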

Otherwise, if you want a fast proxy for how much the value of a predictor reduces the uncertainty about a variable, you have more robust metrics at your disposal, such as the information gain, which is easy to implement and is already used in decision trees to split nodes into branches.

I'll let you read up on it, but here is an example to show how it overcomes the above problem:

# information gain method

def entropy(data):
    """Compute the entropy of a set of values."""
    bin_cnt = np.bincount(data)                # counts of each integer value
    bin_prob = bin_cnt / np.sum(bin_cnt)       # empirical probabilities
    # np.ma.log2 masks the zero-probability bins so 0 * log(0) drops out
    return -np.sum(bin_prob * np.ma.log2(bin_prob))

# using your data
print(entropy(df1['B']) - df1.groupby('A')['B'].apply(entropy))

Prints

A
1    0.5
2    1.5
Name: B, dtype: float64

Showing that we have more information gain when A=2.

# using my data
print(entropy(df2['B']) - df2.groupby('A')['B'].apply(entropy))

Prints

A
1    0.792481
2    0.981203
Name: B, dtype: float64

Showing that we still have more information gain when A=2.

Interpretation 2

the input is not sufficiently related to the output.

As I mentioned, I do not believe that this should be regarded as being about a subset of the input-output pairs, as you did, but about their relationship overall. Assuming a deterministic predicted phenomenon, I see three different cases where the input-output relationship can be weak in general:

  1. Your predictors are weak proxies of the explanatory variables of the predicted phenomenon
  2. Your predictors are noisy
  3. Your predicted phenomenon is high-dimensional (explained by many factors) and maybe non-linear (i.e. even more sensitive to noise, as the process is harder to explain)

You may observe these three cases together, and what you should do are the usual but challenging tasks of finding more representative data, decomposing and denoising, reducing dimensionality, and selecting a model that fits complex behaviors. And indeed, all of these tasks ...

depends on the nature of the data
