
How to process panel data for use in a recurrent neural network (RNN)

I have been doing some research on recurrent neural networks, but I am having trouble understanding if and how they could be used to analyze panel data (meaning cross-sectional data that is captured at different periods in time for several subjects; see the sample data below for an example). Most examples of RNNs I have seen have to do with sequences of text, rather than true panel data, so I'm not sure if they are applicable to this type of data.

Sample data:

ID    TIME    Y    X1    X2    X3
1     1       5     3     0    10
1     2       5     2     2    6
1     3       6     6     3    11
2     1       2     2     7    2
2     2       3     3     1    19
2     3       3     8     6    1
3     1       7     0     2    0

If I want to predict Y at a particular time given the covariates X1, X2 and X3 (as well as their values in previous time periods), can this kind of sequence be evaluated by a recurrent neural network? If so, do you have any resources or ideas on how to turn this type of data into feature vectors and matching labels that can be passed to an RNN? (I'm using Python, but am open to other implementations.)

I was also looking into this question, and so far I have only found this paper, which seems to deal with it.

"Tensorial Recurrent Neural Networks for Longitudinal Data Analysis", Mingyuan Bai, Boyan Zhang and Junbin Gao, 2017.

I hope this helps.

Please see this post.

It addresses your question about neural networks and panel data.

TSAI (based on fastai) offers a panel data preparation function, SlidingWindowPanel, which might be of use to you: https://timeseriesai.github.io/tsai/data.preparation.html#SlidingWindowPanel

FYI: it also has some great SOTA algorithms for time series classification and regression.
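For readers who want to see what per-subject windowing of panel data involves, here is a minimal sketch in plain NumPy. It is not tsai's implementation or API: the helper name `panel_windows`, its `window_len` argument, and the choice to label each window with Y at its last time step are all assumptions made for illustration. The data is the sample table from the question.

```python
import numpy as np

# Sample panel from the question: columns are ID, TIME, Y, X1, X2, X3.
panel = np.array([
    [1, 1, 5, 3, 0, 10],
    [1, 2, 5, 2, 2, 6],
    [1, 3, 6, 6, 3, 11],
    [2, 1, 2, 2, 7, 2],
    [2, 2, 3, 3, 1, 19],
    [2, 3, 3, 8, 6, 1],
    [3, 1, 7, 0, 2, 0],
], dtype=float)


def panel_windows(data, window_len):
    """Hypothetical helper: cut each subject's rows into sliding windows.

    Returns X with shape (n_windows, window_len, n_features) holding X1..X3,
    and y with shape (n_windows,) holding Y at the last step of each window.
    Windows never cross subject boundaries.
    """
    windows, labels = [], []
    for subject_id in np.unique(data[:, 0]):
        rows = data[data[:, 0] == subject_id]
        rows = rows[np.argsort(rows[:, 1])]  # order by TIME within the subject
        for start in range(len(rows) - window_len + 1):
            chunk = rows[start:start + window_len]
            windows.append(chunk[:, 3:])   # covariates X1, X2, X3
            labels.append(chunk[-1, 2])    # Y at the window's final time step
    n_features = data.shape[1] - 3
    X = np.array(windows).reshape(-1, window_len, n_features)
    return X, np.array(labels)


X, y = panel_windows(panel, window_len=2)
print(X.shape, y)
```

Subjects with fewer rows than `window_len` (ID 3 here) contribute no windows in this sketch; padding them instead would be a separate design choice.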

I see no reason why a neural network could not be trained on panel data. A neural network maps one set of values to another set of values that is related to it non-linearly. In a time series, the value at a particular instant depends on the values that occurred before it. For example, your pronunciation of a letter may vary depending on which letter you pronounced just before it. For time series prediction, recurrent neural networks tend to outperform feed-forward neural networks. How a time series is used to train a regular feed-forward network is illustrated in this picture. [Image]
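Since the picture is not available here, one common way to feed a time series to a feed-forward network is to give it a fixed window of past values as input and the next value as the target. The sketch below assumes that is the idea being described; the helper name `lagged_matrix`, the `n_lags` parameter, and the example series are made up for illustration.

```python
import numpy as np


def lagged_matrix(series, n_lags):
    """Hypothetical helper: turn one series into (inputs, targets) pairs.

    Row i of the inputs holds series[i : i + n_lags]; the target is the
    value that immediately follows that window.
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags
    if n_rows <= 0:
        # Series too short to form even one window.
        return np.empty((0, n_lags)), np.empty(0)
    inputs = np.stack([series[i:i + n_lags] for i in range(n_rows)])
    targets = series[n_lags:]
    return inputs, targets


# Illustrative series (not from the question's data).
inputs, targets = lagged_matrix([5, 5, 6, 7, 9], n_lags=2)
print(inputs)
print(targets)
```

Each row of `inputs` can then be passed to an ordinary feed-forward regressor, which sees the past only through that fixed window.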

In an RNN we can create a feedback loop in the internal states of the network, and that is why an RNN is better at predicting time series. In your example data there is one thing to consider: do the values of X1, X2 and X3 have an effect on Y, or vice versa? If they do not, you can treat X1, X2, X3 and Y as the same type of data, i.e. train on them independently using the same network (subject to experimentation). If your target is to predict a value where the variables affect one another, i.e. they are correlated, you can convert them into a single sequence in which each time step contains all of the variables. Another way might be to train four neural networks: the first three each map their own time series using an RNN, and the last one is a feed-forward network that takes the outputs for two of the time series as its two inputs and maps them to the output for the third, repeated for all possible combinations. (This is still subject to experimentation, as we cannot reliably predict the performance of a neural network model without experimenting.)
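The option of putting all variables into each time step can be sketched as a forward pass of a plain (Elman-style) RNN in NumPy. This is only an illustration of the feedback loop, not a trained model: the function `rnn_forward`, the hidden size, and the random weights are assumptions, and a real project would more likely use a deep learning framework with a training loop.

```python
import numpy as np

rng = np.random.default_rng(0)


def rnn_forward(X, W_in, W_rec, b, w_out, b_out):
    """Run a plain RNN over X and predict one value per sequence.

    X has shape (n_sequences, n_steps, n_features).
    """
    n_sequences, n_steps, _ = X.shape
    h = np.zeros((n_sequences, W_rec.shape[0]))  # initial hidden state
    for t in range(n_steps):
        # The hidden state feeds back into itself: this is the recurrence.
        h = np.tanh(X[:, t, :] @ W_in + h @ W_rec + b)
    return h @ w_out + b_out  # linear read-out from the last hidden state


# Two subjects from the question's sample data, three time steps each,
# with X1, X2, X3 stacked as the features of every time step.
X = np.array([
    [[3, 0, 10], [2, 2, 6], [6, 3, 11]],   # ID 1
    [[2, 7, 2], [3, 1, 19], [8, 6, 1]],    # ID 2
], dtype=float)

n_features, n_hidden = 3, 4  # hidden size chosen arbitrarily
W_in = rng.normal(scale=0.1, size=(n_features, n_hidden))
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
w_out = rng.normal(scale=0.1, size=n_hidden)
b_out = 0.0

y_hat = rnn_forward(X, W_in, W_rec, b, w_out, b_out)
print(y_hat.shape)  # one untrained prediction per subject
```

Treating each subject as one sequence means the same weights are shared across subjects, while the hidden state is reset at the start of every subject.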

Reading suggestion: reading about "Granger causality" might help you a bit.
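For orientation, the usual textbook form of a Granger causality test regresses Y on its own lags and on lags of X, then checks whether the lagged X terms add anything. The notation below (lag order p, coefficients a_i and b_i, error term) is chosen here for illustration and does not come from the post.

```latex
Y_t = a_0 + \sum_{i=1}^{p} a_i \, Y_{t-i} + \sum_{i=1}^{p} b_i \, X_{t-i} + \varepsilon_t,
\qquad H_0 :\; b_1 = b_2 = \dots = b_p = 0
```

X is said to Granger-cause Y when the null hypothesis H_0 is rejected, i.e. when past values of X improve the prediction of Y beyond what past values of Y already provide.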
