caffe error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
Recently, I needed to change the code of softmax_loss_layer in caffe. The caffe version is: https://github.com/BVLC/caffe
The code between ... is my code, and the rest is the same. Here is how I modified the code. First, I added rs_ in loss_layers.hpp:
class SoftmaxWithLossLayer : public LossLayer<Dtype> {
  ...
  Blob<Dtype> rs_;
  ...
};
Then, I reshape rs_ in softmax_loss_layer.cpp:
void SoftmaxWithLossLayer<Dtype>::LayerSetUp(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  ...
  rs_.Reshape(bottom[2]->num(), bottom[2]->channels(), bottom[2]->height(),
              bottom[2]->width());
}
bottom[2] comes from dense_image_data_layer.cpp:
template <typename Dtype>
void DenseImageDataLayer<Dtype>::DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  top[2]->Reshape(batch_size, 1, height, width);
  this->prefetch_weight_.Reshape(batch_size, 1, height, width);
  this->transformed_weight_.Reshape(1, 1, height, width);
}

Dtype* top_data = top[2]->mutable_cpu_data();
const int count = top[2]->count();
if (crop_size > 0) {
  for (int index = 0; index < count; ++index)
  {
    const int ww = index % crop_size;
    const int wh = (index / crop_size) % crop_size;
    top_data[index] = static_cast<Dtype>(cv_weight.at<uchar>(wh, ww));
  }
}
else {
  for (int index = 0; index < count; ++index)
  {
    const int ww = index % width;
    const int wh = (index / width) % height;
    top_data[index] = static_cast<Dtype>(cv_weight.at<uchar>(wh, ww));
  }
}
Finally, I added this code in softmax_loss_layer.cu:
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Forward_gpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  ...
  Dtype loss;
  Dtype em;
  bool add_weight = this->layer_param_.loss_param().add_weight();
  Dtype* weight = bottom[2]->mutable_gpu_data();
  Dtype z = 0;
  Dtype* rs = rs_.mutable_gpu_data();
  if (add_weight)
  {
    SoftmaxLossWeightForwardGPU<Dtype><<<CAFFE_GET_BLOCKS(nthreads),
        CAFFE_CUDA_NUM_THREADS>>>(nthreads, prob_data, label,
        weight_by_label_freqs_, label_count_data, loss_data,
        outer_num_, dim, inner_num_,
        has_ignore_label_, ignore_label_, counts, weight, rs);
    const Dtype* rs1 = rs_.gpu_data();
    caffe_gpu_asum(nthreads, loss_data, &loss);
    em = loss / nthreads;
    count = nthreads;
    // Dtype(0.5) instead of 1/2: the integer division 1/2 evaluates to 0 in C++.
    Dtype am = Dtype(0.5) * log((1 - em) / em);
    CalculateZ<Dtype><<<CAFFE_GET_BLOCKS(nthreads),
        CAFFE_CUDA_NUM_THREADS>>>(nthreads, weight, rs1, am, z, inner_num_);
    WeightUpdate<Dtype><<<CAFFE_GET_BLOCKS(nthreads),
        CAFFE_CUDA_NUM_THREADS>>>(nthreads, weight, rs1, am, z, inner_num_);
    ...
}
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  ...
  const Dtype* weight = bottom[2]->gpu_data();
  bool add_weight = this->layer_param_.loss_param().add_weight();
  if (add_weight)
    SoftmaxLossWeightBackwardGPU<Dtype><<<CAFFE_GET_BLOCKS(nthreads),
        CAFFE_CUDA_NUM_THREADS>>>(nthreads, top_data, label,
        weight_by_label_freqs_, label_count_data, bottom_diff,
        outer_num_, dim, inner_num_, has_ignore_label_,
        ignore_label_, counts, weight);
  ...
}
The error is:
F0108 10:45:48.291290 2859 math_functions.cu:81] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***
@ 0x7f0b3bcfbdaa (unknown)
@ 0x7f0b3bcfbce4 (unknown)
@ 0x7f0b3bcfb6e6 (unknown)
@ 0x7f0b3bcfe687 (unknown)
@ 0x7f0b3c16dd48 caffe::caffe_gpu_memcpy()
@ 0x7f0b3c09b67e caffe::SyncedMemory::gpu_data()
@ 0x7f0b3c054472 caffe::Blob<>::gpu_data()
@ 0x7f0b3c0b0568 caffe::Net<>::ForwardFromTo()
@ 0x7f0b3c0b0947 caffe::Net<>::ForwardPrefilled()
@ 0x7f0b3c08e555 caffe::Solver<>::Step()
@ 0x7f0b3c08ee8f caffe::Solver<>::Solve()
@ 0x407806 train()
@ 0x405d41 main
@ 0x7f0b3b20dec5 (unknown)
@ 0x4062ed (unknown)
@ (nil) (unknown)
I think this error occurs because I didn't use rs_ correctly or because bottom[2] is not right. All the code that I changed is posted above, so can anybody tell me what to do? If you need extra information, please tell me.
It is difficult to follow your code and changes.
Some comments:
Reshaping rs_ should occur in the Reshape() method, rather than in LayerSetUp().
It is better to use rs_.ReshapeLike(*bottom[2]) than to explicitly enumerate num, channels, etc. What if you are going to have a Blob with a different number of dimensions?
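The two comments above can be illustrated without Caffe itself. The following is only a minimal sketch: MiniBlob and MiniLossLayer are hypothetical stand-ins invented for this example, not Caffe's Blob or SoftmaxWithLossLayer. It assumes, as I understand Caffe's Layer to work, that Reshape() runs before each forward pass while LayerSetUp() runs once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for caffe::Blob: it only tracks a shape and a buffer.
struct MiniBlob {
  std::vector<int> shape;
  std::vector<float> data;

  void Reshape(const std::vector<int>& new_shape) {
    shape = new_shape;
    std::size_t count = 1;
    for (int d : shape) count *= static_cast<std::size_t>(d);
    data.resize(count);
  }
  // Same idea as ReshapeLike: copy the other blob's shape, whatever its
  // number of axes, instead of listing num/channels/height/width by hand.
  void ReshapeLike(const MiniBlob& other) { Reshape(other.shape); }
  std::size_t count() const { return data.size(); }
};

// Hypothetical layer: the scratch buffer rs_ is sized in Reshape(), which
// runs before every forward pass, not in the one-time LayerSetUp().
struct MiniLossLayer {
  MiniBlob rs_;

  void LayerSetUp(const std::vector<MiniBlob*>& bottom) {
    // One-time configuration only (reading parameters and so on).
    (void)bottom;
  }
  void Reshape(const std::vector<MiniBlob*>& bottom) {
    assert(bottom.size() > 2);
    rs_.ReshapeLike(*bottom[2]);
  }
  void Forward(const std::vector<MiniBlob*>& bottom) {
    Reshape(bottom);  // keeps rs_ in sync with the current bottom[2] shape
    for (std::size_t i = 0; i < rs_.count(); ++i) {
      rs_.data[i] = bottom[2]->data[i];
    }
  }
};

// Runs two forward passes with different batch sizes and returns the
// element count of rs_ after each pass.
std::vector<std::size_t> RunTwoBatches() {
  MiniBlob data, label, weight;
  data.Reshape({4, 3, 2, 2});
  label.Reshape({4, 1, 2, 2});
  weight.Reshape({4, 1, 2, 2});
  std::vector<MiniBlob*> bottom = {&data, &label, &weight};

  MiniLossLayer layer;
  layer.LayerSetUp(bottom);
  layer.Forward(bottom);
  const std::size_t first = layer.rs_.count();

  weight.Reshape({2, 1, 2, 2});  // e.g. a smaller batch arrives later
  layer.Forward(bottom);
  const std::size_t second = layer.rs_.count();
  return {first, second};
}
```

If rs_ were sized only in LayerSetUp(), it would keep its first shape after bottom[2] changed, and any loop or kernel indexed by one blob's count could run past the end of the other. Whether that is the cause of the error in the question cannot be told from the posted code.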
Have you tested your modified layer? From the caffe wiki:
Write tests in test/test_your_layer.cpp. Use test/test_gradient_check_util.hpp to check that your Forward and Backward implementations are in numerical agreement.
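To make "numerical agreement" concrete, here is a standalone sketch of the idea behind a gradient check. It does not use Caffe or its test/test_gradient_check_util.hpp; the function names SoftmaxLoss, SoftmaxLossGrad and MaxGradError are made up for this illustration. It compares the analytic gradient of a single-sample softmax loss with a central finite difference of the forward pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward: softmax cross-entropy loss for one sample.
double SoftmaxLoss(const std::vector<double>& logits, std::size_t label) {
  assert(label < logits.size());
  const double max_logit = *std::max_element(logits.begin(), logits.end());
  double sum = 0.0;
  for (double v : logits) sum += std::exp(v - max_logit);
  // -log(softmax[label]), written in a numerically stable form.
  return std::log(sum) - (logits[label] - max_logit);
}

// Backward: analytic gradient, softmax(logits) minus the one-hot label.
std::vector<double> SoftmaxLossGrad(const std::vector<double>& logits,
                                    std::size_t label) {
  assert(label < logits.size());
  const double max_logit = *std::max_element(logits.begin(), logits.end());
  double sum = 0.0;
  for (double v : logits) sum += std::exp(v - max_logit);
  std::vector<double> grad(logits.size());
  for (std::size_t i = 0; i < logits.size(); ++i) {
    grad[i] = std::exp(logits[i] - max_logit) / sum;
  }
  grad[label] -= 1.0;
  return grad;
}

// Largest absolute difference between the analytic gradient and the
// central finite difference (L(x + eps) - L(x - eps)) / (2 * eps).
double MaxGradError(std::vector<double> logits, std::size_t label, double eps) {
  const std::vector<double> analytic = SoftmaxLossGrad(logits, label);
  double max_err = 0.0;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    const double saved = logits[i];
    logits[i] = saved + eps;
    const double plus = SoftmaxLoss(logits, label);
    logits[i] = saved - eps;
    const double minus = SoftmaxLoss(logits, label);
    logits[i] = saved;  // restore before perturbing the next element
    const double numeric = (plus - minus) / (2.0 * eps);
    max_err = std::max(max_err, std::fabs(numeric - analytic[i]));
  }
  return max_err;
}
```

A call such as MaxGradError({1.0, 2.0, 0.5}, 1, 1e-5) should return a very small value when Forward and Backward agree. A large value points to a mistake in one of the two, which is the kind of problem a gradient test for the modified layer is meant to catch.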