fb-caffe-exts: a Caffe toolbox developed by Facebook
[Open source: fb-caffe-exts, a Caffe toolbox developed by Facebook] predictor: a C++ wrapper around the common pattern of sharing weights across multiple threads; torch2caffe: converts Torch model files to Caffe models; conversions: command-line tools for converting Caffe networks.
predictor/
A simple C++ library that wraps the common pattern of running a `caffe::Net` in multiple threads while sharing weights. It also provides a slightly more convenient usage API for the inference case.

```cpp
#include "caffe/predictor/Predictor.h"

// In your setup phase
predictor_ = folly::make_unique<caffe::fb::Predictor>(
    FLAGS_prototxt_path, FLAGS_weights_path);

// When calling in a worker thread
static thread_local caffe::Blob<float> input_blob;
input_blob.set_cpu_data(input_data); // avoid the copy.
const auto& output_blobs = predictor_->forward({&input_blob});
return output_blobs[FLAGS_output_layer_name];
```
Of note is `predictor/Optimize.{h,cpp}`, which optimizes memory usage by automatically reusing the intermediate activations when this is safe. This reduces the amount of memory required for intermediate activations by around 50% for AlexNet-style models, and around 75% for GoogLeNet-style models.
We can plot each set of activations in the topological ordering of the network, with a unique color for each reused activation buffer and the height of each blob proportional to the size of its buffer.
[Figure: activation buffer allocation for an AlexNet-like model]
[Figure: corresponding allocation for GoogLeNet]
The idea is essentially linear scan register allocation. We:

- compute a set of "live ranges" for each `caffe::SyncedMemory` (due to sharing, we can't do this at a `caffe::Blob` level)
- compute a set of live intervals, and schedule each `caffe::SyncedMemory` in a non-overlapping fashion onto a live interval
- allocate a canonical `caffe::SyncedMemory` buffer for each live interval
- update the blob internal pointers to point to the canonical buffer
Depending on the model, the buffer reuse can also lead to some non-trivial performance improvements at inference time.
To enable this, just pass `Predictor::Optimization::MEMORY` to the `Predictor` constructor.
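A sketch of the call site, assuming the optimization enum is accepted as a trailing constructor argument (the exact signature and argument position may differ; check the `Predictor` header):

```cpp
// Hypothetical: passing the memory optimization flag at construction.
// The argument position is an assumption made for illustration.
predictor_ = folly::make_unique<caffe::fb::Predictor>(
    FLAGS_prototxt_path,
    FLAGS_weights_path,
    caffe::fb::Predictor::Optimization::MEMORY);
```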