Official documentation:
https://docs.docker.com/install/linux/docker-ce/centos/
https://ngc.nvidia.com/catalog/containers/nvidia:cuda

Problems encountered during the build

Environment:
    cmake version 3.15.0
    gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
    cuda 10.0
    cudnn 7
    Tensorflow-gpu 1.13.1
    Tesla V100

My build command:
    cmake -DSM=70 -DCMAKE_BUILD_TYPE=Release -DBUILD_TF=ON -DTF_PATH=/usr/lib/python2.7/site-packages/tensorflow ..

Error:
    CMakeFiles/gemm_fp32.dir/gemm_fp32.cu.o: In function `__sti____cudaRegisterAll()':
    tmpxft_0000054d_00000000-5_gemm_fp32.cudafe1.cpp:(.text.startup+0x15): undefined reference to `__cudaRegisterLinkedBinary_44_tmpxft_0000054d_00000000_6_gemm_fp32_cpp1_ii_5cd8620e'
    collect2: error: ld returned 1 exit status
    tools/gemm_test/CMakeFiles/gemm_fp32.dir/build.make:83: recipe for target 'bin/gemm_fp32' failed

Fix:
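Append the following lines to the end of each of these CMakeLists.txt files. They make CMake run the CUDA device link step for each target, which resolves the missing __cudaRegisterLinkedBinary symbol.

tools/gemm_test/CMakeLists.txt: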
    set_target_properties(gemm_fp32 PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS ON)
    set_target_properties(gemm_fp16 PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS ON)

fastertransformer/cuda/CMakeLists.txt:
    set_target_properties(fastertransformer PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS ON)

fastertransformer/tf_op/CMakeLists.txt:
    set_target_properties(tf_fastertransformer PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS ON)

Error:
    /data/app/DeepLearningExamples/FasterTransformer/fastertransformer/common.h:76:11: error: ‘runtime_error’ is not a member of ‘std’
    throw std::runtime_error(std::string("[FT][ERROR] CUDA runtime error: ") + \

Fix:
    // add the missing header
    #include <stdexcept>

Applying it to BERT

Generate the configuration file gemm_config.in:
    ./build/bin/gemm_fp16 <batch_size> <seq_len> <head_num> <size_per_head>

(Use ./build/bin/gemm_fp32 with the same arguments for FP32.)

Modify modeling.py:
    # Load the compiled .so file
    transformer_op_module = tf.load_op_library("/data/app/DeepLearningExamples/FasterTransformer/build/lib/libtf_fastertransformer.so")

    # Add the following at the corresponding place in transformer_model
    layer_output = dropout(layer_output, hidden_dropout_prob)
    layer_output = layer_norm(layer_output + attention_output)

    # calling faster transformer op
    trainable_vars = tf.get_collection(
        tf.GraphKeys.TRAINABLE_VARIABLES, scope=tf.get_variable_scope().name)
    layer_output = transformer_op_module.bert_transformer(
        layer_input, layer_input,
        trainable_vars[0], trainable_vars[2], trainable_vars[4],
        trainable_vars[1], trainable_vars[3], trainable_vars[5],
        attention_mask,
        trainable_vars[6], trainable_vars[7], trainable_vars[8],
        trainable_vars[9], trainable_vars[10], trainable_vars[11],
        trainable_vars[12], trainable_vars[13], trainable_vars[14],
        trainable_vars[15],
        batch_size=64, from_seq_len=seq_length, to_seq_len=seq_length,
        head_num=num_attention_heads, size_per_head=attention_head_size)
    prev_output = layer_output
    all_layer_outputs.append(layer_output)

Notes:
1. bert-as-service is not supported, because bert-as-service requires Python 3 while FasterTransformer (as built here) requires Python 2.7.
2. Inference works, but training does not, at least in my tests.
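
The index order 0, 2, 4, 1, 3, 5 in the bert_transformer call above is easy to get wrong, so here is a small sketch of what that reordering does. It assumes the 16 trainable variables of one encoder layer are created in the order kernel, bias for query, key, value, and so on; the short names in BERT_LAYER_VARS are hypothetical labels made up for illustration, not the real TensorFlow variable names. Under that assumption, the op receives the three attention kernels first and the three attention biases after them, with the remaining variables in their original order.

```python
# Hypothetical short labels for the 16 trainable variables of one BERT
# encoder layer, in the creation order ASSUMED here (kernel before bias,
# beta before gamma). These are illustrative names, not real TF names.
BERT_LAYER_VARS = [
    "query/kernel", "query/bias",
    "key/kernel", "key/bias",
    "value/kernel", "value/bias",
    "attention_output/kernel", "attention_output/bias",
    "attention_layernorm/beta", "attention_layernorm/gamma",
    "intermediate/kernel", "intermediate/bias",
    "output/kernel", "output/bias",
    "output_layernorm/beta", "output_layernorm/gamma",
]

# Index order used in the bert_transformer call: 0, 2, 4, 1, 3, 5, then 6..15.
# (In the real call, attention_mask is passed between index 5 and index 6.)
OP_ARG_INDICES = [0, 2, 4, 1, 3, 5] + list(range(6, 16))


def op_weight_order(layer_vars):
    """Return the layer variables in the order they are passed to the op."""
    if len(layer_vars) != 16:
        raise ValueError("expected 16 variables, got %d" % len(layer_vars))
    return [layer_vars[i] for i in OP_ARG_INDICES]


if __name__ == "__main__":
    # Print the position of each weight as seen by the op.
    for position, name in enumerate(op_weight_order(BERT_LAYER_VARS)):
        print(position, name)
```

If the variable scope ever contains a different number of variables (for example, because optimizer slots or extra layers were collected), the length check fails early instead of silently passing the wrong tensors to the op.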