jeudi 29 mars 2018 à 10:06

TensorFlow and Google Cloud GPU Instances

Par Eric Antoine Scuccimarra

I decided to try a Google Cloud GPU instance as well as EC2. Once I had my quotas set properly and was able to start the instance it took me all day to get TensorFlow running with GPU. The instructions Google provides are for CUDA 8.0, and the latest version of TensorFlow requires CUDA 9.0.

To get everything running follow these steps:

  1. curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
  2. sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
  3. sudo apt-get update
  4. sudo apt-get install cuda-9-0
  5. sudo nvidia-smi -pm 1

These are the steps in the instructions with the proper repo to CUDA 9.0 inserted.

Then I had to install cudnn, which isn't mentioned at all in Google's instructions. I downloaded libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb from the Nvidia cudnn site, and then uploaded it to the instance with scp. Then install it with:

sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb

Then you need to export the path with:

echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

And finally install TensorFlow:

sudo apt-get install python-dev python-pip libcupti-dev
sudo pip install tensorflow-gpu

I used pip3 and python3, but the rest is the same. 

Update: I thought it was working fine but I was still getting errors about locating libcupti.so.9.0. That was fixed by making symlinks as described here.

I ran these commands and now it seems to be working...

  1. # Put symlinks in /usr/local/cuda
  2. sudo mkdir /usr/local/cuda
  3. cd /usr/local/cuda
  4. sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
  5. sudo ln -s /usr/include/ include
  6. sudo ln -s /usr/bin/ bin
  7. sudo ln -s /usr/lib/x86_64-linux-gnu/ nvvm
  8. sudo mkdir -p extras/CUPTI
  9. cd extras/CUPTI
  10. sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
  11. sudo ln -s /usr/include/ include

Another Update: TensorFlow requires version 7.0.4 of the cudnn, I had originally downloaded 7.1.2, the code has been updated accordingly.

Final Update: I set up another instance and followed this process and it almost worked. I needed to export another path which I added here. The commands to export the path were temporary and had to be repeated every time the instance was booted, I changed that to echo the path to .bashrc so it would be automatically set.

Libellés: coding, machine_learning, tensorflow, google_cloud

Comments

Please login to comment