TensorFlow supports running computations on a variety of device types, including CPUs and GPUs. Originally built for rendering graphics, GPUs evolved to run parallel computations across thousands of cores. This matches well with how deep learning models are trained. In Python, the base ‘tensorflow’ package is CPU-only; a separate package, tensorflow-gpu, is used when we plan to run on GPUs.
So why did we write this blog? When we started running TensorFlow, our developers wanted to test on smaller datasets locally on their own machines before pushing code to dev/test clusters with many GPUs. Most developer laptops and desktops these days come with NVIDIA Quadro or GeForce discrete GPUs. While a Google search turns up plenty of posts, we encountered some issues that those posts did not address. In this blog, we outline the issues that we ran into while installing tensorflow-gpu on Windows 10, and the solution to each of them.
Pre-requisites for TensorFlow: https://www.tensorflow.org/install/gpu
To use tensorflow-gpu, we need to install the following software, in this order:
- Visual Studio Express or Full (Visual Studio is a pre-requisite for CUDA)
- CUDA Toolkit for Windows 10
- cuDNN: https://developer.nvidia.com/cudnn (membership required)
- tensorflow-gpu (Python package)
We followed this post as a guide: https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781
Step 1: Drivers and installation
After Visual Studio, we installed the CUDA Toolkit for Windows: cuda_10.0.130_411.31_win10.exe. Next came cuDNN, where we started with cudnn-10.0-windows10-x64-v7.4.2.24.zip. We used this code to test that all the components were installed successfully: https://github.com/NarenData/TensorflowWindows/blob/master/TensorflowValidate.py
We first got an error about cuDNN not being available:
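As a rough illustration of what such a validation check can look like, here is a minimal sketch. It is not the contents of TensorflowValidate.py. It tries to import TensorFlow and asks whether the build includes CUDA and whether a GPU is visible. The function name check_tensorflow_gpu and the report keys are made up for this example; tf.test.is_built_with_cuda and tf.test.is_gpu_available are the TensorFlow 1.x calls we assume are present.

```python
def check_tensorflow_gpu():
    """Return a small report on whether TensorFlow imports and sees a GPU.

    Hypothetical helper for illustration; not the script linked above.
    """
    report = {
        "import_ok": False,
        "version": None,
        "built_with_cuda": None,
        "gpu_available": None,
        "error": None,
    }
    try:
        # On Windows this import is where "DLL load failed" errors surface.
        import tensorflow as tf
    except Exception as exc:
        report["error"] = repr(exc)
        return report

    report["import_ok"] = True
    report["version"] = tf.__version__
    try:
        # Assumption: TensorFlow 1.x-style test helpers are available.
        report["built_with_cuda"] = tf.test.is_built_with_cuda()
        report["gpu_available"] = tf.test.is_gpu_available()
    except Exception as exc:
        report["error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in check_tensorflow_gpu().items():
        print("{}: {}".format(key, value))
```

If the import itself fails, the error field holds the exception text, which is usually the quickest pointer to a missing DLL or PATH entry.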
cuDNN failed to initialize tensorflow.
It turns out the instructions were not very explicit: we need to add the ‘\bin’ folder of cuDNN to the Path variable in the “Environment Variables” list in Control Panel.
Once the install folder's \bin was added to the path, the cuDNN DLL error went away.
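A quick way to confirm the PATH change took effect is to search every PATH entry for the cuDNN library from a freshly opened terminal. The sketch below does that. The helper name find_on_path is ours, and the DLL name cudnn64_7.dll is an assumption based on the cuDNN 7.x Windows packages, so adjust it if your version ships a different file name.

```python
import os


def find_on_path(filename, path_value=None):
    """Return every directory in a PATH-style string that contains `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # PATH entries on Windows are sometimes quoted or padded with spaces.
        directory = directory.strip().strip('"')
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Assumption: cuDNN 7.x for Windows ships its library as cudnn64_7.dll.
    dll_name = "cudnn64_7.dll"
    locations = find_on_path(dll_name)
    if locations:
        print("{} found in: {}".format(dll_name, locations))
    else:
        print("{} not found on PATH; add the cuDNN \\bin folder.".format(dll_name))
```

Environment variable changes only apply to processes started afterwards, so rerun this from a new terminal (or restart the IDE) after editing the variable.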
Step 2: Resolve DLL load error
However, we kept getting the TensorFlow error message below. We tried installing the older version, cudnn-10.0-windows10-x64-v7.3.0.29.zip, but the same error message persisted.
from tensorflow.python.pywrap_tensorflow_internal import *
ImportError: DLL load failed: The specified module could not be found.
ImportError: No module named '_pywrap_tensorflow_internal'
We first spent some time checking whether the CUDA Toolkit and cuDNN installs were correct and whether the DLLs were loading properly. To verify this, we used Process Monitor, as suggested here: https://stackoverflow.com/questions/43553149/on-windows-running-import-tensorflow-generates-no-module-named-pywrap-tenso
This confirmed that the DLLs were loading properly and that the problem was elsewhere.
Since the DLLs were loading properly, suspicion fell on the tensorflow-gpu package installed in PyCharm. We were running 1.12.0, but a new release candidate, 1.13.0rc0, was available. Upgrading from 1.12.0 to 1.13.0rc0 resolved the issue.
After the update, we finally got a successful run of TensorflowValidate.py.
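When a package version is the suspect, it helps to confirm which version the interpreter actually sees, since an IDE can point at a different environment than the command line. The following is a minimal sketch. The helper names installed_version and release_tuple are ours, and it assumes Python 3.8 or later for importlib.metadata (on older interpreters it simply reports nothing).

```python
import re

try:
    from importlib import metadata  # available on Python 3.8+
except ImportError:  # older interpreters: skip the lookup
    metadata = None


def installed_version(package_names=("tensorflow-gpu", "tensorflow")):
    """Return (name, version) for the first installed package, else None."""
    if metadata is None:
        return None
    for name in package_names:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def release_tuple(version):
    """Extract the leading X.Y.Z numbers, ignoring suffixes such as 'rc0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version))
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("No TensorFlow package found in this environment.")
    else:
        name, version = found
        newer = release_tuple(version) >= release_tuple("1.13.0rc0")
        print("{} {} (1.13.0 or later: {})".format(name, version, newer))
```

Run it with the same interpreter the IDE project uses, otherwise it may report a different environment's packages.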
After this, we were able to run a couple of sample TensorFlow programs and make sure that it was running successfully on the GPU.
Step 3: Multiple GPU test
Many machines come with multiple GPUs, and TensorFlow can also be run on multiple GPUs. We test for this with https://github.com/NarenData/TensorflowWindows/blob/master/multigputest.py
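Before running a multi-GPU test, it can be useful to see which devices TensorFlow enumerates at all. The sketch below only lists devices; it does not place operations on each GPU and is not the multigputest.py script linked above. The helper names list_devices and gpus_only are ours, and it assumes the device_lib module from tensorflow.python.client is importable.

```python
def list_devices():
    """Return (name, device_type, memory_limit_bytes) for each visible device.

    Returns an empty list if TensorFlow cannot be imported or queried.
    """
    try:
        from tensorflow.python.client import device_lib
        return [
            (device.name, device.device_type, device.memory_limit)
            for device in device_lib.list_local_devices()
        ]
    except Exception:
        return []


def gpus_only(devices):
    """Keep only the entries whose device type is 'GPU'."""
    return [device for device in devices if device[1] == "GPU"]


if __name__ == "__main__":
    devices = list_devices()
    for name, device_type, memory_limit in devices:
        print("{} ({}): {:.1f} GB".format(name, device_type, memory_limit / 1e9))
    print("GPUs visible to TensorFlow: {}".format(len(gpus_only(devices))))
```

Only GPUs that TensorFlow's CUDA build can use appear as GPU devices here, so a count of one on a machine with an integrated GPU plus one NVIDIA card is expected.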
The following errors can come up while testing with multigputest.py.
There is some good news and some bad news here. Our CPU is a newer model that supports AVX2, but the TensorFlow binary we installed was not compiled to use it; it is still on AVX. On the plus side, TensorFlow finds the Quadro GPU and its specs, which is great.
We can also see that, despite running the internal display with two 32-inch 4K monitors, 3.3 GB of the 4 GB of VRAM is free for use by TensorFlow.
Step 4: Solve CUDA Driver error
These two links suggest that we need a newer driver:
https://github.com/tensorflow/tensorflow/issues/21832
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
For Windows, a driver version >= 411.31 is suggested, but the machine had an April 2018 version installed by default.
The strange thing is that the toolkit (installed in Step 1) is supposed to install the updated driver, but this did not seem to work properly.
Once the new driver was installed, we saw the right output. The code does not work fully, since it cannot access the other GPU (Intel 630 integrated graphics), but we do not need that for running our TensorFlow code. It probably would have worked properly if the machine had two NVIDIA GPUs.
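Since the toolkit installer cannot be relied on to update the driver, a small check of the installed driver version against the 411.31 minimum can save some guessing. This is a sketch under assumptions: the helper names are ours, and it relies on the nvidia-smi tool being reachable on PATH, which is not always the case on Windows. When nvidia-smi cannot be found, the function returns None instead of guessing.

```python
import shutil
import subprocess

MIN_DRIVER = "411.31"  # minimum Windows driver version cited above for CUDA 10.0


def version_tuple(version):
    """Turn a dotted driver version such as '411.31' into (411, 31)."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(installed, minimum=MIN_DRIVER):
    """Return True if the installed driver version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        output = subprocess.check_output(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            universal_newlines=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = output.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    driver = installed_driver_version()
    if driver is None:
        print("Could not query nvidia-smi; check the driver version manually.")
    elif meets_minimum(driver):
        print("Driver {} meets the {} minimum.".format(driver, MIN_DRIVER))
    else:
        print("Driver {} is older than {}; update it.".format(driver, MIN_DRIVER))
```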