Amazon EC2 Deep Learning AMI Instance
By Eric Antoine Scuccimarra
It is difficult to play around with the structure for the GAN I am working on in Colab since it trains so slowly. I can usually get maybe 2 or 3 epochs in a day, which means that I need to wait a day before evaluating each change I make. I decided to rent a GPU in the cloud for a few days so I could train it a bit more quickly and figure out what works and what doesn't work before going back to Colab.
I already have a Google Cloud GPU instance I was using for my work with mammography, but it was running CUDA 9.0, which apparently is not supported by PyTorch out of the box. I tried to upgrade CUDA to 10, but I think I ended up just making things worse. Rather than spend a whole day trying to fix the Google Cloud instance, and since I have some AWS credits, I decided to try an AWS Deep Learning AMI instance, which already has everything configured.
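For anyone hitting a similar CUDA mismatch, a quick check from Python can show which CUDA version the installed PyTorch build expects and whether a GPU is actually visible. This is only a rough sketch: the helper name `cuda_report` is made up for illustration, and it simply reports that PyTorch is missing if it is not installed in the current environment.

```python
import importlib.util


def cuda_report():
    """Return a small dict describing the PyTorch/CUDA setup, if any."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this interpreter's environment.
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against
        # (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are actually visible.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` shows a version but `cuda_available` is False, the problem is most likely on the driver or CUDA side of the machine rather than in PyTorch itself.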
It was incredibly easy to set up. The AMI comes pre-configured with virtual environments for different deep learning frameworks and packages, so there is no need to install CUDA, drivers, or anything like that. This is a huge advantage: when I set up the Google Cloud instance, it took me a few days to get everything installed and working. One thing I quickly noticed was that the default disk size was nowhere near big enough. After downloading a few data files I was already running out of disk space, but it was very easy to increase the disk size.
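To catch the disk filling up before a download fails, something like the following can be run from a notebook cell. It is a minimal sketch using only the standard library; the function name `disk_report` and the choice of `/` as the default path are my own assumptions, not anything specific to the AMI.

```python
import shutil


def disk_report(path="/"):
    """Summarize disk usage for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3  # bytes per GiB
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
        # Guard against a zero-sized filesystem to avoid dividing by zero.
        "percent_used": 100 * usage.used / usage.total if usage.total else 0.0,
    }


if __name__ == "__main__":
    for key, value in disk_report().items():
        print(f"{key}: {value:.1f}")
```

Running it again after increasing the disk size is an easy way to confirm that the extra space is actually visible to the operating system.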
Then all I had to do was activate the pytorch environment, launch a notebook and everything was running smoothly. I did run into a few minor issues, none of which were difficult to resolve:
- If I launch tmux from within an activated virtual environment, the new session does NOT have the environment activated, and activating it again from inside that tmux session still doesn't give access to the proper modules. This was resolved by launching tmux from outside of the venv and then activating the venv from inside tmux.
- My notebook didn't seem to have access to PyTorch, but this was because I hadn't selected the proper kernel from the Kernel > Change kernel menu. I wasn't even aware that one could select the kernel like that.
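Both of these issues come down to which Python environment is actually running, so a small diagnostic can save some guessing. The sketch below is an illustration under a few assumptions: the helper name `kernel_report` is hypothetical, and it relies on the `CONDA_DEFAULT_ENV` and `VIRTUAL_ENV` variables that conda and venv set on activation, which may be empty inside a notebook kernel even when the right interpreter is in use.

```python
import importlib.util
import os
import sys


def kernel_report(module_name="torch"):
    """Describe the running interpreter and whether a module is importable."""
    return {
        # Path of the Python binary running this shell session or kernel.
        "python_executable": sys.executable,
        # Root directory of the environment that binary belongs to.
        "environment_prefix": sys.prefix,
        # Set by `conda activate`; None if no conda env is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Set by a venv/virtualenv activate script; None otherwise.
        "virtualenv": os.environ.get("VIRTUAL_ENV"),
        # True if the module can be found without actually importing it.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in kernel_report().items():
        print(f"{key}: {value}")
```

Run inside tmux or in a notebook cell, this shows at a glance whether the interpreter path points into the expected environment. If `module_found` is False in a notebook, switching kernels is the first thing to try.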
I used to prefer Google Cloud to AWS because it was more configurable and easier to use. While AWS does have a bit of a learning curve, they really have thought of and provided for just about every possible contingency. We use AWS at my work, and it really is very impressive. I still like the simplicity of Google Cloud, but even simple things like AMIs make such a huge difference in set-up time that I think I'll be using AWS more often now.
Labels: machine_learning, aws, pytorch