To resolve the problems I was having yesterday, I ended up paying for an Amazon EC2 instance running the Deep Learning Ubuntu AMI. The instance type is p2.xlarge, which costs $0.90/hour, but it seems to be well worth it so far. Over the last ten minutes, the relatively small model I've been training on Google Cloud has gotten through 60 steps. In contrast, the much larger model on the EC2 instance, training on the same data, has gone through 375 steps, where each epoch is 687 steps.
I did have some trouble accessing TensorBoard on the EC2 instance, but was able to get it running by following the tutorial. I also got Jupyter Notebook running and accessible from outside the instance, again by following the tutorial, although I had to comment out the lines about the SSL certificates in the Jupyter config file before I could connect. I decided not to use Jupyter Notebook, but it's nice to have it as an option.
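For reference, here is roughly what that change looks like in a Jupyter config file. This is only a sketch: the post doesn't show the actual file, so the option names below are the classic `NotebookApp` settings that such tutorials typically use, and the file paths and port are placeholders I made up for illustration.

```python
# Sketch of a ~/.jupyter/jupyter_notebook_config.py for a remote notebook server.
# Assumption: the tutorial used the classic NotebookApp options shown here.
c = get_config()  # provided by Jupyter when it loads this file

# The SSL lines, commented out so the server falls back to plain HTTP.
# The paths are hypothetical placeholders, not the ones from the tutorial.
# c.NotebookApp.certfile = '/home/ubuntu/ssl/cert.pem'
# c.NotebookApp.keyfile = '/home/ubuntu/ssl/cert.key'

c.NotebookApp.ip = '*'              # listen on all interfaces, not just localhost
c.NotebookApp.open_browser = False  # headless server, no browser to open
c.NotebookApp.port = 8888           # assumed port; it must be open in the EC2 security group
```

With the certificate lines commented out, the notebook is served over unencrypted HTTP, so the URL has to start with `http://` rather than `https://`, and anything typed into it (including the password) travels in the clear.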
Since this is just a project I am working on for myself, I'd prefer not to pay for the compute, but $0.90 per hour is manageable, and well worth it for the roughly 10x increase in training speed.
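To put rough numbers on that, here is a quick back-of-the-envelope calculation using the figures above. It assumes both step counts cover the same ten-minute window, and the variable names are just mine. Note that the raw step ratio compares a small model against a much larger one, so it understates the per-model speedup.

```python
# Figures from the post; everything derived from them is a rough estimate.
WINDOW_MINUTES = 10          # assumed: both step counts cover the same window
GCLOUD_STEPS = 60            # small model on Google Cloud
EC2_STEPS = 375              # much larger model on the p2.xlarge
STEPS_PER_EPOCH = 687
EC2_PRICE_PER_HOUR = 0.90    # USD

# Training throughput in steps per minute.
gcloud_rate = GCLOUD_STEPS / WINDOW_MINUTES
ec2_rate = EC2_STEPS / WINDOW_MINUTES

# Raw step-for-step ratio (the models differ in size, so this is a lower bound).
raw_speedup = ec2_rate / gcloud_rate

# Time and cost of one epoch on the EC2 instance at the observed rate.
ec2_minutes_per_epoch = STEPS_PER_EPOCH / ec2_rate
ec2_cost_per_epoch = ec2_minutes_per_epoch / 60 * EC2_PRICE_PER_HOUR

print(f"Google Cloud: {gcloud_rate:.1f} steps/min")
print(f"EC2:          {ec2_rate:.1f} steps/min")
print(f"Raw speedup:  {raw_speedup:.2f}x")
print(f"EC2 epoch:    {ec2_minutes_per_epoch:.1f} min, about ${ec2_cost_per_epoch:.2f}")
```

So at the observed rate an epoch on EC2 takes a little over 18 minutes and costs under 30 cents.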
Labels: machine_learning, tensorflow, google_cloud, ec2