Les billets libellés: python. Afficher tous les billets.

PyTorch

lundi 08 avril 2019

When I first started with neural networks I learned them with TensorFlow and it seemed like TensorFlow was pretty much the industry standard. I did however keep hearing about PyTorch which was supposedly better than TensorFlow in many ways, but I never really got around to learning it. Last week I had to do one of my assignments in PyTorch so I finally got around to it, and I am already impressed.

The biggest problem I always had with TensorFlow was that the graphs are static. The entire graph must be defined and compiled before it is run and it can't be altered at runtime. You feed data into the graph and it returns output. This results in the rather awkward tf.Session() which must be created before you can do anything, and which contains all of the parameters for the model.

PyTorch has dynamic graphs which are compiled at runtime. This means that you can change things as you go, including altering the graph while it is running, and you don't need to have all the dimensions of all of the data specified in advance like you do in TensorFlow. You can also do things like change the numbers of neurons in a layer dynamically and drop entire layers at runtime which you can't do with TensorFlow.

Debugging PyTorch is a lot easier since you can just make a change and test it - you don't need to recreate the graph and instantiate a session to test it out. You can just run an optimization step whenever you want. Coming from TensorFlow that is just a breath of fresh air.

TensorFlow still has many advantages, including the fact that it is still an industry standard, is easier to deploy and is better supported. But PyTorch is definitely a worth competitor, is far more flexible, and solves many of the problems with TensorFlow.

Libellés: python, machine_learning, tensorflow, pytorch
Aucun commentaire

Exercise Log

mardi 27 novembre 2018

I exercise quite a lot and I have not been able to find an app to keep track of it which satisfies all of my criteria. Most fitness trackers are geared towards cardio and I also do a lot of strength training. After spending a year trying to make due with combinations of various fitness trackers and other apps I decided to just write my own, which could do everything I wanted and could show all of the reports I wanted.

I did that and after using it for a few weeks put it online at workout-log.com. It's not fancy and it is quite likely very buggy at this point, but it is open to anyone who wants to use it. 

It's written with Django and jQuery and uses ChartJS for the charts. 

Libellés: python, django, data_science, machine_learning
1 commentaires

I have previously written about Google CoLab which is a way to access Nvidia K80 GPUs for free, but only for 12 hours at a time. After a few months of using Google Cloud instances with GPUs I have run up a substantial bill and have reverted to using CoLab whenever possible. The main problem with CoLab is that the instance is terminated after 12 hours taking all files with it, so in order to use them you need to save your files somewhere.

Until recently I had been saving my files to Google Drive with this method, but while it is easy to save files to Drive it is much more difficult to read them back. As far as I can tell, in order to do this with the API you need to get the file id from Drive and even then it is not so straightforward to upload the files to CoLab. To deal with this I had been uploading files that needed to be accessed often to an AWS S3 bucket and then downloading them to CoLab with wget, which works fine, but there is a much simpler way to do the same thing by using Google Cloud Storage instead of S3.

First you need to authenticate CoLab to your Google account with:

from google.colab import auth

auth.authenticate_user()

Once this is done you need to set your project and bucket name and then update the gcloud config.
project_id = [project_name]
bucket_name = [bucket_name]
!gcloud config set project {project_id}

After this has been done files can simply and quickly be upload or downloaded from the bucket with the following simple commands:

# download
!gsutil cp gs://{bucket_name}/foo.bar ./foo.bar

# upload
!gsutil cp  ./foo.bar gs://{bucket_name}/foo.bar

I actually have been adding the line to upload the weights to GCS to my training code so it is automatically uploaded every couple epochs, which removes the need for me to manually back them up periodically throughout the day.

Libellés: coding, python, machine_learning, google, google_cloud
1 commentaires

IOU Loss

mercredi 29 août 2018

When doing binary image segmentation, segmenting images into foreground and background, cross entropy is far from ideal as a loss function. As these datasets tend to be highly unbalanced, with far more background pixels than foreground, the model will usually score best by predicting everything as background. I have confronted this issue during my work with mammography and my solution was to use a weighted sigmoid cross entropy loss function giving the foreground pixels higher weight than the background.

While this worked it was far from ideal, for one thing it introduced another hyperparameters - the weight - and altering the weight had a large impact on the model. Higher weights favored predicting pixels as positive, increasing recall and decreasing precision, and lowering the weight had the opposite effect. When training my models I usually began with a high weight to encourage the model to make positive predictions and gradually decayed the weight to encourage it to make negative predictions.

For these types of segmentation tasks Intersection over Union tends to be the most relevant metric as pixel level accuracy, precision and recall do not account for the overlap between predictions and ground truth. Especially for this task, where overlap can be the difference between life and death for the patient, accuracy is not as relevant as IOU. So why not use IOU as a loss function?

The reason was because IOU was not differentiable so can not be used for gradient descent. However Wang et al have written a paper - Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation - which provides an easy way to use IOU as a loss function. In addition, this site provides code to implement this loss function in TensorFlow.

The essence of this method is that rather than using the binary predictions to calculate IOU we use the sigmoid probability output by the logits to estimate it which allows IOU to provide gradients. At first I was skeptical of this method, mostly because I understood cross entropy better and it is more common, but after I hit a performance wall with my mammography models I decided to give it a try.

My models using cross-entropy loss had ceased to improve validation performance so I switched the loss function and trained them for a few more epochs. The validation metrics began to improve, so I decided to train a copy of the model from scratch with the IOU loss. This has been a resounding success. The IOU loss accounts for the imbalanced data, eliminating the need to weight the cross entropy. With the cross entropy loss the models usually began with recall of near 1 and precision of near 0 and then the precision would increase while the recall slowly decreased until it plateaued. With IOU loss they both start near 0 and gradually increase, which to me seems more natural. 

Training with an IOU loss has two concrete benefits for this task - it has allowed the model to detect more subtle abnormalities which models trained with cross entropy loss did not detect; and it has reduced the number of false positives significantly. As the false positives are on a pixel level this effectively means that the predictions are less noisy and the shapes are more accurate.

The biggest benefit is that we are directly optimizing for our target metric rather than attempting to use an imperfect substitute which we hope will approximate the target metric. Note that this method only works for binary segmentation at the moment. It also is a bit slower than using cross entropy, but if you are doing binary segmentation the performance boost is well worth it.

 

Libellés: python, machine_learning, mammography, convnets
Aucun commentaire

As I continue to work on my mammography project I save a lot of time by re-using weights from models I have already trained rather than training every iteration of every model from scratch, which would be very time consuming. However a drawback to this method is that if I add a new layer or change a layer when I continue training the model the layers which have not changed are prone to overfit as they have been trained for substantially longer than the new layers.

I tried only training certain variables, but when the checkpoint is saved only the trained variables are included in it, which means that the checkpoint can not be restored as it is missing many variables. This could be overcome by restoring certain variables from one checkpoint and others from a different checkpoint, but that is overly complicated and not very convenient.

Earlier today, I had added another deconvolution layer to my model. When I trained just that layer the accuracy of the model went very high very quickly, much more quickly than training all of the layers. But then I couldn't continue training all of the layers because the checkpoint only contained the layer trained. I don't have the time to retrain the entire monstrosity from scratch, so I found an ugly hack that allows me to train mostly the layers I want to train while saving all of the weights in the checkpoint.

I create two training ops - one for all variables (train_op_1) and one for the variables I want to train (train_op_2). I run train_op_2 most of the time. But right before I save the checkpoint I do one iteration of train_op_1 which updates all layers, so all variables are saved in the checkpoint. It's not pretty, but it works and best of all, the code doesn't have to be changed depending on what I want to train. I specify whether I want to train all vars or just the subset as a command line arg and if I want to train all vars, then set train_op_2 = train_op_1.

I just ran a few quick tests with no issues, hopefully this will continue to work.

Libellés: python, data_science, machine_learning, tensorflow
Aucun commentaire