TL;DR
93.8% mean test set accuracy over 5 runs of 40 epochs each on Stanford Cars using Mish EfficientNet-b3 + Ranger
0.2 percentage points higher than the EfficientNet paper’s b3 result (note that their best result was 94.7%, with b7)
I found Mish EfficientNet-b3 + Ranger much easier to train and fine-tune than EfficientNet-b3
EfficientNet
When the EfficientNet paper came out earlier in 2019, there was a flurry of excitement over the impressive results the authors achieved across a range of computer vision tasks with relatively small models. Since then, however, the excitement has died down as deep learning practitioners struggled to apply EfficientNet to their own tasks, describing it as finicky and difficult to fine-tune… until now?
Mish
With the release of the Mish activation function paper and code, and of Ranger, an optimizer that combines RAdam and LookAhead, an explosion of community activity on the fast.ai forums led to the annihilation of the previous ImageWoof accuracy records; see Less Wright’s excellent post here for the full story.
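For readers who have not met Mish before, it is defined as x · tanh(softplus(x)). The snippet below is a minimal pure-Python sketch of that formula, not the code used for these experiments; the function names `softplus` and `mish` are just illustrative.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    # max(x, 0) + log1p(exp(-|x|)) avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    # Mish is smooth, close to the identity for large positive inputs,
    # and lets small negative values through instead of clipping them to zero.
    for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"mish({value:+.1f}) = {mish(value):+.4f}")
```

In a real network the same formula would be applied element-wise to tensors in place of the model's existing activation.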
Taking inspiration from this, I decided to try EfficientNet with Mish and Ranger on my long-standing CV project: pushing the accuracy I can achieve on the Stanford Cars dataset. Diganta Misra, the author of the Mish paper, had already achieved impressive results with Mish + EfficientNet on CIFAR, so I was excited to see what it could do when paired with Ranger.
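To give a feel for the LookAhead half of Ranger, here is a rough sketch of the idea on a toy one-dimensional problem. It is an illustration under stated assumptions, not the Ranger implementation: plain gradient descent stands in for RAdam as the inner optimizer, and the function name `lookahead_sgd` and the values of `lr`, `k` and `alpha` are hypothetical choices for the example.

```python
from typing import Callable


def lookahead_sgd(
    grad: Callable[[float], float],
    w0: float,
    lr: float = 0.1,     # inner (fast) learning rate, illustrative value
    k: int = 5,          # fast steps between slow-weight updates, illustrative
    alpha: float = 0.5,  # interpolation factor for the slow weights, illustrative
    steps: int = 50,
) -> float:
    """Minimise a 1-D function with a LookAhead wrapper around plain SGD."""
    slow = w0  # slow weights: updated once every k fast steps
    fast = w0  # fast weights: updated on every step by the inner optimizer
    for step in range(1, steps + 1):
        # Inner optimizer step (plain SGD here; Ranger uses RAdam instead).
        fast -= lr * grad(fast)
        if step % k == 0:
            # Move the slow weights part of the way towards the fast weights,
            # then restart the fast weights from the new slow weights.
            slow += alpha * (fast - slow)
            fast = slow
    return slow


if __name__ == "__main__":
    # Toy objective f(w) = w**2, whose gradient is 2*w and minimum is at 0.
    print(lookahead_sgd(lambda w: 2.0 * w, w0=10.0))
```

The slow weights only ever move a fraction `alpha` of the way towards wherever the fast weights have explored, which is the stabilising effect LookAhead is meant to provide.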
Results
I started with the EfficientNet-b3 model as it seemed a fair compromise between size and performance. It was also used by the EfficientNet authors to benchmark on the Stanford Cars dataset, achieving an accuracy of 93.6%. (Their larger b7 model matched the SOTA for this dataset at 94.7%.)
After some trial and error I achieved 93.8% mean test set accuracy over 5 runs of 40 epochs each on Stanford Cars using Mish EfficientNet-b3 + Ranger. Delighted with myself and my little P4000 remote machine. Here is the validation accuracy for the last 10 epochs of each run (note I used the test set as my validation set):