Random 502 Bad Gateway on Docker CE

For a long time I was getting random 502 errors “Bad Gateway” on my Docker instance running on a small VPS.

I run my containers behind a reverse proxy, with Let’s Encrypt.

Initially I was using Nginx, then moved on to Traefik, but the problem remained.

I was thinking it was related to the number of containers (16 for 2GB of RAM), but that was a network issue according to logs (cannot connect to host).

The phenomenon was going away when I was stopping the Nextcloud container.

I had Nextcloud configured to connect to the MySQL DB container using the --link option.

While investigating, I learned the option is deprecated.

That’s only then that I realized that it is recommended to create a user defined bridge.

With a user defined bridge, you automatically get name resolution.

After moving all my containers to the new bridge, I no longer have issues.

Conclusion: don’t put your container in the default bridge, create a user defined bridge, and don’t use the --link option.

Edit September 2019:

Since this article I have moved VPS provider (because I was still getting random network issues with Scaleway).

All went fine for about 6 months until I started getting random network connectivity issues again on the new VPS provider.

I started investigating by running simple pings between containers in the same subnet, and I was getting ICMP redirects!

Basically I was running pings between A and B, and every now and then C would answer “hey, the shortest past to B is to talk directly to B”.

I use Portainer to manage my containers, and Portainer is affected by a very severe bug, where it can create several containers with the same MAC address.

See https://github.com/portainer/portainer/issues/1645 for more info.

Thanks for reading this post!

If you found an issue in this article, you can create an issue on Github.

If you have a comment or question, please drop me a line below!