nginx Cannot Bind in Docker Container
The problem my team faced was a container running nginx that would start but immediately fail with an error saying its port was already in use.
[emerg] 1#1: bind() to 0.0.0.0:8002 failed (98: Address already in use)
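The message surfaced in the container's logs; a sketch of pulling it up (flags to taste):

$ docker logs --tail 50 container-with-problem 2>&1 | grep emerg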
The salient configuration of the container was,
$ docker inspect container-with-problem | jq '.[] | {"cmd": .Config.Cmd, "network_mode": .HostConfig.NetworkMode, "port_bindings": .HostConfig.PortBindings, "exposed_ports": .Config.ExposedPorts, "ports": .NetworkSettings.Ports, "bridge": .NetworkSettings.Bridge, "ip_address": .NetworkSettings.IPAddress, "networks": .NetworkSettings.Networks, "volume_mounts": .Mounts}'
{
  "cmd": [
    "bash",
    "-c",
    "nginx -c /etc/nginx/nginx.conf;"
  ],
  "network_mode": "host",
  "port_bindings": null,
  "exposed_ports": {
    "8002/tcp": {}
  },
  "ports": {},
  "bridge": "",
  "ip_address": "",
  "networks": {},
  "volume_mounts": [
    {
      "Type": "bind",
      "Source": "REDACTED",
      "Destination": "REDACTED",
      "Mode": "ro",
      "RW": false,
      "Propagation": "rprivate"
    },
    {
      "Type": "bind",
      "Source": "REDACTED",
      "Destination": "REDACTED",
      "Mode": "ro",
      "RW": false,
      "Propagation": "rprivate"
    },
    {
      "Type": "bind",
      "Source": "REDACTED",
      "Destination": "REDACTED",
      "Mode": "rw",
      "RW": true,
      "Propagation": "rprivate"
    }
  ]
}
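The key line is "network_mode": "host": the container shares the host's network namespace, so nginx binds its port directly on the host and competes with every other socket there. One way to confirm the shared namespace (a sketch; assumes the container is running at that instant, and the inode number is illustrative):

$ pid=$(docker inspect -f '{{.State.Pid}}' container-with-problem)
$ sudo readlink /proc/$pid/ns/net /proc/1/ns/net
net:[4026531840]
net:[4026531840]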
I checked who was using port 8002,
$ ss -tplan | grep :8002 | awk '{print $4}' | sort | uniq
10.A.B.6:8002
127.0.0.1:7000
127.0.0.1:8001
I wasn't expecting 10.A.B.6:8002; I expected 127.0.0.1:8002. 10.A.B.6 was the host's IP address, and the container should bind to localhost, not the host address.
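Note that grep matches the whole ss line, so entries where :8002 is the peer port also slip through; that is why 127.0.0.1:7000 and 127.0.0.1:8001 appear above. ss can filter on the local port directly (a sketch):

$ sudo ss -tpan 'sport = :8002'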
I looked at the open files,
$ sudo lsof -i :8002
lsof: no pwd entry for UID 101
lsof: no pwd entry for UID 101
lsof: no pwd entry for UID 101
COMMAND     PID             USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
lsof: no pwd entry for UID 101
nginx      6510              101   25u  IPv4 306195896      0t0  TCP redacted.redacted.redacted.com:teradataordbms->redacted.redacted.redacted.com:redactedport (ESTABLISHED)
nginx    161218 systemd-coredump 2387u  IPv4 306167339      0t0  TCP redacted.redacted.redacted.com:teradataordbms->redacted.redacted.redacted.com:redactedport (ESTABLISHED)
lsof: no pwd entry for UID 101
nginx   4001159              101   71u  IPv4 306211025      0t0  TCP redacted.redacted.redacted.com:teradataordbms->redacted.redacted.redacted.com:redactedport (ESTABLISHED)
lsof: no pwd entry for UID 101
nginx   4001159              101  154u  IPv4 306211044      0t0  TCP redacted.redacted.redacted.com:teradataordbms->redacted.redacted.redacted.com:redactedport (ESTABLISHED)
nginx   4179792 systemd-coredump   47u  IPv4 306103886      0t0  TCP redacted.redacted.redacted.com:teradataordbms->redacted.redacted.redacted.com:redactedport (ESTABLISHED)
Notice that systemd-coredump was listed as the user for some of the open files on port 8002, and nginx was the process holding them (lsof resolved port 8002 to its /etc/services name, teradataordbms). systemd-coredump has user ID 999 and group ID 997 on the host. The container did not have that user ID or group ID at all.
$ id -u systemd-coredump
999
$ id -g systemd-coredump
997
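That mismatch also explains the lsof: no pwd entry for UID 101 noise: nginx inside the container ran as UID 101, which has no passwd entry on the host, while UID 999 resolved to the host's systemd-coredump user. A sketch (the fields of the 999 line are illustrative):

$ getent passwd 101; echo "exit=$?"
exit=2
$ getent passwd 999
systemd-coredump:x:999:997:systemd Core Dumper:/:/usr/sbin/nologin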
I found a reference, which I cannot substantiate from official documentation (source: Question: Docker install on linux sets data dir owner to systemd-coredump user):
directory is owned by systemd-coredump, which apparently happens when the kernel crashes in the middle of some operation
While I don't know exactly what happened, the symptoms were:
- nginx in the container could not bind to port 8002
- the container was in a restart loop
- nginx process(es) in some container(s) held open sockets on port 8002
- the nginx process(es) were not always the same each time lsof was run (see the sketch after this list)
- there were active network connections between the seemingly orphaned nginx process(es) and "upstream" servers
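Re-running lsof on an interval made the churn in those last two symptoms visible (a sketch):

$ sudo watch -n 2 'lsof -i :8002'
# PIDs appear and disappear as connections are opened and closed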
I had found the port conflict, which was a great first step.
The next realization was that, since nothing other than the container was supposed to listen on port 8002, some rogue process (rogue only in the sense that it was unknown to me) had ended up using the very port the container needed.
My team had run into this problem before.
$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 1024 65535
The range of local (ephemeral) ports the kernel hands out for outbound connections was 1024 to 65535, and 8002 falls within that range. It was pure chance that an outbound connection grabbed the very port I needed for the container, hence the conflict.
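To make the mechanism concrete: every outbound TCP connection is assigned a local (source) port from that range, so a long-lived outbound connection can occupy a port a server later needs to bind. A sketch (addresses and the picked port are illustrative):

$ sleep 300 | nc example.com 80 &   # hold an outbound TCP connection open
$ ss -tn 'dport = :80'
State  Recv-Q  Send-Q  Local Address:Port   Peer Address:Port
ESTAB  0       0       10.A.B.6:8002        93.184.216.34:80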
The fix proposed by my team was to move the ephemeral port range above anything our containers bind.
$ sudo sysctl -w net.ipv4.ip_local_port_range="25000 65535"
net.ipv4.ip_local_port_range = 25000 65535
$ echo 'net.ipv4.ip_local_port_range = 25000 65535' | sudo tee -a /etc/sysctl.conf
$ sudo sysctl -p
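An alternative worth knowing about (not the route we took) is net.ipv4.ip_local_reserved_ports, which carves specific ports out of the ephemeral range without shrinking it; a sketch, persisted the same way as above:

$ sudo sysctl -w net.ipv4.ip_local_reserved_ports="8002"
net.ipv4.ip_local_reserved_ports = 8002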
Restarted Docker,
$ sudo systemctl restart docker
Restarted container,
$ docker restart container-with-problem
And the problem went away. The container started and remained healthy.
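A couple of checks to confirm the recovery (a sketch; the status text is illustrative):

$ docker ps --filter name=container-with-problem --format '{{.Names}}: {{.Status}}'
container-with-problem: Up 5 minutes (healthy)
$ sudo ss -tlpn 'sport = :8002'
# expect a single nginx LISTEN entry on 0.0.0.0:8002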