Elixir - epmd
02 Sep 2017
epmd is like a DNS server for Erlang nodes.
epmd (Erlang Port Mapper Daemon) is the part of the Erlang runtime system, written in C, that acts as a name server for distributed Erlang. When an Erlang node starts in distributed mode (by setting the -name or -sname parameter on startup), it checks whether an EPMD instance is already running on the local host (by default listening on port 4369) and, if not, starts one. It then chooses a random port for inter-node communication and registers its name and that port with EPMD.
Normally each server runs its own EPMD instance, and multiple Erlang nodes on that server can be registered with it.
epmd is responsible for mapping symbolic node names to the addresses (ports) on which those nodes listen.
NOTE: an Elixir application (Erlang node) can respond to application commands (ping/start/stop) if and only if its name is registered in epmd.
epmd can be started:
- detached (as a daemon)
  - normally epmd is started automatically as a daemon when a distributed Erlang node starts and no running epmd instance is present (otherwise the running instance is used).
  - it’s possible to instruct a distributed Erlang node not to start epmd on startup by passing the -start_epmd false EVM flag; in that case the node will fail to start if epmd is not running.
  - but in my experience it had no effect: the application still started epmd unless it was already running.
- in the foreground
  - epmd is usually started in the foreground either in a systemd service unit or manually in the shell for debugging purposes.
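The start-up behaviour described above (reuse a running epmd, otherwise start one as a daemon) can be mirrored in a small shell helper, which is handy before launching a node with -start_epmd false. This is a minimal sketch: the function name ensure_epmd is hypothetical, and it assumes that epmd -names exits with a non-zero status when no epmd is reachable.

```shell
# Hypothetical helper: make sure an epmd instance is running before a node
# started with "-start_epmd false" tries to register with it.
ensure_epmd() {
  # Assumption: "epmd -names" fails (non-zero exit) when no epmd is reachable.
  if epmd -names >/dev/null 2>&1; then
    echo "epmd already running"
  else
    # -daemon detaches epmd, just like a node does when it starts epmd itself.
    epmd -daemon || return 1
    echo "epmd started"
  fi
}
```

For example: ensure_epmd && iex --sname demo --erl "-start_epmd false".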
debugging using epmd
kill the running epmd process and start it in the foreground in debug mode
$ sudo killall epmd
$ epmd -d
list names registered with currently running epmd
$ epmd -names
epmd: up and running on port 4369 with data:
name billing_stage at port 30701
name billing_prod at port 30183
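When scripting such checks it helps to extract a single node's port from that output. A minimal sketch, assuming the "name <node> at port <port>" line format shown above; the function name epmd_port is hypothetical.

```shell
# Hypothetical helper: read "epmd -names" output on stdin and print the port
# registered for the node given as $1; exit with status 1 if it is missing.
epmd_port() {
  awk -v node="$1" '
    $1 == "name" && $2 == node && $3 == "at" && $4 == "port" { print $5; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For the output above, epmd -names | epmd_port billing_prod prints 30183. A non-zero exit status means the node is not registered, so it will not answer application commands such as ping or stop.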
node is unregistered right after it’s registered
this might happen when the application crashes: examine its erl_crash.dump file or try to start the application in the foreground to find the error.
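The first thing to look at in erl_crash.dump is usually its Slogan line, which states why the node went down. A minimal sketch that prints it; the function name crash_slogan and the default path are assumptions (the dump is written to the node's working directory unless the ERL_CRASH_DUMP environment variable points elsewhere).

```shell
# Hypothetical helper: print the reason ("Slogan") recorded in a crash dump.
# $1 defaults to erl_crash.dump in the current directory.
crash_slogan() {
  local dump="${1:-erl_crash.dump}"
  [ -r "$dump" ] || { echo "no readable crash dump: $dump" >&2; return 1; }
  # The Slogan line sits near the top of the dump; print only its text.
  sed -n 's/^Slogan: //p' "$dump" | head -n 1
}
```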
successfully running node is not registered
a situation where the application is running successfully but is not registered in epmd can be caused by this sequence of events:
- the application whose systemd service is started first starts the epmd process
- stopping that service kills epmd as well, whereas the billing_stage service doesn’t affect epmd
- starting the billing_prod service again starts a new epmd process (because there are no running instances of epmd)
- as a result, the new epmd process knows nothing about the other application (even though it’s still running), and the latter stops responding to all application commands.
IDK why, but stopping the billing_prod service again no longer kills the epmd process.
the epmd process shouldn’t actually exit when the application that started it automatically is stopped, but for some reason it does when the application is started via a systemd service.
so it seems safer to make the running epmd independent of running applications. This can be done by creating a dedicated systemd service for epmd and configuring it as a required dependency of all application services.
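A minimal sketch of such a setup, written as a shell function that generates the unit files. One plausible explanation for epmd dying together with the service that spawned it (an assumption, not verified against this setup) is systemd's default KillMode=control-group, which kills every process left in a unit's cgroup when the unit stops; running epmd as its own unit keeps it out of the application services' cgroups. The function name write_epmd_units, the /usr/bin/epmd path and the application unit names are assumptions.

```shell
# Hypothetical installer: writes a dedicated epmd unit plus a drop-in for each
# application service so that it requires (and starts after) epmd.service.
# Assumptions: epmd lives at /usr/bin/epmd and the application units are
# named after the services, e.g. billing_stage.service / billing_prod.service.
write_epmd_units() {
  local dir="$1"; shift
  mkdir -p "$dir"
  cat > "$dir/epmd.service" <<'EOF'
[Unit]
Description=Erlang Port Mapper Daemon
After=network.target

[Service]
# No -daemon flag: epmd stays in the foreground so systemd can supervise it.
ExecStart=/usr/bin/epmd
Restart=always

[Install]
WantedBy=multi-user.target
EOF
  local svc
  for svc in "$@"; do
    mkdir -p "$dir/$svc.service.d"
    cat > "$dir/$svc.service.d/epmd.conf" <<'EOF'
[Unit]
Requires=epmd.service
After=epmd.service
EOF
  done
}
```

For example, as root: write_epmd_units /etc/systemd/system billing_stage billing_prod, then systemctl daemon-reload and systemctl enable --now epmd.service before restarting the application services.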
quick and dirty fix to register an unresponsive application in epmd:
the only way to make epmd aware of a running but unresponsive application is to restart its service. I guess systemd sends a SIGTERM signal to the running process when it fails to stop it with the stop command. When the application starts again, it discovers the already running epmd instance and registers itself as usual.
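The quick fix above can be scripted: check epmd -names for each expected node and restart only the services whose node is missing. A minimal sketch; it assumes each systemd unit is named after its node (billing_prod becomes billing_prod.service), and the function name reregister_missing is hypothetical.

```shell
# Hypothetical helper: restart the systemd service of every node given as an
# argument that is not registered in the running epmd.
# Assumption: the unit for node X is called X.service.
reregister_missing() {
  local names node
  names=$(epmd -names) || { echo "epmd is not running" >&2; return 1; }
  for node in "$@"; do
    # Exact match on the "name <node> at port <port>" lines of epmd -names.
    if ! printf '%s\n' "$names" | awk -v n="$node" '$1 == "name" && $2 == n { f = 1 } END { exit (f ? 0 : 1) }'; then
      echo "restarting $node.service"
      sudo systemctl restart "$node.service"
    fi
  done
}
```

For example, reregister_missing billing_stage billing_prod restarts only the service whose node has disappeared from epmd.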