I'm lucky enough to be diving into some infrastructure questions around the services that support our servers. We've selected a highly available configuration/service discovery server called 'etcd'. It's pretty easy to set up; however, some of the documentation is a bit lacking.
So here are two quick tips:
There is no backup
At least, nothing documented. etcd uses its own data storage engine & format, which consists of two things: a log file (containing all updates) and a series of snapshots (point-in-time saves of the state of etcd).
So, your top two backup options are:
Recursively querying etcd via its API to retrieve the key/value pairs, serialising them and then persisting the result somewhere.
Alternatively, find some (reliable) way to archive etcd's datadir. Prime candidates are rsync or using something like LVM.
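The first option can be sketched roughly as below. This is a minimal sketch, not a tested backup tool: it assumes the v2 keys API (GET /v2/keys/?recursive=true) and its nested node/nodes JSON layout, and the address http://127.0.0.1:4001 is only a guess at a local instance. The function names are my own.

```python
import json
import urllib.request


def flatten(node, out=None):
    """Walk a v2-style node tree and collect leaf keys and their values.

    Assumes each node is a dict with "key", and either "value" (a leaf)
    or "dir": True plus an optional "nodes" list (a directory).
    Empty directories are not preserved by this sketch.
    """
    if out is None:
        out = {}
    if node.get("dir"):
        for child in node.get("nodes", []):
            flatten(child, out)
    elif "key" in node:
        out[node["key"]] = node.get("value")
    return out


def dump_etcd(base_url="http://127.0.0.1:4001"):
    """Fetch the whole keyspace in one recursive request (assumed endpoint)."""
    url = base_url + "/v2/keys/?recursive=true"
    with urllib.request.urlopen(url) as resp:
        return flatten(json.load(resp)["node"])


def backup_to_file(path, base_url="http://127.0.0.1:4001"):
    """Serialise the flattened key/value pairs to a JSON file."""
    with open(path, "w") as fh:
        json.dump(dump_etcd(base_url), fh, indent=2, sort_keys=True)
```

Calling backup_to_file("etcd-backup.json") would write one flat JSON object of key to value. Note that this captures values only; anything like TTLs or the log history is lost, which is one reason archiving the datadir may still be the better choice.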
Restoration could be tricky: don't spin up one instance with the data and assume the other, empty ones will just sync from it. As part of the consensus algorithm, it's likely that one of the empty instances will assume the 'leadership role', and since it knows nothing, that's what the cluster ends up with...
There are metrics
You just have to dig into the debugging documentation to find them.
Nice and simple, this one. You've got two options:
You can use the HTTP-based metric endpoint provided by etcd.
You can send data to Graphite (assuming you have a Graphite box running).
If you go the Graphite route, be aware that you'll receive metrics under the namespace etcd.<machine name>.
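If you want to group or relabel those metrics downstream, something like the following could split the machine name back out. It's only a sketch: it assumes Graphite's plaintext line format (path, value, timestamp separated by spaces), a path shaped like etcd.<machine name>.<metric>, and a machine name without dots in it. The metric name in the usage note is made up.

```python
def parse_etcd_metric(line, prefix="etcd"):
    """Split one Graphite plaintext line into machine, metric, value, timestamp.

    Assumes the path looks like "<prefix>.<machine>.<metric...>" and that
    the machine name itself contains no dots.
    """
    path, value, timestamp = line.split()
    parts = path.split(".")
    if len(parts) < 3 or parts[0] != prefix:
        raise ValueError("not an etcd metric path: " + path)
    return {
        "machine": parts[1],
        "metric": ".".join(parts[2:]),
        "value": float(value),
        "timestamp": int(timestamp),
    }
```

For example, parse_etcd_metric("etcd.node1.some.metric 42 1400000000") would give machine "node1" and metric "some.metric".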