How to investigate "Failed Units" in CoreOS
When I created a Kubernetes cluster on CoreOS, one of the nodes reported "Failed Units" when I logged in to it:
$ ssh -i ~/.ssh/key.pem core@xxx.xxx.xxx.xxx
Last login: Mon May 23 04:43:57 2016 from yyy.yyy.yyy.yyy
CoreOS beta (1010.3.0)
Update Strategy: No Reboots
Failed Units: 5
var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734.mount
var-lib-rkt-pods-run-bf6f1c19\x2d7bc0\x2d4931\x2d885a\x2d811cc236973a-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734.mount
docker-0bd19e353b40194e4bcc35172fa5b954ef2ba366121de2163f07f557bbcd170a.scope
locksmithd.service
polkit.service
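The \x2d sequences in the mount unit names are how systemd escapes a literal "-" in a path, because a plain "-" in a mount unit name stands for "/". On the node itself, systemd-escape --unescape --path should do the reverse mapping once the .mount suffix is removed. Below is a minimal sketch of the same idea in plain shell. unit_to_path is a hypothetical helper name, and it only handles the \x2d escape that appears in this listing, not every escape systemd can produce.

```shell
# Hypothetical helper: turn a .mount unit name into the mount path it stands for.
# Assumption: the only escape sequence present is \x2d (an escaped "-").
unit_to_path() {
  printf '%s\n' "$1" | sed \
    -e 's/\.mount$//' \
    -e 's/-/\//g' \
    -e 's/\\x2d/-/g' \
    -e 's/^/\//'
}

# Decode the first failed mount unit from the login banner.
unit_to_path 'var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734.mount'
```

The order of the substitutions matters: plain "-" characters become "/" first, and only then are the \x2d escapes turned back into "-".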
You can get more info with systemctl --failed:
$ systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734.mount loaded failed failed /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/ap-northeast-1c/vol-8c3b1734
● var-lib-rkt-pods-run-bf6f1c19\x2d7bc0\x2d4931\x2d885a\x2d811cc236973a-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734.mount loaded failed failed
● docker-0bd19e353b40194e4bcc35172fa5b954ef2ba366121de2163f07f557bbcd170a.scope loaded failed failed docker container 0bd19e353b40194e4bcc35172fa5b954ef2ba366121de2163f07f557bbcd170a
● locksmithd.service masked failed failed locksmithd.service
● polkit.service loaded failed failed Authorization Manager
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
5 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
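For scripting, this table is easier to handle with --no-legend (drops the header and footer lines) and --plain (drops the ● bullets). The following is a small sketch rather than a finished tool: failed_unit_names is a hypothetical helper that reads such a listing on stdin and prints only the unit names. It also tolerates a leading bullet in case --plain was left out, but it assumes the legend has been removed.

```shell
# Hypothetical helper: print just the unit names from a `systemctl --failed`
# listing supplied on stdin (assumes --no-legend was used).
failed_unit_names() {
  awk '
    $1 == "●" { print $2; next }   # bullet still present: name is the second field
    NF > 0    { print $1 }         # plain listing: name is the first field
  '
}

# Intended use on the node (left commented out so the sketch has no side effects):
#   systemctl --failed --no-legend --plain | failed_unit_names
```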
You can get the status of a specific unit with systemctl status ...:
$ systemctl status locksmithd.service
● locksmithd.service
Loaded: masked (/dev/null)
Active: failed (Result: resources) since Tue 2016-05-17 06:34:35 UTC; 6 days ago
Main PID: 758 (code=exited, status=1/FAILURE)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
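In this output, "Loaded: masked (/dev/null)" means the unit is linked to /dev/null so that it cannot be started. That would be consistent with the "No Reboots" update strategy in the login banner, though this is an inference. When you want the same fields in a script, systemctl show prints them as KEY=VALUE lines. The sketch below assumes that format. prop_value is a hypothetical helper name, and the properties in the usage comment (LoadState, ActiveState, Result) are ones systemd exposes for service units.

```shell
# Hypothetical helper: read KEY=VALUE lines (the format `systemctl show` prints)
# on stdin and print the value of the key given as the first argument.
prop_value() {
  awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Intended use on the node (commented out; it needs a running systemd):
#   systemctl show -p LoadState -p ActiveState -p Result locksmithd.service | prop_value Result
#   journalctl -u locksmithd.service --no-pager   # logs, if they survived rotation
```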
You can list the units systemd currently has loaded with systemctl list-units (add --all to include inactive ones):
$ systemctl list-units
If you think the failures were just a temporary glitch, you can clear the failed state of all units. This does not restart or repair anything; it only resets the record:
$ sudo systemctl reset-failed
Then check that no failed units remain:
$ systemctl --failed
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
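reset-failed also accepts unit names, so you can clear a single unit instead of all of them. Mount unit names like the ones above contain backslashes, so they need single quotes on the command line. As a cautious sketch, the hypothetical helper below (print_reset_commands is not part of systemctl) only prints the commands it would run, one per unit name read from stdin, so you can review them first.

```shell
# Hypothetical dry-run helper: print (do not execute) one reset-failed command
# per unit name read from stdin. Names are single-quoted to keep backslashes intact.
print_reset_commands() {
  while IFS= read -r unit; do
    [ -n "$unit" ] || continue   # skip blank lines
    printf "sudo systemctl reset-failed '%s'\n" "$unit"
  done
}

# Example: preview the command for one unit from the listing.
printf '%s\n' 'locksmithd.service' | print_reset_commands
```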
There's more you can do with systemctl. Check systemctl --help for details. Enjoy it!
Written by aeas44