An Ansible patching Playbook
A couple of years ago I completed the Red Hat Certified Specialist in Ansible Automation course and then took on full responsibility for patching 600+ Linux servers, where patching was still being done mostly manually.
The solution was of course Ansible, and initially I created a large Playbook on the CLI of our administration server, which worked very well but needed a proper CI/CD solution.
Fast forward a few months and I worked alongside colleagues within Platform Engineering to move my Playbook into Git and Visual Studio Code. Later I was involved in bringing the whole thing under Ansible Automation Platform (AAP) and Red Hat Satellite, and making it triggerable via Azure DevOps.
One would think that a Playbook for patching would be fairly simple (dnf update, reboot?), but there are a lot more stages to patching to be considered:
- Confirming each server has correct configuration for patching
- Checking each server against a ‘blacklist’
- Placing servers in Maintenance Mode within monitoring software
- Checking each server for updates and not going ahead with further tasks should no updates be available (conforming to the Integrity and Availability of the CIA Triad)
- Sending wall messages
- Detecting applications that may have issues if the server just reboots, and shutting them down safely
- Taking a server snapshot/backup
- Making sure the servers come back up after reboot
- Logging everything
With all of the above in mind I wrote the following Ansible Playbook, which was then taken up by our business as the patching solution for Linux, moved over to Git, and has worked very well ever since.
It scales nicely, it’s easy to maintain and it integrates well into CI/CD solutions.
This Playbook is available on my GitHub.
patch.yml
This is the file that triggers the patching, located in the ‘root’ directory of your Playbook (where the ‘roles’ directory is also located).
It calls each of the roles within the roles directory, in the order shown below.
Usually the call to each role is a bit simpler than the below syntax, but we’re wrapping the whole thing in a block, with an always, in order to have a summary correctly displayed at the end, so use the same syntax as below if you want that to work.
To run it from the CLI, use ansible-playbook patch.yml -e "deployment=to_be_patched"
This particular Playbook works from /ansible/patching/, but you can amend any sections of it that specifically refer to that location, or use relative paths if appropriate.
The -e (--extra-vars) option is used because my inventory file has a group called ‘to_be_patched’ in it. This allows me to throw in new groups on-the-fly, should I ever need to mop up a few servers at the end, for example (it saves waiting for Ansible to do a yum check-update on all servers, every time).
---
- name: "PLAY - Preflight / localhost stuff"
  hosts: localhost
  connection: local
  gather_facts: true
  roles:
    - trigger
    - check_deployment_var_defined
    - get_list_of_blacklisted_hosts

- name: "PLAY - Patching"
  hosts: "{{ deployment }}"
  gather_facts: true
  serial: 30
  tasks:
    - block:
        - include_role:
            name: blacklist
        - include_role:
            name: check_everything_configured
        - include_role:
            name: yum_clean_all
        - include_role:
            # This role registers "yumoutput"; "yumoutput.results" is a
            # non-empty list if patches are available for the server.
            name: check_server_needs_update
        - include_role:
            name: wall
          when: yumoutput.results|length > 0
        - include_role:
            name: place_in_maintenance_mode
          when: yumoutput.results|length > 0
        - include_role:
            name: stop_applications
          when: yumoutput.results|length > 0
        - include_role:
            name: take_snapshot
          when: yumoutput.results|length > 0
        - include_role:
            name: patch
          when: yumoutput.results|length > 0
        - include_role:
            name: reboot
          when: yumoutput.results|length > 0
        - include_role:
            name: yum_clean_all
        - include_role:
            name: post_check
          when: yumoutput.results|length > 0
        - include_role:
            name: remove_snapshot
          when: yumoutput.results|length > 0
      always:
        - import_role:
            name: summary
          delegate_to: localhost

- name: "PLAY - Final Word"
  hosts: localhost
  connection: local
  gather_facts: false
  roles:
    - summary
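The Patching play above resolves its hosts from the ‘deployment’ variable, so whatever is passed with -e needs to exist in the inventory. As a minimal sketch, assuming a YAML-format inventory (the file name, the hostnames and the ‘mop_up’ group are all hypothetical, not taken from my environment), a main group and a small mop-up group might look like this:

```yaml
# inventory.yml (hypothetical file name and hostnames)
all:
  children:
    to_be_patched:
      hosts:
        onprem-app01.example.com:
        dmz-web01.example.com:
    # A small ad-hoc group for mopping up stragglers after the main run
    mop_up:
      hosts:
        onprem-db02.example.com:
```

With an inventory like that, the mop-up run would be ansible-playbook -i inventory.yml patch.yml -e "deployment=mop_up".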
roles/trigger/tasks/main.yml
The whole patching Playbook, and each role’s tasks, are wrapped in a block that allows us to send information about the state of the patching to a log file, output.log.
We can then use output.log to see when the patching was run, and for each server how everything went. This first role is simply to start that log off, entering the current time and date to a new line.
---
- name: "Adding trigger message to output.log"
  shell:
    cmd: "echo '################ Playbook triggered at {{ ansible_date_time.date }} {{ ansible_date_time.time }} ##############' >> /ansible/patching/output.log"
  delegate_to: localhost
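The roles that follow all append to output.log with the same echo pattern. One possible way to cut down the repetition is a shared task file, sketched here under assumptions: the path tasks/log_line.yml and the variables log_status and log_text are hypothetical and not part of the Playbook as I run it. The line is passed through the quote filter so that an apostrophe in a message cannot break the shell quoting, and now() is used instead of ansible_date_time because the latter holds the time the facts were gathered, not the time the task runs.

```yaml
# tasks/log_line.yml (hypothetical shared task file)
- name: "Appending a line to output.log"
  ansible.builtin.shell:
    cmd: "echo {{ log_line | quote }} >> /ansible/patching/output.log"
  vars:
    # Build the whole log line once, in the same layout the roles use
    log_line: "{{ now(fmt='%Y-%m-%d %H:%M:%S') }} - {{ inventory_hostname }} - [ {{ log_status }} ] {{ log_text }}"
  delegate_to: localhost
```

A rescue section could then call it like this:

```yaml
- ansible.builtin.include_tasks:
    file: "{{ playbook_dir }}/tasks/log_line.yml"
  vars:
    log_status: "FAIL"
    log_text: "Server is blacklisted"
```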
roles/check_deployment_var_defined/tasks/main.yml
This is just making sure that, as mentioned above, the extra variable ‘deployment’ is defined, before we continue.
---
- fail:
    msg:
      - "You need to define the Deployment type to patch in order to run this Playbook. Please use; ansible-playbook patch.yml -e 'deployment=[FOO]', where FOO is either to_be_patched (for manual patching purposes), or a group defined within the inventory file."
  when: deployment is not defined
  tags:
    - config
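If you want the preflight check to catch a mistyped group name as well as a missing variable, a stricter version is possible. This is only a sketch: it assumes ‘deployment’ is always the name of an inventory group, which would reject host patterns or single hostnames that Ansible itself would otherwise accept.

```yaml
- name: "Checking that 'deployment' is defined and names an inventory group"
  ansible.builtin.assert:
    that:
      # Conditions are evaluated in order, so the later ones only run
      # once 'deployment' is known to be defined.
      - deployment is defined
      - deployment | length > 0
      - deployment in groups
    fail_msg: "Please run: ansible-playbook patch.yml -e 'deployment=GROUP', where GROUP is a group in the inventory file."
  tags:
    - config
```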
roles/get_list_of_blacklisted_hosts/tasks/main.yml
This gathers a list of hosts from the file named ‘blacklist’ and saves it for later reference.
---
- name: "Gathering a list of all the hosts in the blacklist file"
  shell: grep -v '^#' /ansible/patching/blacklist
  register: blacklist_output
- name: "[[[ INFO ]]] - Here is the content of the blacklist file"
  debug:
    msg: "{{ blacklist_output.stdout_lines }}"
- name: "Set list of hosts found in the blacklist file as a fact for later use"
  set_fact: blacklist_hosts="{{ blacklist_output.stdout }}"
roles/blacklist/tasks/main.yml
This uses the output fact from the role above to make sure none of the hostnames in the ‘inventory’ file are mentioned in the file named ‘blacklist’.
If they are, the Playbook will ignore them, with a failure message on-screen and sent to output.log.
---
- block:
    - name: "Check that each server is not blacklisted from being patched via Ansible"
      assert:
        that:
          - "'{{ inventory_hostname }}' not in '{{ hostvars['localhost']['blacklist_hosts'] }}'"
        success_msg: "{{ inventory_hostname }} is not blacklisted and can therefore be patched via Ansible."
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Server is blacklisted' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Server {{ inventory_hostname }} is specifically blacklisted, via the 'blacklist' file, from being patched using this Playbook."
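One thing to be aware of with the check above is that it is a substring test against one long string, so a host called web1 would also be treated as blacklisted if the file only contained web10. A possible alternative, sketched here and not what I run in production, is to keep the entries as a list and test for exact membership. The tasks below would replace their counterparts in the two roles; the variable names are the same as above.

```yaml
# In get_list_of_blacklisted_hosts: keep the entries as a list
- name: "Gathering non-comment lines from the blacklist file"
  ansible.builtin.command: grep -v '^#' /ansible/patching/blacklist
  register: blacklist_output
  changed_when: false
  # grep exits 1 when no lines are left after filtering;
  # treat that as an empty blacklist, not a failure
  failed_when: blacklist_output.rc not in [0, 1]

- name: "Storing the blacklist as a list of hostnames"
  ansible.builtin.set_fact:
    blacklist_hosts: "{{ blacklist_output.stdout_lines | map('trim') | reject('equalto', '') | list }}"

# In blacklist: exact membership test against that list
- name: "Check that each server is not blacklisted from being patched via Ansible"
  ansible.builtin.assert:
    that:
      - inventory_hostname not in hostvars['localhost']['blacklist_hosts']
    success_msg: "{{ inventory_hostname }} is not blacklisted and can therefore be patched via Ansible."
```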
roles/check_everything_configured/tasks/main.yml
You could add anything you want here, whatever is appropriate for your environment.
In my case, I wanted to:
- Make sure any servers being patched were RHEL (anything not RHEL wasn’t under our remit to patch)
- Roll some bespoke monitoring files out, making sure the monitoring on all servers was kept aligned alongside patching
- Determine if the server was physical, and not virtual, because we don’t want to then try and take a snapshot of it
- Clean up old rescue kernels that were no longer needed (housekeeping)
---
- block:
    - name: "Checking server is Red Hat"
      assert:
        that:
          - ansible_distribution == 'RedHat'
        fail_msg: "This server is not RHEL."
        success_msg: "Server is RHEL."
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Server is not Red Hat' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Server {{ inventory_hostname }} is not Red Hat."
- block:
    - name: "Rolling out all Check_MK bespoke scripts and plugins within /usr/lib/check_mk_agent"
      synchronize:
        archive: true
        checksum: true
        recursive: true
        delete: true
        src: /usr/lib/check_mk_agent
        dest: /usr/lib/
        mode: push
        use_ssh_args: true
      tags:
        - config
        - cmk
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Unable to roll out Check_MK bespoke scripts and plugins to /usr/lib/check_mk_agent' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Unable to roll out Check_MK bespoke scripts and plugins to /usr/lib/check_mk_agent, on {{ inventory_hostname }}. Is the Check_MK agent installed?"
- block:
    - name: "Determining whether the server is Virtual or Physical"
      shell: dmidecode -s system-manufacturer | grep -i VMware
      register: server_phys_virt
      changed_when: server_phys_virt.rc == 1
      failed_when: server_phys_virt.rc not in [0,1]
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Unable to determine if the server is physical or virtual, using dmidecode.' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Unable to determine if the server is physical or virtual, using dmidecode, on {{ inventory_hostname }}. This is unexpected and you may have to patch this server manually and take a look at the Ansible script."
- name: "Setting whether the server is Virtual or Physical as a fact, for later use"
  set_fact: server_type="{{ server_phys_virt }}"
- name: "Cleaning up Rescue Kernels"
  shell: |
    for k in $(ls -1 /boot/vmlinuz-0-rescue-* 2>/dev/null | grep -v "$(cat /etc/machine-id)")
    do
      rm -f "$k" "${k/vmlinuz-/initramfs-}.img"
      grubby --remove-kernel="$k"
    done
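The physical-or-virtual check above shells out to dmidecode. Since the play already gathers facts, another option is to read the virtualization facts Ansible collects. This is a sketch under assumptions: the fact name is_vmware_guest is hypothetical, and the exact values reported for virtualization_type can vary between Ansible versions and hypervisors, so check what your own hosts return before relying on it.

```yaml
- name: "Recording whether the server is a VMware guest, from gathered facts"
  ansible.builtin.set_fact:
    # Hypothetical fact name; later roles would test this instead of server_type.rc
    is_vmware_guest: "{{ ansible_facts['virtualization_type'] == 'VMware' and ansible_facts['virtualization_role'] == 'guest' }}"

- name: "Reporting servers that will skip the snapshot step"
  ansible.builtin.debug:
    msg: "Server {{ inventory_hostname }} is not a VMware guest, so no vSphere snapshot will be taken."
  when: not (is_vmware_guest | bool)
```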
roles/yum_clean_all/tasks/main.yml
Before checking if there are any updates for each server, it’s good practice to run a yum/dnf clean all.
---
- block:
    - name: "Running yum clean all on each server"
      command: yum clean all
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Failed to run yum clean all' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Failed to run yum clean all, on {{ inventory_hostname }}."
roles/check_server_needs_update/tasks/main.yml
Now we check whether each server has any updates available.
If not, the server is skipped and a message goes to output.log (and our custom summary at the end of the Playbook), keeping in line with Integrity of data and Availability of the server (CIA Triad).
---
- name: "Checking each server for outstanding yum updates"
  yum:
    list: updates
    update_cache: true
  register: yumoutput
  changed_when: yumoutput.results|length > 0
- name: "Adding servers that are already up-to-date in to output.log"
  shell:
    cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ OK ] No outstanding updates' >> /ansible/patching/output.log
  delegate_to: localhost
  when: yumoutput.results|length == 0
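Because yumoutput is registered by this role, every later role in patch.yml repeats the same when condition. A condition placed on a block is applied to every task inside it, so one possible tidy-up (a sketch only, showing a fragment of the tasks list in patch.yml with most roles left out for brevity) is to nest the conditional roles in their own block:

```yaml
# Fragment of the tasks list in patch.yml (sketch, not the full role list)
- block:
    - include_role:
        name: check_server_needs_update

    - name: "Roles that only run when updates are outstanding"
      when: yumoutput.results | length > 0
      block:
        - include_role:
            name: wall
        - include_role:
            name: place_in_maintenance_mode
        - include_role:
            name: patch
        - include_role:
            name: reboot
  always:
    - import_role:
        name: summary
      delegate_to: localhost
```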
roles/wall/tasks/main.yml
We have a few minutes between this role and the patching itself, so at this point we send a Wall message to the server to tell anyone logged on that it will be rebooted soon.
This gives people opportunity to save work (or scream) before the patching goes ahead.
99.9% of the time, of course, there’s no one logged on, but better to be safe.
---
- name: "Sending wall message"
  command: wall This server is about to be patched via Ansible and will likely reboot in the next few minutes. Please save any work immediately.
roles/place_in_maintenance_mode/tasks/main.yml
This is a specialised role for the environment we had.
In this case, the servers were being monitored by System Center Operations Manager (SCOM) and would alert administrators out-of-hours if the server was rebooted. We don’t want that, so a SCORCH runbook was created to trigger Maintenance Mode on a server when a URL was called.
---
- block:
    - name: "Placing each server in Maintenance Mode in SCOM"
      uri:
        url: "http://scom_mm_hostname/MM/Home/InstantMM/?ComputerName={{ inventory_hostname }}.company-name.co.uk&Min=240&MMAction=Start"
        return_content: false
        timeout: 120
        use_proxy: false
      delegate_to: localhost
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        # "Could not" rather than "Couldn't": an apostrophe would end the single-quoted echo string early
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Could not place server in Maintenance Mode in SCOM' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Server {{ inventory_hostname }} failed to go in to Maintenance Mode in SCOM."
roles/stop_applications/tasks/main.yml
This role will be very specific to your own applications and needs.
Sometimes there are applications or databases running on a server that we don’t want to be affected by package updates or a reboot, so it’s better to ‘nicely’ shut them down before we patch.
The below example is for Splunk, Tomcat and Oracle DBs.
We can also use the data from this role later on, to determine that these applications/services come back up successfully after reboot (check what’s running now, check it’s still running later).
---
- block:
    - name: "Detecting Splunk"
      shell: systemctl list-units | grep -wc splunk.service
      register: splunk_service
      # grep -c exits 1 when the count is 0, which is not an error here
      failed_when: splunk_service.rc not in [0, 1]
    - debug:
        msg: "No Splunk instances were found on {{ inventory_hostname }}"
      when: splunk_service.stdout == '0'
    - debug:
        msg: "!!! WARNING !!! Splunk is running on {{ inventory_hostname }}"
      when: splunk_service.stdout | int > 0
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Unable to detect whether Splunk is running on the server.' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Unable to determine if Splunk is running on {{ inventory_hostname }}. This is unexpected and may be an issue with the Ansible patching script."
- block:
    - name: "Stopping Splunk"
      service:
        name: splunk
        state: stopped
      when: splunk_service.stdout != '0'
    - name: "Making sure Splunk shut down successfully"
      shell: ps -e -o pid,cmd | grep "splunkd.*start" | grep -v -E "grep|process-runner" | awk '{ print $1 }' | head -n 1
      register: splunk_success
      failed_when: splunk_success.stdout != ""
      when: splunk_service.stdout != '0'
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Splunk did not shut down' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Splunk failed to shut down on {{ inventory_hostname }}."
##############################################################
- block:
    - name: "Detecting Tomcat"
      shell: systemctl list-units | grep -wc tomcat.service
      register: tomcat_service
      # grep -c exits 1 when the count is 0, which is not an error here
      failed_when: tomcat_service.rc not in [0, 1]
    - debug:
        msg: "No Tomcat instances were found on {{ inventory_hostname }}"
      when: tomcat_service.stdout == '0'
    - debug:
        msg: "!!! WARNING !!! Tomcat is running on {{ inventory_hostname }}"
      when: tomcat_service.stdout | int > 0
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Unable to detect whether Tomcat is running on the server.' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Unable to determine if Tomcat is running on {{ inventory_hostname }}. This is unexpected and may be an issue with the Ansible patching script."
- block:
    - name: "Stopping Tomcat"
      service:
        name: tomcat
        state: stopped
      when: tomcat_service.stdout != '0'
    - name: "Making sure Tomcat shut down successfully"
      shell: ps -e -o cmd | grep catalina.startup | grep -v grep | head -n 1
      register: tomcat_success
      failed_when: tomcat_success.stdout != ""
      when: tomcat_service.stdout != '0'
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Tomcat did not shut down' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Tomcat failed to shut down on {{ inventory_hostname }}."
##############################################################
- block:
    - name: "Detecting Oracle Databases"
      shell: ps -e -o cmd | grep ora_pmon_ | grep -v grep | cut -d"_" -f3 | sort
      register: oracledb_service
    - debug:
        msg: "No Oracle databases were found on {{ inventory_hostname }}"
      when: oracledb_service.stdout == ''
    - debug:
        msg: "!!! WARNING !!! Oracle Databases are running on {{ inventory_hostname }}"
      when: oracledb_service.stdout != ''
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Unable to detect Oracle DBs' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Unable to determine if there are any Oracle Databases running on {{ inventory_hostname }}. This is unexpected and may be an issue with the Ansible patching script."
- block:
    - name: "Stopping Oracle Databases using script"
      shell:
        cmd: "/path/to/script/stop_dbs.sh"
      become: yes
      become_user: oracle
      when: oracledb_service.stdout != ''
    - name: "Making sure Oracle Databases shut down successfully"
      shell: ps -e -o cmd | grep ora_pmon_ | grep -v grep | cut -d"_" -f3 | sort
      register: oracle_success
      failed_when: oracle_success.stdout != ""
      when: oracledb_service.stdout != ''
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Oracle Databases failed to shut down' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Not all Oracle Databases successfully shut down on {{ inventory_hostname }}."
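For applications that run as systemd units, the detection step can also be done without grep by using Ansible's service facts. The following is a minimal sketch, assuming the unit really is called splunk.service as in the role above; the fact name splunk_was_running is hypothetical, and the later checks in post_check would need adjusting to use it.

```yaml
- name: "Gathering service facts"
  ansible.builtin.service_facts:

- name: "Recording whether Splunk is currently running"
  ansible.builtin.set_fact:
    # ansible_facts.services is keyed by unit name, for example 'splunk.service'
    splunk_was_running: "{{ ansible_facts.services.get('splunk.service', {}).get('state', '') == 'running' }}"

- name: "Stopping Splunk"
  ansible.builtin.service:
    name: splunk
    state: stopped
  when: splunk_was_running | bool
```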
roles/take_snapshot/tasks/main.yml
Note that this role is calling a role located in /ansible/snapshots.
I have included that role below this one, for reference.
I chose to keep the snapshot role separate/stand-alone because it’s very useful to call from other Playbooks as well.
---
- block:
    - include_role:
        name: /ansible/snapshots/roles/take_snapshot
      vars:
        host_to_snap: "{{ inventory_hostname }}"
        snap_reason: "Patching"
        snap_source: "input_source_here"
      when: server_type.rc == 0
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Unable to take vSphere snapshot' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Unable to take a vSphere Snapshot of {{ inventory_hostname }}. This could be because it has disks that can't be snapshot, although the playbook should be accounting for those, or the Unix vSphere user SVC_Linux_Patching couldn't find the hostname. You may need to take a manual snapshot to be on the safe side."
- debug:
    msg: "Server {{ inventory_hostname }} is Physical, therefore a vSphere snapshot is irrelevant. Snapshot step will be skipped."
  when: server_type.rc != 0
/ansible/snapshots/roles/take_snapshot/tasks/main.yml
See the comment on the previous role above.
This role connects to vSphere and takes a snapshot of the host. The variables passed in by the previous role (host_to_snap, snap_reason and snap_source) identify the VM and are written into the description of the snapshot.
---
- name: "Creating vSphere snapshot - Onprem servers"
  delegate_to: localhost
  shell:
    cmd: "echo 'Connect-VIServer host1234.company-name.co.uk -User USERNAME -Password blablabla; New-Snapshot -VM '{{ host_to_snap }}' -Name \"Ansible Automated Snapshot\" -Confirm:$false -Description \"Snapshot taken automatically via Ansible. The reason was given as: {{ snap_reason }}. The source was: {{ snap_source }}.\" -Memory:$false; Disconnect-VIServer -Confirm:$false' | pwsh | grep -i 'created'"
  when: "'onprem' in inventory_hostname"
  register: snapshot_output
- name: "Creating vSphere snapshot - DMZ servers"
  delegate_to: localhost
  shell:
    cmd: "echo 'Connect-VIServer host5678.company-name.co.uk -User USERNAME -Password blablabla; New-Snapshot -VM '{{ host_to_snap }}' -Name \"Ansible Automated Snapshot\" -Confirm:$false -Description \"Snapshot taken automatically via Ansible. The reason was given as: {{ snap_reason }}. The source was: {{ snap_source }}.\" -Memory:$false; Disconnect-VIServer -Confirm:$false' | pwsh | grep -i 'created'"
  when: "'dmz' in inventory_hostname"
  register: snapshot_output
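The role above drives PowerCLI through pwsh. If you would rather stay inside Ansible, the community.vmware collection has a snapshot module that may do the same job. I haven't used it for this Playbook, so treat the following as a sketch under assumptions: it needs the community.vmware collection and its Python dependencies on the control node, and the vcenter_* variables, datacenter name and folder path are all hypothetical placeholders (credentials would ideally come from Ansible Vault, not be written in-line).

```yaml
- name: "Creating vSphere snapshot via the community.vmware collection"
  community.vmware.vmware_guest_snapshot:
    hostname: "{{ vcenter_hostname }}"      # hypothetical variable
    username: "{{ vcenter_username }}"      # hypothetical variable
    password: "{{ vcenter_password }}"      # hypothetical variable, e.g. from Ansible Vault
    datacenter: "{{ vcenter_datacenter }}"  # hypothetical variable
    folder: "{{ vcenter_vm_folder }}"       # hypothetical variable, the VM's folder path
    name: "{{ host_to_snap }}"
    state: present
    snapshot_name: "Ansible Automated Snapshot"
    description: "Snapshot taken automatically via Ansible. The reason was given as: {{ snap_reason }}. The source was: {{ snap_source }}."
  delegate_to: localhost
  register: snapshot_output
```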
roles/patch/tasks/main.yml
Finally, we patch!
Async and poll are used here, so that the connection doesn’t just stay open and ‘hang’. This frees up network resources, as well as gives us a decent indication on-screen as to whether the patching is still taking place.
It also allows the role to time out, should patching fail to complete within a reasonable amount of time (without async and poll, we could end up with our Playbook just hanging indefinitely).
---
- block:
    # Trigger a yum update Ansible 'Job' on each server, with a fail timeout of 32 minutes (async is in seconds)
    - name: "Triggering yum update on each server"
      yum:
        name: "*"
        state: latest
        disable_gpg_check: true
      async: 1920
      poll: 0
      register: yum_sleeper
    - debug:
        msg: "Now patching {{ inventory_hostname }}. This may take a few minutes, please wait ... "
    # Await the result of the yum update, polling every 30 seconds for up to 30 minutes (60 retries x 30 seconds)
    - name: "Awaiting yum update result"
      async_status:
        jid: "{{ yum_sleeper.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60
      delay: 30
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] The yum update failed' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Yum update failed on {{ inventory_hostname }}. Is the repository configured correctly, and can the server communicate with it?"
roles/reboot/tasks/main.yml
Whilst we could add some additional tasks here to determine whether a server does indeed need a reboot after patching (for example, with needs-restarting), it was decided (and I think it makes good sense) to reboot the servers every time.
This makes sure that they are all stable enough to come up correctly, along with their services, after patching, and during the allotted maintenance window for patching.
---
- block:
    - name: "Rebooting servers"
      debug:
        msg: "!!! WARNING !!! - Now rebooting {{ inventory_hostname }} and then testing it came back up. This may take a few minutes, please wait ... "
    - name: "Reboot output"
      reboot:
        connect_timeout: 10
        pre_reboot_delay: 0
        post_reboot_delay: 10
        reboot_timeout: 480
        test_command: "uname -a"
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Server failed to reboot' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Server {{ inventory_hostname }} either failed to reboot, or did not reboot in a timely manner. Please check it manually to make sure it has rebooted and come back up correctly."
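For anyone who would prefer to reboot only when it is actually required, here is a rough sketch of the conditional approach we decided against. It assumes the needs-restarting command is installed on the servers (it ships with the yum-utils/dnf-utils package on RHEL), and relies on needs-restarting -r exiting with 1 when a reboot is needed and 0 when it is not. The variable name reboot_check is hypothetical.

```yaml
- name: "Checking whether a reboot is required after patching"
  ansible.builtin.command: needs-restarting -r
  register: reboot_check
  changed_when: false
  # 0 = no reboot needed, 1 = reboot needed; anything else is a real error
  failed_when: reboot_check.rc not in [0, 1]

- name: "Rebooting only the servers that need it"
  ansible.builtin.reboot:
    connect_timeout: 10
    post_reboot_delay: 10
    reboot_timeout: 480
    test_command: "uname -a"
  when: reboot_check.rc == 1
```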
roles/post_check/tasks/main.yml
This is another role that is very specific to individual company requirements.
- We use the facts gathered during the stop_applications role above to determine what services were running on each server before reboot, and make sure they have come back up again after reboot.
- We write a list of updated packages to a file in /tmp on the Ansible control node (one file per server, since the task is delegated to localhost), for reference if needed.
- Each server is now removed from Maintenance Mode in the monitoring software.
---
- name: "Let's wait 1 minute for everything to finish coming back up, before checking patching was successful"
  pause:
    minutes: 1
- block:
    - name: "Making sure any relevant Oracle Databases came back up successfully"
      shell: ps -e -o cmd | grep ora_pmon_ | grep -v grep | cut -d"_" -f3 | sort
      register: oracle_back_up
      failed_when: oracle_back_up.stdout == ""
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Oracle DBs failed to come back up' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "There were Oracle DBs detected on {{ inventory_hostname }}, but they don't appear to have come back up. Please contact an Oracle DBA to make sure the DBs are restarted on {{ inventory_hostname }}."
  when:
    - oracledb_service.stdout != ''
- block:
    - name: "Making sure Splunk came back up successfully"
      shell: ps -e -o pid,cmd | grep "splunkd.*start" | grep -v -E "grep|process-runner" | awk '{ print $1 }' | head -n 1
      register: splunk_back_up
      failed_when: splunk_back_up.stdout == ""
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Splunk failed to come back up' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Splunk was detected on {{ inventory_hostname }}, but it doesn't appear to have come back up. Please check why and start it manually."
  when: splunk_service.stdout != '0'
- block:
    - name: "Making sure Tomcat came back up successfully"
      shell: ps -e -o cmd | grep catalina.startup | grep -v grep | head -n 1
      register: tomcat_back_up
      failed_when: tomcat_back_up.stdout == ""
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Tomcat failed to come back up' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Tomcat was detected on {{ inventory_hostname }}, but it doesn't appear to have come back up. Please check why and start it manually."
  when: tomcat_service.stdout != '0'
- name: "Gathering a list of updated packages"
  shell: 'rpm -qa --last | grep "$(date +%a\ %d\ %b\ %Y)" | cut -f 1 -d " "'
  register: yumlistresult
  changed_when: false
- name: "Adding list of packages, updated today, to a temporary log file in /tmp"
  shell:
    cmd: "echo '{{ ansible_date_time.date }} - {{ inventory_hostname }} - {{ yumlistresult.stdout }}' > /tmp/{{ inventory_hostname }}_packages_updated.log"
  delegate_to: localhost
- block:
    - name: "Taking each server out of Maintenance Mode in SCOM"
      uri:
        url: "http://hostname/MM/Home/InstantMM/?ComputerName={{ inventory_hostname }}.company-name.co.uk&MMAction=Stop"
        return_content: false
        timeout: 120
      delegate_to: localhost
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Failed to remove server from SCOM Maintenance Mode' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Failed to remove {{ inventory_hostname }} from SCOM Maintenance Mode."
- name: "Adding successfully patched server to output.log"
  shell:
    cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ OK ] Patched' >> /ansible/patching/output.log
  delegate_to: localhost
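The fixed one-minute pause at the top of this role is a simple approach, but some applications take longer than that to start. One possible refinement, sketched here using the same Tomcat detection command as above, is to retry the check for a while before declaring failure. The retry count and delay are assumptions to tune for your own applications.

```yaml
- name: "Waiting for Tomcat to come back up (up to 2 minutes)"
  ansible.builtin.shell: ps -e -o cmd | grep catalina.startup | grep -v grep | head -n 1
  register: tomcat_back_up
  # Re-run the check every 10 seconds until a Tomcat process shows up,
  # failing only after all 12 retries have been used
  until: tomcat_back_up.stdout != ""
  retries: 12
  delay: 10
  changed_when: false
  when: tomcat_service.stdout != '0'
```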
roles/remove_snapshot/tasks/main.yml
This is another call to vSphere, to delete the snapshots taken before patching.
Of course, you don’t need to add this role, if you want to keep the snapshots for a while.
---
- block:
    - name: "Removing snapshot for {{ inventory_hostname }} - onprem servers"
      delegate_to: localhost
      shell:
        cmd: "echo 'Connect-VIServer host1234.company-name.co.uk -User USERNAME -Password blablabla; Get-VM '{{ inventory_hostname }}' | Get-Snapshot | Where-Object Name -like \"Ansible Automated Snapshot\" | Remove-Snapshot -Confirm:$false; Disconnect-VIServer -Confirm:$false' | pwsh"
      when: "'onprem' in inventory_hostname"
    - name: "Removing snapshot for {{ inventory_hostname }} - dmz servers"
      delegate_to: localhost
      shell:
        cmd: "echo 'Connect-VIServer host5678.company-name.co.uk -User USERNAME -Password blablabla; Get-VM '{{ inventory_hostname }}' | Get-Snapshot | Where-Object Name -like \"Ansible Automated Snapshot\" | Remove-Snapshot -Confirm:$false; Disconnect-VIServer -Confirm:$false' | pwsh"
      when: "'dmz' in inventory_hostname"
  rescue:
    - name: "Adding reason for failure to output.log"
      shell:
        cmd: echo '{{ ansible_date_time.date }} {{ ansible_date_time.time }} - {{ inventory_hostname }} - [ FAIL ] Unable to remove vSphere snapshot' >> /ansible/patching/output.log
      delegate_to: localhost
    - fail:
        msg: "Unable to remove the vSphere Snapshot of {{ inventory_hostname }}. Please be sure to delete this snapshot manually!"
roles/summary/tasks/main.yml
This gives a really nice output (using our log file and the blocks wrapped around everything so far) at the end of the patching Play.
We can easily see not only which servers failed to patch, but a summary of why, so we don’t have to scroll back through thousands of lines of Ansible output to work it out.
---
- name: "Getting contents of output.log"
  shell:
    # Print everything from the most recent trigger line (written by the trigger role) to the end of the log
    cmd: tac /ansible/patching/output.log | grep -m 1 -B 9999 'Playbook triggered at' | tac
  register: log_contents
- name: "Summary from the latest Playbook run"
  debug:
    msg: "{{ log_contents.stdout_lines | join('\n') }}"
Hopefully you find this Playbook useful.
There’s obviously a lot of code here, so if you spot anything I’ve carried over from my working environment incorrectly either on this page or on my GitHub, please do let me know via jackcollins1434@yahoo.com.