Salt Stack, a (serious) alternative to Puppet

I couldn’t write it better : see http://www.lecloud.net/post/29325359938/salt-to-the-rescue

So basically, Salt is a configuration management system (à la Puppet) and allows remote execution (à la Rundeck).

First things first : it is very easy to install. I know Puppet now offers repositories and it’s probably as easy, but Salt is just a package with a couple of dependencies. To achieve the same tasks with Puppet you need both Puppet and MCollective, which are still two distinct products. Salt does the job from one package.

Then, it’s based on Python, YAML and Jinja.

The documentation is very good, and the community very active (got answers within 30 seconds in #salt on Freenode).

The last thing I like : minions keep a constant connection to the master, so you can push changes to them immediately. I attended the Puppet Fundamentals training late last year and asked about a “push” of changes instead of a “pull”. It seems there is a solution, but the trainer couldn’t get it working.

One thing they could improve is the frontpage of their site. When you go to http://www.saltstack.org you are redirected to http://saltstack.com/community.html instead of http://saltstack.com/about.html which explains what the product does.

Installation (RHEL) :

Server :

yum --enablerepo=epel install salt-master

Edit /etc/salt/master

file_roots:
  base:
    - /srv/salt
  dev:
    - /srv/salt/dev
  prd:
    - /srv/salt/prd

pillar_roots:
  base:
    - /srv/pillar

service salt-master restart

Client :

yum --enablerepo=epel install salt-minion

Edit /etc/salt/minion

master: your.master.server.example.org

service salt-minion restart

Now you should see a pending key with “salt-key”. See “salt-key -h” for more info.

Basically, modules are called “states”.

Pillars are a kind of variable you can use in your state files.

This is the content of /srv on my master :

.
./pillar
./salt
./salt/prd
./salt/dev
./salt/sandbox
./salt/sandbox/motd
./salt/sandbox/ntpd
./salt/sandbox/apache
./salt/sandbox/sshd
./salt/sandbox/snmpd
./salt/acc
./salt/common
./salt/common/groups
./salt/common/users
./salt/common/packages
./salt/common/files
./salt/common/sudo

I have 5 environments :
- sandbox : where I develop states
- dev : development servers
- acc : staging servers
- prd : production servers
- common : states common to all environments (sshd, snmpd, etc.)

If you look in /etc/salt/master, you’ll see there’s a “base” environment. This is where your top.sls (the key component of your salt architecture) will reside :

# cat /srv/salt/top.sls
common:
  '*':
    - packages
    - users
    - groups
    - files
    - sudo

dev:
  '*.dev.example.org':
    - dev

acc:
  '*.acc.example.org':
    - acc

prd:
  '*.prd.example.org':
    - prd

sandbox:
  'salt-client*':
    - motd
    - apache
    - ntpd
    - snmpd
    - sshd

You can see I started working with Salt only a couple of days ago. My states are still in the “sandbox” environment.
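
To make the targeting in top.sls a bit more concrete, here is a minimal Python sketch of how glob targets select states for a minion. It is only an illustration, not Salt's own code : the function name states_for and the minion IDs are made up, and I'm assuming plain shell-style glob matching, which is how I understand the default targeting to behave.

```python
from fnmatch import fnmatchcase

# Copy of the top.sls shown above: environment -> target glob -> states.
TOP = {
    "common": {"*": ["packages", "users", "groups", "files", "sudo"]},
    "dev": {"*.dev.example.org": ["dev"]},
    "acc": {"*.acc.example.org": ["acc"]},
    "prd": {"*.prd.example.org": ["prd"]},
    "sandbox": {"salt-client*": ["motd", "apache", "ntpd", "snmpd", "sshd"]},
}


def states_for(minion_id, top=TOP):
    """Return {environment: [states]} for every target glob matching minion_id."""
    matched = {}
    for env, targets in top.items():
        for pattern, states in targets.items():
            # Assumption: targets are shell-style globs matched against the minion ID.
            if fnmatchcase(minion_id, pattern):
                matched.setdefault(env, []).extend(states)
    return matched


if __name__ == "__main__":
    # Hypothetical minion IDs, for illustration only.
    for minion in ("web01.dev.example.org", "salt-client01"):
        print(minion, states_for(minion))
```

Every minion matches the '*' target of the common environment, and only picks up the other environments when its ID matches their glob.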

To push states to minions :

salt '*' state.highstate

/srv/pillar/top.sls

base:
  '*':
    - convention-os
/srv/pillar/convention-os.sls

convention-os:
  pkg:
    {% if grains['os_family'] == 'RedHat' %}
      apache: httpd
      snmpd: net-snmp
      vim: vim-enhanced
    {% elif grains['os_family'] == 'Debian' %}
      apache: apache2
      snmpd: snmpd
      vim: vim
    {% endif %}
  service:
    {% if grains['os_family'] == 'RedHat' %}
      apache: httpd
      ntpd: ntpd
      sshd: sshd
    {% elif grains['os_family'] == 'Debian' %}
      apache: apache2
      ntpd: ntp
      sshd: ssh
    {% endif %}

A state can be stored either as /srv/salt/env/motd.sls or as /srv/salt/env/motd/init.sls
I tend to prefer the latter.

Here’s an example of a state using pillar values :

apache:
  pkg:
    - installed
    - name: {{ pillar['convention-os']['pkg']['apache'] }}
  service:
    - running
    - name: {{ pillar['convention-os']['service']['apache'] }}
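
To show what the state above ends up with once the pillar values are filled in, here is a small Python sketch. It only mimics the lookup with plain dictionaries (no Salt, no Jinja) : the names CONVENTION_OS and render_apache_state are mine, and the data is copied from convention-os.sls.

```python
# Mirror of /srv/pillar/convention-os.sls, one branch per os_family grain.
CONVENTION_OS = {
    "RedHat": {
        "pkg": {"apache": "httpd", "snmpd": "net-snmp", "vim": "vim-enhanced"},
        "service": {"apache": "httpd", "ntpd": "ntpd", "sshd": "sshd"},
    },
    "Debian": {
        "pkg": {"apache": "apache2", "snmpd": "snmpd", "vim": "vim"},
        "service": {"apache": "apache2", "ntpd": "ntp", "sshd": "ssh"},
    },
}


def render_apache_state(os_family):
    """Return the apache state as a dict, with names resolved for os_family."""
    conv = CONVENTION_OS[os_family]
    return {
        "apache": {
            "pkg": ["installed", {"name": conv["pkg"]["apache"]}],
            "service": ["running", {"name": conv["service"]["apache"]}],
        }
    }


if __name__ == "__main__":
    for family in ("RedHat", "Debian"):
        print(family, render_apache_state(family))
```

The state file itself never mentions httpd or apache2 : the distribution-specific names live in one place, the pillar.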

This is a pretty rough post, sorry about that. I just wanted to spread the word about Salt and hope you’ll consider joining in.

Documentation :
Online : http://docs.saltstack.com/
PDF : http://media.readthedocs.org/pdf/salt/latest/salt.pdf

 

See changes made to a filesystem with inotify

Install the package “inotify-tools” with your package manager (in EPEL for RHEL).

Then create and execute this script :

inotifywait -m -r --format $'%T %e %w%f' --timefmt '%H:%M:%S' --exclude ~/'(\.mozilla|Documents/KeepNote)' -e modify -e move -e create -e delete ~ 2>&1 | awk '/^[0-9]/ {
sub(/'"${HOME//\//\\/}"'/, "~", $0)
split($0, a, " ")
len=length(a[1])+length(a[2])+1
printf "%-20s %s\n", substr($0, 1, len), substr($0, len+2)
# flush stdout
system("")
next
}
{print ; system("")}
' | tee -a /tmp/home_monitor

Source : http://blog.yjl.im/2010/11/monitoring-file-system-changes-with.html

 

Rundeck howto and examples

Quoting rundeck.org : Rundeck is an Open Source process automation and command orchestration tool with a web console.

As I understand it, it’s a fork of ControlTier : www.controltier.org

I’m usually all for the command line, but you have to admit the devs have done a pretty good job with the web console. The documentation is pretty good as well.

No need to install agents on your servers : it works over SSH. You just need to deploy a dedicated public SSH key and you’re done (see ssh-copy-id).

This post should help you install and configure Rundeck in under 15 minutes. It covers configuration of email, SSL and authentication against Active Directory, and explains how you can load your node definitions from a URL.

Installation on Red Hat : basically a single RPM with no dependencies. You just need a working Java; OpenJDK works fine.

Email configuration (apparently not documented) :

Edit /etc/rundeck/rundeck-config.properties

grails.mail.host=smtp.example.org
grails.mail.port=25
grails.mail.default.from=rundeck@example.org

Enabling SSL on the web console (self-signed) :

See http://rundeck.org/docs/administration/ssl.html

Basically :

cd /etc/rundeck/ssl
keytool -keystore keystore -alias rundeck -genkey -keyalg RSA -keypass password -storepass password
cp /etc/rundeck/ssl/keystore /etc/rundeck/ssl/truststore

/etc/rundeck/framework.properties :

framework.server.url = https://localhost:4443
framework.rundeck.url = https://localhost:4443
framework.server.port = 4443

Under /etc/rundeck/profile uncomment :

export RDECK_JVM="$RDECK_JVM -Drundeck.ssl.config=/etc/rundeck/ssl/ssl.properties -Dserver.https.port=4443"

Enabling LDAP against Active Directory for authentication :

/etc/rundeck/profile :

export RDECK_JVM="-Djava.security.auth.login.config=/etc/rundeck/jaas-ldap.conf \
    -Dloginmodule.name=ldap \
    -Drdeck.config=/etc/rundeck \
    -Drdeck.base=/etc/rundeck \
    -Drundeck.server.configDir=/etc/rundeck \
    -Dserver.datastore.path=/var/lib/rundeck/data \
    -Drundeck.server.serverDir=/var/lib/rundeck \
    -Drdeck.projects=/var/rundeck/projects \
    -Drdeck.runlogs=/var/lib/rundeck/logs \
    -Drundeck.config.name=/etc/rundeck/rundeck-config.properties \
    -Djava.io.tmpdir=$RUNDECK_TEMPDIR"

/etc/rundeck/jaas-ldap.conf :

ldap {
    com.dtolabs.rundeck.jetty.jaas.JettyCachingLdapLoginModule required
    debug="true"
    contextFactory="com.sun.jndi.ldap.LdapCtxFactory"
    providerUrl="ldap://intranet.example.org:389"
    bindDn="cn=queryldapaccount,ou=tech,ou=company,dc=intranet,dc=example,dc=org"
    bindPassword="xxx"
    authenticationMethod="simple"
    forceBindingLogin="true"
    userBaseDn="ou=company,dc=intranet,dc=example,dc=org"
    userRdnAttribute="sAMAccountName"
    userIdAttribute="sAMAccountName"
    userPasswordAttribute="unicodePwd"
    userObjectClass="user"
    roleBaseDn="OU=groups,OU=company,DC=intranet,DC=example,DC=org"
    roleNameAttribute="cn"
    roleMemberAttribute="member"
    roleObjectClass="group"
    cacheDurationMillis="300000"
    reportStatistics="true";
};

Configuring authorization :

Authorization is defined in the YAML file /etc/rundeck/admin.aclpolicy

The following gives members of the rundeck_superadmin group full access to Rundeck, and only lets members of rundeck_admin run jobs under the PRD/system group. For “groups”, see the LDAP configuration, under roleBaseDn.

description: Super Admin, all access.
context:
  project: '.*' # all projects
for:
  resource:
    - allow: '*' # allow read/create all kinds
  adhoc:
    - allow: '*' # allow running/killing adhoc jobs
  job:
    - allow: '*' # allow read/write/delete/run/kill of all jobs
  node:
    - allow: '*' # allow read/run for all nodes
by:
  group: [rundeck_superadmin]

---

description: Super Admin, all access.
context:
  application: 'rundeck'
for:
  resource:
    - allow: '*' # allow create of projects
  project:
    - allow: '*' # allow view/admin of all projects
by:
  group: [rundeck_superadmin]

---

description: Admin can run jobs under the PRD/system group.
context:
  project: '.*' # all projects
for:
  resource:
    - equals:
        kind: job
      allow: [read]
    - equals:
        kind: node
      allow: [read,create,update,refresh] 
    - equals:
        kind: event
      allow: [read,create] 
  adhoc:
    - allow: [read] 
  job:
    - equals:
        group: 'DEV'
      allow: [read]
    - equals:
        group: 'STAGING'
      allow: [read]
    - equals:
        group: 'PRD/cron'
      allow: [read]
    - equals:
        group: 'PRD/system'
      allow: [read, run, kill]
  node:
    - allow: [read,run] 
by:
  group: [rundeck_admin]
---
description: Admin Application level access control, applies to creating/deleting projects, admin of user profiles, viewing projects and reading system information.
context:
  application: 'rundeck'
for:
  resource:
    - equals:
        kind: project
      allow: [read]
    - equals:
        kind: system
      allow: [read] # allow read of system info
    - equals:
        kind: user
      allow: [read] # allow read of user profiles
  project:
    - match:
        name: '.*'
      allow: [read]
by:
  group: [rundeck_admin]

You’ve deployed the SSH key and followed those steps ? Then the web console is protected by HTTPS and users authenticate against your Active Directory. You’re almost good to go.

Node inventory

You can edit an XML file under your project folder : /var/rundeck/projects/EXAMPLE/etc/resources.xml

This is what the file should look like :
<?xml version="1.0" encoding="UTF-8"?>
<!-- 20121203 17:41:12 -->
<project>
  <node name="node1.intranet.example.org" type="Node"
        description="Node description"
        hostname="node1.intranet.example.org"
        username="root"
        osFamily="RHEL"
        osVersion="6"
        osArch="64"
        tags="EXAMPLE, OWNER, STAGING, WWW, ROOM_BXL, RACK10, PDU10_02"
        file-copy-destination-dir="/var/tmp/"
  />
</project>

Regarding tags, imagination is your only limit. I personally specify the project manager name, room, role, whether the machine is on or off, and the environment. You can filter using a mix of fields : to display RHEL 5 64-bit staging servers in room X, your query would look something like tags:ROOM_X+STAGING and osVersion:5 and osArch:64.

So you can either save the XML file locally, or you can load it from a URL. That’s what I do, by defining a URL Source under Resource Model Sources. I don’t have a CMDB yet, so I manually update a CSV file, and I wrote a bash script that generates the XML and makes it available in an SVN repository (don’t forget to set the MIME type application/xml for *.xml, see auto-props in your SVN configuration).

Now, you’re really good to go. You can start sending ad-hoc commands to your servers, or start looking into jobs. Rundeck jobs have replaced local crons on my servers. I don’t store scripts on servers anymore : they are all stored in an SVN repository (they already were before Rundeck) and are called directly from Rundeck. I had to modify some of them. Look into “job options”, as you don’t want to store sensitive information in your scripts : they are copied to the /tmp directory before being executed. You can see in my node definition that I specify this :
file-copy-destination-dir="/var/tmp/"
By default, Rundeck will use /tmp but some of my servers have /tmp mounted as a partition with the noexec flag. This would produce an error in Rundeck.
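
The CSV-to-XML generation mentioned above could look roughly like the following. This is a Python sketch rather than my actual bash script, and the CSV column names (hostname, description, username, osFamily, osVersion, osArch, tags) and the second node are assumptions made up for the example; only the XML attributes come from the node definition shown earlier.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical inventory; the column layout is an assumption for this sketch.
SAMPLE_CSV = """hostname,description,username,osFamily,osVersion,osArch,tags
node1.intranet.example.org,Node description,root,RHEL,6,64,"EXAMPLE, OWNER, STAGING, WWW"
node2.intranet.example.org,Another node,root,RHEL,5,64,"EXAMPLE, OWNER, PRD, DB"
"""


def csv_to_resources(csv_text, copy_dir="/var/tmp/"):
    """Build a Rundeck resources.xml document (as a string) from CSV text."""
    project = ET.Element("project")
    for row in csv.DictReader(io.StringIO(csv_text)):
        ET.SubElement(project, "node", {
            "name": row["hostname"],
            "type": "Node",
            "description": row["description"],
            "hostname": row["hostname"],
            "username": row["username"],
            "osFamily": row["osFamily"],
            "osVersion": row["osVersion"],
            "osArch": row["osArch"],
            "tags": row["tags"],
            # Avoid /tmp, which may be mounted noexec.
            "file-copy-destination-dir": copy_dir,
        })
    body = ET.tostring(project, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(csv_to_resources(SAMPLE_CSV))
```

The output can then be committed to the repository that the URL Source points at.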

ActiveMQ 5.4.x install under RHEL 5.x

Tested with ActiveMQ 5.4.3 on Red Hat Enterprise Linux 5.7 64-bit with Sun JVM 1.5

ActiveMQ 5.5.x requires JVM 1.6

The following is a simple copy and paste howto. Simply adapt the install variables and you’re good to go.

Let’s declare some variables for the install process :

AMQDIR="/usr/local"
VERSION="5.4.3"

Download and installation :

cd /root
wget http://apache.cu.be//activemq/apache-activemq/$VERSION/apache-activemq-$VERSION-bin.tar.gz
cp /root/apache-activemq-$VERSION-bin.tar.gz $AMQDIR
cd $AMQDIR
tar xvzf apache-activemq-$VERSION-bin.tar.gz
chown -R root: apache-activemq*
ln -f -s apache-activemq-$VERSION activemq

Configuration :

# Double quotes are needed here so that the shell expands $AMQDIR
sed -i "s#ACTIVEMQ_HOME.*#ACTIVEMQ_HOME=\"$AMQDIR/activemq\"#g" $AMQDIR/activemq/bin/linux-x86-64/activemq

sed -i "s#set.default.ACTIVEMQ_HOME=.*#set.default.ACTIVEMQ_HOME=$AMQDIR/activemq#g" $AMQDIR/activemq/bin/linux-x86-64/wrapper.conf

sed -i "s#set.default.ACTIVEMQ_BASE=.*#set.default.ACTIVEMQ_BASE=$AMQDIR/activemq#g" $AMQDIR/activemq/bin/linux-x86-64/wrapper.conf

Init script and making ActiveMQ start at boot :

ln -s $AMQDIR/activemq/bin/linux-x86-64/activemq /etc/init.d/activemq
chkconfig --add activemq
chkconfig activemq on
service activemq start

Logs :

tail -f /usr/local/activemq/data/wrapper.log

Accessing the admin section :

http://$SERVER:8161/admin/index.jsp

Two step authentication on SSH with Google Authenticator under Debian Sid

On a Debian Sid system, install the following :

apt-get install libpam-google-authenticator

Edit /etc/ssh/sshd_config and set :

ChallengeResponseAuthentication yes

Restart the service :

service ssh restart

Now run :

google-authenticator

Scan the barcode from the Google Authenticator app on your mobile device.

Edit /etc/pam.d/sshd and add at the very beginning of the file :

auth required pam_google_authenticator.so

Now test an SSH connection. You should be greeted by a cool “Verification code :” prompt,
then by the regular password prompt.
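
If you wonder where the verification code comes from, here is a minimal Python sketch of the time-based one-time password algorithm (TOTP, RFC 6238). I'm assuming the usual defaults here (HMAC-SHA1, 30 second steps, 6 digits); the function name totp is mine, and the secret in the example is the RFC test key, not a real one.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, for_time=None, step=30, digits=6):
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) from a base32 secret."""
    # Secrets are often displayed without base32 padding; restore it.
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    if for_time is None:
        for_time = time.time()
    # Number of time steps elapsed since the Unix epoch.
    counter = int(for_time) // step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects 4 bytes.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # RFC 6238 test key, base32-encoded; replace with your own secret to compare.
    demo_secret = base64.b32encode(b"12345678901234567890").decode()
    print(totp(demo_secret))
```

The server and the phone only share the secret and a roughly synchronized clock, which is why the code works without any network access on the mobile device.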

Spin down external USB drive on Debian Squeeze

It seems like I have at least two options to spin down my external USB drive used for rsnapshot backups (Iomega 1TB). In the first place, I assumed it would spin down by itself by simply unmounting the volume, like on the Mac. But it doesn’t.

So I gave sdparm a try :

sdparm --command=stop /dev/backupdrive

It doesn’t work :-)

I found a working solution at http://forums.debian.net/viewtopic.php?f=7&t=60122

sg_start --readonly --stop /dev/backupdrive

sg_start is part of the sg3-utils package.
YMMV, I guess ?

Use the cmd_postexec option in rsnapshot to trigger the spin down.

Want the same device name for your external USB drive ? Check out http://blog.wains.be/2010/04/10/udev-always-the-same-device-name-for-your-usb-drives/

Large file uploads fail with Apache + PHP + APC

We had one quite interesting problem at work.

We had a Drupal site where we couldn’t upload files larger than 32 MB, while having in php.ini :

upload_max_filesize = 200M
post_max_size = 200M

After disabling APC, we could upload larger files.

It turns out that changing the following in apc.ini

apc.rfc1867_freq=0

to

apc.rfc1867_freq=100k

fixed the problem.

Doc : http://www.php.net/manual/en/apc.configuration.php#ini.apc.rfc1867-freq

apc.rfc1867_freq string

The frequency that updates should be made to the user cache entry for upload progress. This can take the form of a percentage of the total file size or a size in bytes optionally suffixed with "k", "m", or "g" for kilobytes, megabytes, or gigabytes respectively (case insensitive). A setting of 0 updates as often as possible, which may cause slower uploads.
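
To make the accepted value formats a bit more concrete, here is a small Python sketch that turns such a setting into a number of bytes between two progress updates. It only follows the description quoted above, not APC's source code : the function name is mine, and I'm assuming the suffixes are multiples of 1024.

```python
def rfc1867_freq_to_bytes(value, total_size):
    """Interpret an apc.rfc1867_freq style value as a byte interval.

    value is either a percentage of total_size ("10%"), or a size in bytes
    with an optional k/m/g suffix ("100k"). Assumption: k, m and g are
    powers of 1024. A result of 0 means "update as often as possible".
    """
    value = value.strip().lower()
    if value.endswith("%"):
        return int(total_size * float(value[:-1]) / 100)
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    if value and value[-1] in multipliers:
        return int(value[:-1]) * multipliers[value[-1]]
    return int(value)


if __name__ == "__main__":
    upload_size = 200 * 1024 ** 2  # a 200 MB upload
    for setting in ("0", "100k", "10%"):
        print(setting, rfc1867_freq_to_bytes(setting, upload_size))
```

Under these assumptions, 100k means one update of the cache entry roughly every 100 KB uploaded, instead of one on every chunk received.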

I’m pretty sure this should not be related, as I have apc.rfc1867=0 in apc.ini.

If someone has a clue, drop me a line :-)

Authenticate Red Hat Linux against Microsoft Active Directory

Tested with Active Directory 2003 and RHEL 6.0

What we want to do :

- authentication against AD using Winbind and Kerberos
- allowing local and remote (SSH) authentication to members of a specific AD group (linuxadmin)
- allowing members of linuxadmin to use sudo
- UID/GID mapping against AD
- user homedir will be created at first login using pam_mkhomedir
- still possible to log in using local accounts, in case AD is unavailable

Check if resolution works :

# host -t srv _kerberos._tcp.intranet.example.org
_kerberos._tcp.intranet.example.org has SRV record 0 100 88 ad01.intranet.example.org.
_kerberos._tcp.intranet.example.org has SRV record 0 100 88 ad02.intranet.example.org.
_kerberos._tcp.intranet.example.org has SRV record 0 100 88 ad03.intranet.example.org.

Install necessary packages and enable Winbind at boot :

# yum install samba-common pam_krb5 sudo authconfig
# chkconfig winbind on

Create directory where homedirs will be stored :

# mkdir /home/EXAMPLE
# chmod 0777 /home/EXAMPLE

IMPORTANT : before proceeding, we need to make sure “hostname -f” returns a FQDN, THE SUBDOMAIN MUST MATCH THE AD DOMAIN.

# hostname -f
srv.intranet.example.org

Enable authentication :

# authconfig \
    --disablecache \
    --enablewinbind \
    --enablewinbindauth \
    --smbsecurity=ads \
    --smbworkgroup=EXAMPLE \
    --smbrealm=INTRANET.EXAMPLE.ORG \
    --enablewinbindusedefaultdomain \
    --winbindtemplatehomedir=/home/EXAMPLE/%U \
    --winbindtemplateshell=/bin/bash \
    --enablekrb5 \
    --krb5realm=INTRANET.EXAMPLE.ORG \
    --enablekrb5kdcdns \
    --enablekrb5realmdns \
    --enablelocauthorize \
    --enablemkhomedir \
    --enablepamaccess \
    --updateall

Under RHEL 5.0, authconfig didn’t have the enablemkhomedir and enablepamaccess options (you’ll get “authconfig: error: no such option: --enablemkhomedir”).

Winbind should restart by itself, if not :

# service winbind restart

authconfig will modify a couple of files : /etc/samba/smb.conf, /etc/pam.d/system-auth, /etc/nsswitch.conf, etc.

By default, UID/GID will be stored locally, and will differ from one system to another.

In order to always get the same UID/GID for our AD users/groups, we’ll map the ID’s against AD, by modifying /etc/samba/smb.conf :

From :

workgroup = EXAMPLE
realm = INTRANET.EXAMPLE.ORG
security = ads
idmap uid = 16777216-33554431
idmap gid = 16777216-33554431
template homedir = /home/EXAMPLE/%U
template shell = /bin/bash
winbind use default domain = true
winbind offline logon = false

To :

workgroup = EXAMPLE
realm = INTRANET.EXAMPLE.ORG
security = ads
idmap domains = EXAMPLE
idmap config EXAMPLE:backend = rid
idmap config EXAMPLE:base_rid = 500
idmap config EXAMPLE:range = 500-1000000
#idmap uid = 16777216-33554431
#idmap gid = 16777216-33554431

template homedir = /home/EXAMPLE/%U
template shell = /bin/bash
winbind use default domain = true
winbind offline logon = false
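
To see why the rid backend gives the same IDs on every machine, here is a small Python sketch of the mapping as I understand it from the idmap_rid documentation : the Unix ID is computed from the last part of the SID (the RID), so nothing needs to be stored locally. The function names are mine, and the formula (RID - base_rid + low end of the range) is my reading of the documentation, so treat it as an assumption.

```python
def rid_from_sid(sid):
    """Return the RID, i.e. the last dash-separated component of a SID."""
    return int(sid.rsplit("-", 1)[1])


def rid_to_unix_id(rid, base_rid=500, range_low=500, range_high=1000000):
    """Map an AD RID to a UID/GID the way the rid backend is described.

    Defaults match the smb.conf above (base_rid = 500, range = 500-1000000).
    Assumed formula: ID = RID - base_rid + range_low.
    """
    unix_id = rid - base_rid + range_low
    if not range_low <= unix_id <= range_high:
        raise ValueError("RID %d maps outside range %d-%d"
                         % (rid, range_low, range_high))
    return unix_id


if __name__ == "__main__":
    # SID layout as printed by wbinfo -n, with the domain part masked.
    print(rid_to_unix_id(rid_from_sid("S-1-5-21-x-x-x-1129")))
```

With base_rid and the low end of the range both set to 500, the Unix ID is simply the RID itself, on every machine joined to the domain.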

Now, in order to only allow members of linuxadmin group, edit :

For RHEL5.6 : /etc/pam.d/system-auth
For RHEL6.0 : /etc/pam.d/password-auth

I’ll also change the default homedir creation umask.

#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so user ingroup linuxadmin debug
auth        requisite     pam_succeed_if.so uid >= 500 quiet
auth        sufficient    pam_krb5.so use_first_pass
auth        sufficient    pam_winbind.so use_first_pass
auth        required      pam_deny.so

account     required      pam_access.so
account     required      pam_unix.so broken_shadow
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 500 quiet
account     [default=bad success=ok user_unknown=ignore] pam_krb5.so
account     [default=bad success=ok user_unknown=ignore] pam_winbind.so
account     required      pam_permit.so

password    requisite     pam_cracklib.so try_first_pass retry=3 type=
password    sufficient    pam_unix.so md5 shadow nullok try_first_pass use_authtok
password    sufficient    pam_krb5.so use_authtok
password    sufficient    pam_winbind.so use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     optional      pam_mkhomedir.so umask=0077
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so
session     optional      pam_krb5.so

Restart Winbind :

# service winbind restart

Now, join the machine to the domain, in this example user01 has domain admin permissions.

# net ads join -U user01
user01's password:
Using short domain name -- example
Joined 'SRV' to realm 'INTRANET.EXAMPLE.ORG'

When joining the domain, you may get an error about DNS updates (maybe because the record already exists). This is not a problem.

Restart Winbind again :

# service winbind restart

Check if it works, by listing AD groups :

# wbinfo -g

Now, allow users in the linuxadmin group to use sudo :

# echo "%linuxadmin ALL=(ALL) ALL" >> /etc/sudoers

Test authentication using an AD account (in the linuxadmin group) and access to root account :

On the server check the logs :
tail -f /var/log/secure

On the client :
$ ssh user01@srv.intranet.example.org
user01@srv.intranet.example.org's password:
Creating directory '/home/EXAMPLE/user01'.
[user01@srv ~]$ sudo su -
[sudo] password for user01:
[root@srv ~]#

Test with another account, one that is not a member of the linuxadmin group this time. The user should be disconnected.

For a successful login, the logs should look something like this :

Apr 17 17:15:52 x sshd[27114]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.1.1  user=user-01
Apr 17 17:15:52 x sshd[27114]: pam_krb5[27114]: authentication succeeds for 'user-01' (user-01@INTRANET.EXAMPLE.ORG)
Apr 17 17:15:52 x sshd[27114]: pam_winbind(sshd:account): [pamh: 0x7f6910199390] ENTER: pam_sm_acct_mgmt (flags: 0x0000)
Apr 17 17:15:52 x sshd[27114]: pam_winbind(sshd:account): user 'user-01' granted access
Apr 17 17:15:52 x sshd[27114]: pam_winbind(sshd:account): [pamh: 0x7f6910199390] LEAVE: pam_sm_acct_mgmt returning 0 (PAM_SUCCESS)
Apr 17 17:15:52 x sshd[27114]: pam_succeed_if(sshd:account): requirement "user ingroup linuxadmin" was met by user "user-01"
Apr 17 17:15:52 x sshd[27114]: Accepted password for user-01 from 192.168.1.1 port 59369 ssh2
Apr 17 17:15:53 x sshd[27114]: pam_unix(sshd:session): session opened for user user-01 by (uid=0)

Useful commands :

# wbinfo -n user05
S-1-5-21-x-x-x-1129 User (1)

# getent passwd user05
user05:*:1129:519:John Doe:/home/example/user05:/bin/bash

# getent group linuxadmin
linuxadmin:*:7579:user01,user02,user03,user04

# wbinfo -u
# wbinfo -g

# wbinfo -D EXAMPLE
Name              : EXAMPLE
Alt_Name          : intranet.example.org
SID               : S-1-5-21-x-x-x
Active Directory  : Yes
Native            : Yes
Primary           : Yes
Sequence          : -1

Sources :
http://lanestechblog.blogspot.com/2010/11/ad-authentication-with-rhel-6.html
http://conigliaro.org/2008/12/19/active-directory-authentication-with-winbind-on-red-hat-linux/

Importing certificates on Android (CA and client)

Tested on my HTC Hero running Android 2.2.1

They do not make it terribly obvious, so I believe this is worth a post.

Android will not import a CA certificate in PEM format : you’ll get a “no certificate to install” message at some point.

You actually have to export a P12 certificate containing the client certificate and the CA.

Use this command :
openssl pkcs12 -export -in clientcert.pem -inkey clientcert.key -certfile cacert.pem -name "VPN" -out clientcert.p12

Drop the resulting file (clientcert.p12) at the root of your sdcard.

Go under Settings > Location & Security > Install from SD card (under the section “Credential storage”).

After a few questions, you’re ready to go and you can proceed with the configuration of your Wi-Fi or VPN client (in my case WPA Enterprise Wi-Fi and OpenVPN).

Red Hat Cluster : VMware ESX fencing

Tested on Red Hat Enterprise Linux 5.6 64 bits and VMware ESX 3.5
Edit November 2011 : Tested on RHEL6.1 and VMware ESX 4.1

If you set up a cluster, in case of failure, you’ll probably want the surviving host to be able to “fence” or “stonith” the faulty node.

Red Hat Cluster provides a collection of scripts for that purpose (for APC, ILO, DRAC, etc. and VMware).

The vmware script doesn’t work out of the box :

# fence_vmware -a "esx.intranet.example.org" -l "vmware_fence_account" -p "xxx" -n 'node01'
fence_vmware_helper returned Please install VI Perl API package to use this tool!
Perl error: Can't locate VMware/VIRuntime.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at (eval 1) line 1.
BEGIN failed--compilation aborted at (eval 1) line 1.

Please use '-h' for usage

Go to http://www.vmware.com/support/developer/viperltoolkit/ (you’ll need to register)

Grab either one of those :

ESX 3.5

VMware-VIPerl-1.6.0-104313.i386.tar.gz
VMware-VIPerl-1.6.0-104313.x86_64.tar.gz

ESX 4.1

VMware-vSphere-Perl-SDK-4.1.0-*.i386.tar.gz
VMware-vSphere-Perl-SDK-4.1.0-*.x86_64.tar.gz

You’ll need to install some stuff on your system :

RHEL5

# yum install openssl-devel

Dependencies Resolved

========================================================================================================================================================================
 Package                                   Arch                         Version                                 Repository                                         Size
========================================================================================================================================================================
Installing:
 openssl-devel                             i386                         0.9.8e-12.el5_5.7                       rhel-5Server-x86_64-updates                       1.9 M
 openssl-devel                             x86_64                       0.9.8e-12.el5_5.7                       rhel-5Server-x86_64-updates                       1.9 M
Installing for dependencies:
 e2fsprogs-devel                           x86_64                       1.39-23.el5_5.1                         rhel-5Server-x86_64-updates                       633 k
 keyutils-libs-devel                       x86_64                       1.2-1.el5                               rhel-5Server-x86_64-updates                        27 k
 krb5-devel                                x86_64                       1.6.1-55.el5                            rhel-5Server-x86_64-updates                       1.9 M
 libselinux-devel                          x86_64                       1.33.4-5.7.el5                          rhel-5Server-x86_64-updates                       149 k
 libsepol-devel                            x86_64                       1.15.2-3.el5                            rhel-5Server-x86_64-updates                       192 k
 zlib-devel                                x86_64                       1.2.3-3                                 rhel-5Server-x86_64-updates                       102 k

Transaction Summary
========================================================================================================================================================================
Install       8 Package(s)
Upgrade       0 Package(s)

Total download size: 6.7 M
Is this ok [y/N]: 

RHEL6

# yum install openssl-devel perl-Compress-Raw-Zlib perl-Compress-Zlib
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package openssl-devel.x86_64 0:1.0.0-10.el6_1.5 will be installed
--> Processing Dependency: pkgconfig for package: openssl-devel-1.0.0-10.el6_1.5.x86_64
--> Processing Dependency: zlib-devel for package: openssl-devel-1.0.0-10.el6_1.5.x86_64
--> Processing Dependency: krb5-devel for package: openssl-devel-1.0.0-10.el6_1.5.x86_64
--> Processing Dependency: /usr/bin/pkg-config for package: openssl-devel-1.0.0-10.el6_1.5.x86_64
---> Package perl-Compress-Raw-Zlib.x86_64 0:2.023-119.el6_1.1 will be installed
---> Package perl-Compress-Zlib.x86_64 0:2.020-119.el6_1.1 will be installed
--> Processing Dependency: perl(IO::Uncompress::Gunzip) >= 2.020 for package: perl-Compress-Zlib-2.020-119.el6_1.1.x86_64
--> Processing Dependency: perl(IO::Compress::Gzip) >= 2.020 for package: perl-Compress-Zlib-2.020-119.el6_1.1.x86_64
--> Processing Dependency: perl(IO::Compress::Gzip::Constants) >= 2.020 for package: perl-Compress-Zlib-2.020-119.el6_1.1.x86_64
--> Processing Dependency: perl(IO::Compress::Base::Common) >= 2.020 for package: perl-Compress-Zlib-2.020-119.el6_1.1.x86_64
--> Running transaction check
---> Package krb5-devel.x86_64 0:1.9-9.el6_1.2 will be installed
--> Processing Dependency: libselinux-devel for package: krb5-devel-1.9-9.el6_1.2.x86_64
--> Processing Dependency: libcom_err-devel for package: krb5-devel-1.9-9.el6_1.2.x86_64
--> Processing Dependency: keyutils-libs-devel for package: krb5-devel-1.9-9.el6_1.2.x86_64
---> Package perl-IO-Compress-Base.x86_64 0:2.020-119.el6_1.1 will be installed
---> Package perl-IO-Compress-Zlib.x86_64 0:2.020-119.el6_1.1 will be installed
---> Package pkgconfig.x86_64 1:0.23-9.1.el6 will be installed
---> Package zlib-devel.x86_64 0:1.2.3-25.el6 will be installed
--> Running transaction check
---> Package keyutils-libs-devel.x86_64 0:1.4-1.el6 will be installed
---> Package libcom_err-devel.x86_64 0:1.41.12-7.el6 will be installed
---> Package libselinux-devel.x86_64 0:2.0.94-5.el6 will be installed
--> Processing Dependency: libsepol-devel >= 2.0.32-1 for package: libselinux-devel-2.0.94-5.el6.x86_64
--> Processing Dependency: pkgconfig(libsepol) for package: libselinux-devel-2.0.94-5.el6.x86_64
--> Running transaction check
---> Package libsepol-devel.x86_64 0:2.0.41-3.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================================================================
 Package                              Arch                 Version                           Repository                                 Size
=============================================================================================================================================
Installing:
 openssl-devel                        x86_64               1.0.0-10.el6_1.5                  rhel-6Server-x86_64-updates               1.1 M
 perl-Compress-Raw-Zlib               x86_64               2.023-119.el6_1.1                 rhel-6Server-x86_64-updates                67 k
 perl-Compress-Zlib                   x86_64               2.020-119.el6_1.1                 rhel-6Server-x86_64-updates                43 k
Installing for dependencies:
 keyutils-libs-devel                  x86_64               1.4-1.el6                         rhel-6Server-x86_64-updates                28 k
 krb5-devel                           x86_64               1.9-9.el6_1.2                     rhel-6Server-x86_64-updates               1.2 M
 libcom_err-devel                     x86_64               1.41.12-7.el6                     rhel-6Server-x86_64-updates                30 k
 libselinux-devel                     x86_64               2.0.94-5.el6                      rhel-6Server-x86_64-updates               135 k
 libsepol-devel                       x86_64               2.0.41-3.el6                      rhel-6Server-x86_64-updates                64 k
 perl-IO-Compress-Base                x86_64               2.020-119.el6_1.1                 rhel-6Server-x86_64-updates                66 k
 perl-IO-Compress-Zlib                x86_64               2.020-119.el6_1.1                 rhel-6Server-x86_64-updates               133 k
 pkgconfig                            x86_64               1:0.23-9.1.el6                    rhel-6Server-x86_64-updates                70 k
 zlib-devel                           x86_64               1.2.3-25.el6                      rhel-6Server-x86_64-updates                43 k

Transaction Summary
=============================================================================================================================================
Install      12 Package(s)

Total download size: 3.0 M
Installed size: 6.4 M
Is this ok [y/N]: 

Extract VMware-*.tar.gz and run :
./vmware-install.pl

Accept the terms. (yes, this is needed).

Then you should get to this :

The installation of VMware VIPerl Toolkit 1.6.0 build-104313 for Linux
completed successfully. You can decide to remove this software from your system
at any time by invoking the following command:
"/usr/bin/vmware-uninstall-viperl.pl".

Make sure you add “virtual machine administrator” permissions to “vmware_fence_account”, for the VM’s it needs to stonith.

From now on, you should be able to stonith VM’s.

Here’s a working RHCS config (/etc/cluster/cluster.conf) :

[sourcecode language="xml"]
<?xml version="1.0"?>
<cluster alias="ServiceClusterTEST" config_version="4" name="ServiceTEST">
  <totem token="45000"/>
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <fence_daemon clean_start="1"/>
  <clusternodes>
    <clusternode name="node01.intranet.example.org" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node02.intranet.example.org" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node02"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_vmware" ipaddr="esx.intranet.example.org" login="vmware_fence_account" passwd="password" name="node01" port="node01"/>
    <fencedevice agent="fence_vmware" ipaddr="esx.intranet.example.org" login="vmware_fence_account" passwd="password" name="node02" port="node02"/>
  </fencedevices>
  <rm>
    <resources>
      [whatever resources you have]
    </resources>
    <service name="Service" autostart="1">
      [whatever services the cluster is in charge of]
    </service>
  </rm>
</cluster>
[/sourcecode]
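
In the config above, each clusternode refers to a fence device by name, and that name must exist under fencedevices. As a sanity check, something like the following Python sketch could list the nodes whose fence device is missing. It is not part of RHCS : the function name is mine, and the sample is a trimmed-down copy of the config above without the rm section.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the cluster.conf above (rm section left out).
SAMPLE_CONF = """
<cluster alias="ServiceClusterTEST" config_version="4" name="ServiceTEST">
  <clusternodes>
    <clusternode name="node01.intranet.example.org" nodeid="1" votes="1">
      <fence><method name="1"><device name="node01"/></method></fence>
    </clusternode>
    <clusternode name="node02.intranet.example.org" nodeid="2" votes="1">
      <fence><method name="1"><device name="node02"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_vmware" name="node01" port="node01"/>
    <fencedevice agent="fence_vmware" name="node02" port="node02"/>
  </fencedevices>
</cluster>
"""


def unfenced_nodes(cluster_xml):
    """Return names of cluster nodes without a declared fence device."""
    root = ET.fromstring(cluster_xml)
    declared = {dev.get("name") for dev in root.iter("fencedevice")}
    missing = []
    for node in root.iter("clusternode"):
        used = {dev.get("name") for dev in node.iter("device")}
        # A node is a problem if it uses no device, or an undeclared one.
        if not used or not used <= declared:
            missing.append(node.get("name"))
    return missing


if __name__ == "__main__":
    print(unfenced_nodes(SAMPLE_CONF))
```

An empty list means every node can be fenced by a device that is actually declared.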