Configuring Lots of OTP Nodes Using Open Source Tools
September, 2004
Hal Snyder

Contents
* Goals
* CVS
* Autoconf
* Pkgsrc
* Cfengine
* Benefits
* Drawbacks/Challenges
* Conclusion

Goals
* Divide servers into "classes" based on what they do. Keep all software on servers in the same class at identical revision state:
  a. at time of installation (jumpstart, ghost, g4u, etc.)
  b. as updates occur
* Rebuild a server in less than an hour in case of hardware failure.
* Have a complete and unambiguous record of the configuration state of every server, now and in the past, with accountability for each configuration change.
* Keep programmers off the production servers.

CVS
* Gives you version control, but that is only part of configuration management.
* CVS is the foundation of the CM system. Keep software under development in it, as well as autoconf macros, packaging files, and cfengine policy files.
* We chose CVS because, although there are newer version control systems available, CVS is still used for the overwhelming majority of open source software. Its interfaces and limitations are well understood.

Autoconf
* Our main use so far is for configuring builds, not portability.
* Based on decades of grueling CM automation work.
* Well known.
* Write a few OTP-specific m4 macros for finding OTP libs and headers.
* Allow engineers to specify at build time whether they want the release version or their own test version of each component in the build.

Pkgsrc
* Highly portable packaging system based on very mature systems used with FreeBSD, NetBSD, and OpenBSD.
* Maintains a database of installed packages; lots of support tools.
* Works very well with autoconf'd software.
* Use with (almost) all software installed after the initial OS load.

Cfengine
* Put servers into various classes. A server pulls all updates for its class once per hour.
* If package X is not installed, install it.
* Replicate config files and prompts (often these are not in a package release cycle).
* Edit shared config on the servers: /etc/inittab, crontabs, passwd.
* Certain "dangerous" operations require manual intervention on the target server. Examples: change telco routing tables, start SIP proxy.

Benefits
* Convergence - all servers in the same class reach the same state.
* Replace any server in less than an hour in case of hardware failure.
* Complete documentation of the configuration of the platform.
* Gives you a workable set of tools and processes for keeping programmers off the production servers.
* Full source for the CM system. No licensing headaches.
* About as lightweight as you can get for what it does.

Drawbacks/Challenges
* Learning curve for CM staff.
* Culture shock for engineers and bosses.
* Process for package updates needs to be simplified.
* Pkgsrc supports only one installed version of a package at a time - not a show-stopper for OTP.
* Cfengine file replication is slow. Consider rsync.
* Cfengine has bugs and limited syntax.

To Do
* Integrate with the OTP release handler.
* Use autoconf more for portability.
* Adapt legacy code to the autoconf build system.
* At present, the CM system is used almost exclusively for static configuration. Expand its use to allow runtime configuration of software we control.
* Support MS operating systems. (Interix?)

Conclusion
We have come a long way toward keeping our sanity while maintaining hundreds of Unix-family servers. The CM system is already working well, but is still a work in progress.

Reference
Rationale for the system described above is presented here:
http://www.drxyzzy.org/cm.notes.txt
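
Appendix: convergence sketch
The Cfengine rule "if package X is not installed, install it", applied per server class, can be illustrated with a small model. This is a minimal sketch in Python, not cfengine policy syntax and not the system described above. The class names, package names, version strings, and the functions plan() and converge() are all hypothetical, chosen only for illustration. It assumes one installed version per package, as with pkgsrc.

```python
# Hypothetical policy: server class -> {package: required version}.
# All names and versions here are illustrative assumptions.
POLICY = {
    "sip-proxy": {"erlang-otp": "R9C-2", "sipd": "1.4"},
    "billing": {"erlang-otp": "R9C-2", "billd": "2.0"},
}


def plan(server_class, installed):
    """Return the actions needed to bring one server to its class's state.

    `installed` maps package name -> installed version (one version per
    package). Packages outside the policy are left alone.
    """
    actions = []
    for name, version in sorted(POLICY[server_class].items()):
        have = installed.get(name)
        if have is None:
            actions.append(("install", name, version))
        elif have != version:
            actions.append(("replace", name, version))
    return actions


def converge(server_class, installed):
    """Apply the plan to a copy of the installed set; return the new state."""
    state = dict(installed)
    for _action, name, version in plan(server_class, installed):
        state[name] = version
    return state


if __name__ == "__main__":
    before = {"erlang-otp": "R9B"}
    print(plan("sip-proxy", before))
    after = converge("sip-proxy", before)
    # A converged server needs no further actions on the next hourly run.
    print(plan("sip-proxy", after))
```

Running the plan a second time on a converged server yields no actions, which is the property that lets every server in a class pull the same policy once per hour without harm.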