Storing Configuration and the Unit of Configuration

	A Configuration Repository idea

If our purpose is to recreate a system from scratch with only its hardware and its configuration, we must devise a way to store any possible configuration: not only the part that consists of files, but also the configuration implicit in, for example, hardware choices and the way devices are interconnected. As configuration consists of choices made, we must at least be able to store every possible choice, together with the reason for it. There are also cases where we choose not to choose; that is still a choice, and it must be stored too. As a thought experiment, let's configure a group of four web servers.

The choice of hardware can be recorded by listing the name and type of each part (e.g. an ASUS P5M2-E/4L motherboard, a Quad-Core Intel Xeon 3000 processor, 16 GB of RAM, etc.), and interconnectivity can be described by listing the connectors of each device and stating which is connected to which. Some of the hardware is chosen for a reason, e.g. availability of a certain part, or performance. If so, it makes sense to store the reason with the configuration, or even as part of it. It makes sense to me to store the configuration of one machine as a unit, and to make clear that the other three have the same configuration as the first one. Once that is clear, we can record that the four are in some way connected by network cables.
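
As a minimal sketch of how such a record might look, the following stores each hardware choice with an optional reason, treats one machine as the unit, and lets the other three refer to it. The class names, the machine names web1 to web4, the pairing of reasons with particular parts, and the "switch"/"eth0" connector names are all assumptions made for illustration; none of them come from an existing tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Choice:
    """One stored choice, plus the reason for it (None if none was recorded)."""
    value: str
    reason: Optional[str] = None


@dataclass
class Machine:
    name: str
    parts: dict = field(default_factory=dict)
    same_as: Optional["Machine"] = None  # "configured like that other machine"

    def resolved_parts(self) -> dict:
        """Parts of the referenced machine, overridden by this machine's own."""
        base = self.same_as.resolved_parts() if self.same_as else {}
        return {**base, **self.parts}


# The first machine is described in full (reasons here are illustrative).
web1 = Machine("web1", parts={
    "motherboard": Choice("ASUS P5M2-E/4L", reason="availability"),
    "cpu": Choice("Quad-Core Intel Xeon 3000", reason="performance"),
    "ram": Choice("16 GB"),
})

# The other three only state that they match the first.
others = [Machine(f"web{i}", same_as=web1) for i in (2, 3, 4)]

# Interconnection: which connector is wired to which (hypothetical names).
links = [
    ((m.name, "eth0"), ("switch", f"port{i}"))
    for i, m in enumerate([web1, *others], start=1)
]
```

Storing the reference rather than three copies keeps the statement "these machines are the same" as part of the configuration itself, instead of leaving it to be inferred from identical records.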

The OS can simply be chosen from a list, but here we encounter the first interdependency: some OSes are not available for some hardware, or lack drivers for it. So we must refer back to choices made earlier in order to explain the ones made here. Kernels to be booted have names, and perhaps paths. Boot parameters are just strings.

But we do get to a fundamental choice here. Some OSes have default boot parameters. We could decide to store these default boot parameters with our configuration, or we could store just what we added and what we removed. If we store only the difference and the default has changed by the time we try to reproduce our system, we may get different results from the first time. So it is probably best to store the default, mark it as default (the distributor's default, not ours), and store what we finally decided was best as well.
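
The idea of storing both the distributor's default and the final choice can be sketched as follows. The class and method names are invented for this example, and the parameter values are merely plausible Linux kernel parameters, not values taken from any particular distribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootParameters:
    distributor_default: tuple  # what the OS shipped with when we installed
    chosen: tuple               # what we finally decided was best

    def added(self) -> list:
        """Parameters we chose that the distributor's default did not have."""
        return [p for p in self.chosen if p not in self.distributor_default]

    def removed(self) -> list:
        """Default parameters that we dropped."""
        return [p for p in self.distributor_default if p not in self.chosen]

    def default_has_changed(self, current_default) -> bool:
        """True if today's default differs from the one we recorded."""
        return tuple(current_default) != self.distributor_default


# Illustrative values only.
params = BootParameters(
    distributor_default=("ro", "quiet"),
    chosen=("ro", "console=ttyS0,115200"),
)
```

Because the full default is kept, the difference can always be derived from it, and a later reinstall can detect that the distributor's default has moved instead of silently producing a different system.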

Then the partitioning. We could store sizes as absolutes, as percentages of disk size, as combinations of these (minimum, percentage, maximum), or we could refer to an entire algorithm for computing the partition sizes. Network devices are a bit tricky (as are disks) in that they don't always appear under the same name, even across reboots of a single system. So there must be some way of identifying a device by description, e.g. “the disk that has tag so-and-so”, “the networking device with MAC address 00:11:22:33:44:55”, “any disk with more than 20 GB on it, which has space for another virtual ext3 fs of at least 700 MB” or “any network device that can reach www.google.com”. ^[9]
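
Both ideas can be sketched in a few lines: a size rule that combines minimum, percentage and maximum, and device descriptions expressed as predicates rather than names. The sketch assumes the device properties have already been gathered into plain dictionaries; how to gather them differs per operating system and is left out. All function names, dictionary keys and device names are hypothetical.

```python
GB = 10**9
MB = 10**6


def partition_size(disk_bytes, minimum=0, percent=None, maximum=None):
    """Apply a (minimum, percentage, maximum) rule and return a size in bytes."""
    size = minimum if percent is None else disk_bytes * percent // 100
    size = max(size, minimum)
    if maximum is not None:
        size = min(size, maximum)
    return size


def has_mac(mac):
    """Selector: the network device with this MAC address."""
    return lambda dev: dev.get("mac", "").lower() == mac.lower()


def disk_with_room(min_size, min_free):
    """Selector: any disk larger than min_size with at least min_free unused."""
    return lambda dev: (
        dev.get("kind") == "disk"
        and dev.get("size", 0) > min_size
        and dev.get("free", 0) >= min_free
    )


def select(devices, selector):
    """Names of all devices that currently match the description."""
    return [dev["name"] for dev in devices if selector(dev)]


# A made-up inventory, as it might look after one particular boot.
devices = [
    {"name": "eth1", "kind": "nic", "mac": "00:11:22:33:44:55"},
    {"name": "sda", "kind": "disk", "size": 80 * GB, "free": 2 * GB},
    {"name": "sdb", "kind": "disk", "size": 10 * GB, "free": 5 * GB},
]

nic = select(devices, has_mac("00:11:22:33:44:55"))
disks = select(devices, disk_with_room(20 * GB, 700 * MB))
root_size = partition_size(80 * GB, minimum=1 * GB, percent=10, maximum=5 * GB)
```

What gets stored in the configuration is the description (the selector and its arguments), not the name that happened to match on the day of installation.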

As for the list of software installed, there is software that we explicitly need, which must be listed, and there is software that tags along, which must also be listed, just in case we were unknowingly using it and depending on it anyway.

Now the machine is running, and we start to configure in earnest. First of all, as these are public web servers running Debian Linux and Apache, we enable iptables and set it to drop all traffic except on port 80 and port 22, and to allow only two specific IP addresses on port 22. This piece of configuration depends on iptables being available and installed. It also depends on the names of the network interfaces. Furthermore, it does not make much sense if no HTTP or SSH server is listening, but it doesn't do any harm either. On the other hand, it does do harm if any other service is supposed to run on this machine, since that service would be rendered useless: all network packets addressed to it would be dropped. This information needs to be stored. The actual configuration is then done in several files. With a GUI, an iptables script is created and stored in /etc/firewall. Another script, /etc/init.d/iptables, is created that uses the former to turn firewalling on or off. Due to the names and locations of these files, the configuration is fit only for Debian, or more precisely, for distributions that keep their initscripts in /etc/init.d and don't have their own files in /etc/firewall.
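
The dependencies and side effects described above could be stored alongside the firewall choice and checked against a description of the target system. The following is only a sketch under stated assumptions: the class, the `review` function and the layout of the `system` dictionary are invented here, the two SSH source addresses are documentation placeholders, and the mail service on port 25 is a made-up example of "any other service".

```python
from dataclasses import dataclass, field


@dataclass
class FirewallChoice:
    reason: str
    open_ports: set
    ssh_allowed_from: list
    interfaces: set
    requires_packages: set = field(default_factory=lambda: {"iptables"})
    files: tuple = ("/etc/firewall", "/etc/init.d/iptables")


def review(choice, system):
    """Compare the stored choice with a system description.

    Returns (errors, warnings, notes) as three sorted lists of strings.
    """
    errors = [f"package {p} is not installed"
              for p in choice.requires_packages - system["packages"]]
    errors += [f"interface {i} does not exist"
               for i in choice.interfaces - system["interfaces"]]
    warnings = [f"{name} on port {port} would become unreachable"
                for port, name in system["listening"].items()
                if port not in choice.open_ports]
    notes = [f"nothing listens on open port {port} (harmless)"
             for port in choice.open_ports if port not in system["listening"]]
    return sorted(errors), sorted(warnings), sorted(notes)


firewall = FirewallChoice(
    reason="public web servers: only HTTP for everyone, SSH for two admins",
    open_ports={80, 22},
    ssh_allowed_from=["192.0.2.10", "192.0.2.11"],  # placeholder addresses
    interfaces={"eth0"},
)

# Hypothetical description of one target machine.
system = {
    "packages": {"iptables", "apache2"},
    "interfaces": {"eth0"},
    "listening": {80: "apache2", 25: "exim4"},
}

errors, warnings, notes = review(firewall, system)
```

The three result lists mirror the three kinds of statement in the text: hard dependencies, harm done to other services, and combinations that are pointless but harmless.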

To summarize, we can state that “configuration consists of the choices made during the installation of a system”. With the entire configuration available, we can buy new hardware (if available) and bring up a new system that is, for our purposes, identical to the old one as it was when first installed. State is not a form of configuration. Items of configuration come with a reason, which should be stored with the configuration. They usually come with a default, which should also be stored. Stretches of configuration that exist for the same reason can be grouped together, but stretches of configuration that exist for different reasons must be split. It must be possible for stretches of configuration that are spread over separate sections of a file, over different files or directories, or even across different systems, to be combined into a larger package.
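
Grouping by reason can be illustrated with a small sketch in which every stretch of configuration records where it lives and why it exists. The `Fragment` type, the machine names and the Apache fragment with its reason are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    system: str   # which machine the stretch lives on
    path: str     # which file
    section: str  # which part of the file; "" means the whole file
    reason: str   # why this stretch exists


def package_by_reason(fragments):
    """Combine fragments that share a reason; keep different reasons apart."""
    packages = defaultdict(list)
    for fragment in fragments:
        packages[fragment.reason].append(fragment)
    return dict(packages)


FIREWALL = "only HTTP and SSH may reach a public web server"
TUNING = "keep connections open for repeat visitors"  # illustrative reason

fragments = [
    Fragment("web1", "/etc/firewall", "", FIREWALL),
    Fragment("web1", "/etc/init.d/iptables", "", FIREWALL),
    Fragment("web2", "/etc/firewall", "", FIREWALL),
    Fragment("web1", "/etc/apache2/apache2.conf", "KeepAlive", TUNING),
]

packages = package_by_reason(fragments)
```

Here the firewall package spans two files and two machines, while the Apache fragment stays in a package of its own because it exists for a different reason.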

^[9] This cries out for plugins, but they will have dependencies of their own. Figuring out the size of a disk works differently under Debian than under Windows Vista.

