Owl River Company

Broken System quick diagnosis process


We get a lot of inquiries (well, really, we get complaints) from new Linux admins using CentOS. They are surprised that a casual approach toward administering their shiny new Linux system has not turned out too well. It may be that they are wholly new to the responsibilities of being accountable for their box, and untutored in the ways of the wily BOFH; they may come from a non-RPM or non-Red Hat derived distribution background; they may be from big iron Unix. In the interest of documenting a set of diagnostic steps for self-help (and so we have a writeup to point them at), we add this tip.

Common Complaints:
  1. I built something as root on my host, and now my system is acting funny [Next time: Do not build as root, ever; see: Building as non-root; Set up an end user build directory]
  2. A co-worker with whom I co-administer a host did something, and I cannot tell what is broken [Next time: Use sudo, and don't co-administer with people who are not willing to 'play fair' and use it as well, so as to leave a trail of what was done]
  3. I built a package from a tarball for my system, instead of using RPM; now I want to uninstall it, but I do not know what has been changed [Next time: Use rpm. Package and build it as non-root; see: Creating Quality RPMs]

Reality check time
The root cause underlying this kind of 'muck up', broken packages or files, is easier to diagnose quickly than it was in years past; good old rpm and the newer yum utilities provide the tools we need.

Also, please note that there is just no real substitute for periodic level zero backups, duly rotated, and possibly supplemented by incrementals, maintained off the unit in question, against eventual need.

Limitation: Please note that the use of the tools outlined herein is not a substitute for careful forensic level analysis, with trip-wire type checksums, when one has suspicions that a host has been 'cracked'. When one has a 'cracked' system, there is no safe way to recover other than a 'wipe and reinstall'. Aren't you glad you have those backups? ;)

Threshold matters - yum seems to not be working
yum is a wonderful tool, for those of us who remember all the complaints about "rpm dependency hell", because with consistent upstream archives, one can usually work a system out of a funk. But yum itself sometimes gets into an inconsistent state, and needs a little clean-out, at the expense of having to re-download package header information.

To give yum that assist, run:
$ sudo yum -y clean all

.. and it will then be happier, sometimes. Also, the -d and -e debugging and error reporting levels can be dialed up from their defaults (e.g., yum -y -d 6 -e 6 update) when there is an issue, to see why yum is not working as one expects. Unfortunately, as yum delegates much to subordinate python modules, the results here are uneven: one cannot 'see' broken http proxy settings, or stale transparent cache induced errors, even using the -d and -e reporting levels.
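Since the debug levels will not reveal proxy trouble, it can help to look at the proxy settings directly. The following is only a minimal sketch: show_proxy_settings is a hypothetical helper name of ours, and it assumes the stock config location /etc/yum.conf and the usual *_proxy environment variables.

```shell
#!/bin/sh
# show_proxy_settings: list proxy-related lines from a yum config file
# (assumed default: /etc/yum.conf) and any *_proxy environment variables.
show_proxy_settings() {
    conf=${1:-/etc/yum.conf}
    if [ -r "$conf" ]; then
        # proxy=, proxy_username= and proxy_password= all start with "proxy"
        grep -i '^[[:space:]]*proxy' "$conf" || true
    fi
    # http_proxy, https_proxy, ftp_proxy, no_proxy, in either case
    env | grep -i '^[a-z]*_proxy=' || true
}
```

Run it as: show_proxy_settings (or pass another config file as the first argument). No output means no proxy is configured in either place, which at least rules out one suspect.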

Sometimes, people think they are running with 'stock' yum settings, but someone has altered them (see Common Complaint, #2, above). Rule out this possibility thus:
$ sudo rpm -V yum centos-release

Sometimes people think yum is broken, when nothing (besides their expectation of the tool's behavior) is wrong at all. Removing, reinstalling, and then optionally re-removing a small package lets us see whether yum is working, and we are simply not paying close enough attention:
$ sudo rpm -e joe && sudo yum -y install joe && sudo rpm -e joe

We are also interested in impossible installation situations, such as a package being in the RPM database twice under a single packagename.arch pair (kernels are excluded, as several may properly be installed at once):
$ rpm -qa --qf '%{name}.%{arch}\n' | grep -v ^kernel | \
	sort | uniq -c | awk '$1 > 1 {print $1" "$2}'
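The duplicate check can also be wrapped as a small filter, so that it can be tried on sample data before trusting it on a sick box. This is a sketch only; find_dupes is a name we made up for illustration, and the live run assumes rpm is on the PATH.

```shell
#!/bin/sh
# find_dupes: read "name.arch" lines on stdin and print "count name.arch"
# for every pair that appears more than once.
find_dupes() {
    sort | uniq -c | awk '$1 > 1 { print $1, $2 }'
}

# On a live system: skip kernels, which may properly be installed
# several times over, and report everything else that is doubled up.
if command -v rpm >/dev/null 2>&1; then
    rpm -qa --qf '%{name}.%{arch}\n' | grep -v '^kernel' | find_dupes
fi
```

No output is the happy case. Any line printed names a package that the RPM database holds more than once for the same architecture.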

There are other subtle error causing matters (full or RO partitions, no free inodes, and such), but these are general and beyond the scope of this discussion.
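Although those general matters are out of scope here, a quick look at free space and free inodes costs little. A minimal sketch follows, assuming the POSIX 'df -P' column layout and mount points without embedded spaces; full_filesystems is our own hypothetical helper, and the 90 percent default is an arbitrary choice.

```shell
#!/bin/sh
# full_filesystems: read `df -P` (or `df -Pi`) output on stdin and print
# each mount point whose use percentage is at or above the threshold.
# The threshold is the first argument; 90 is an arbitrary default.
full_filesystems() {
    awk -v limit="${1:-90}" '
    NR > 1 {                      # skip the header line
        pct = $5
        sub(/%/, "", pct)         # "95%" becomes "95"
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}
```

Run it as: df -P | full_filesystems for blocks, and df -Pi | full_filesystems for inodes.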

Express method: yum tool
First, let's make sure the system is in a consistent install state, as to its component packages as known to the RPM database. This was formerly rather laborious and time-consuming to run, requiring rather elaborate rpm shell scripts. Now an adjunct tool to yum makes it simpler. Run:
$ sudo package-cleanup --problems
If the system returns a message like this:
-bash: package-cleanup: command not found
this means that the tool is not yet installed. To install it, run:
$ sudo yum -y install yum-utils
Then retry the command, and if the install succeeded the package-cleanup command should now work.
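The check-then-install step can be sketched as a pair of shell functions. The function names are ours, invented for illustration; the only commands assumed are the ones already shown above.

```shell
#!/bin/sh
# have_tool: succeed if the named command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# ensure_package_cleanup: install yum-utils only when package-cleanup
# is absent, then confirm that the install really provided the tool.
ensure_package_cleanup() {
    have_tool package-cleanup && return 0
    sudo yum -y install yum-utils && have_tool package-cleanup
}
```

Run it as: ensure_package_cleanup && sudo package-cleanup --problems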

If not available for your distribution, it can trivially be built, non-root, with the tarball from CVS.

The tool lists those missing files, packages, and so forth, which it can determine are needed to bring the system's packages into a complete and consistent state within the RPM database's view of Requirements and Dependencies. People who have manually compiled, copied, or otherwise created files unknown to the RPM database often wonder why the system is unhappy, because they 'know' that all is all right. The packaging and install discipline of RPM requires a person making such an assertion to demonstrate the correctness of their assumptions, and to let the rpm-build package's 'rpmbuild' tools determine more subtle dependencies such as library version and SONAME issues.

The solution is simple: Have yum install the missing dependencies it can, and see if it cannot solve them all from the repositories known to it. If it can, great; all done.

If not, one has to mount a campaign to partition the unsolved dependencies into solvable parts, and either locate archives proper to the distribution level you are running, with consistent packages already built in binary form, or get busy building from sources. The latter may require some SRPM build effort: locating and verifying sources; setting up a proper build environment; preparing a .spec file; and preparing patches. We address this process in more detail elsewhere.

Slow but steady: rpm -V
The other way to address the matter is to have RPM walk through all of its packages, and Verify that the component elements have not been tampered with. A scriptlet like this will produce useful output. The problem is that it checks so much that it is dog slow. We use tee in an appending mode, so that an impatient admin can watch the progress and does not give up along the way. Run this, using sudo, as root.

#!/bin/sh
# run this: sudo ./test.sh     assuming it is named 'test.sh'
# Copyright (c) 2006 Owl River Company
# reports to: info@owlriver.com
# License: GPL, v. 2
YMD=`date +%Y%m%d`
> /tmp/verification-${YMD}.txt
for i in `rpm -qa --qf '%{name}-%{version}-%{release}\n' | sort`; do
    echo "$i" | tee -a /tmp/verification-${YMD}.txt
    rpm -V $i | tee -a /tmp/verification-${YMD}.txt
    echo " " | tee -a /tmp/verification-${YMD}.txt
done

Perhaps obviously, there are some 'backticked' sub-shell commands in that example. To get this onto a hurt machine, we usually scrape the contents of the preceding script into a copy buffer, and start a:
$ vi test.sh
on the sick box. We then type the letter: A (to Append to the empty file at the EOL of the current insertion point) and paste the copy buffer. Then type exactly:
Now make it executable:
$ chmod 755 test.sh
and run it as outlined in the body of the script. This script is friendly to producing versioned verifications -- some admins mount /tmp in the older style, where it is emptied at each reboot -- save the output elsewhere if you want a time series to review.
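Once the run completes, most of the file is package names with nothing under them. A small filter can reduce it to just the packages that drew findings. This is a sketch which assumes the log layout written by the script above (a package name line, any rpm -V lines, then a blank separator); failed_packages is a hypothetical name of ours.

```shell
#!/bin/sh
# failed_packages: print the name of each package whose block in the
# verification log holds at least one rpm -V finding.
failed_packages() {
    awk '
    NF == 0   { pkg = ""; next }               # separator line ends a block
    pkg == "" { pkg = $0; shown = 0; next }    # first line of a block names the package
    !shown    { print pkg; shown = 1 }         # any further line is a finding
    ' "$@"
}
```

Run it as: failed_packages /tmp/verification-`date +%Y%m%d`.txt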

The mere appearance of a file in this 'verification' listing is not necessarily a bad sign, and a study of the 'flag' fields in man rpm is in order, to understand what is being verified.
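As a reading aid for those flag fields, the following sketch spells out the letters in each rpm -V line. The mapping follows the flag letters documented in man rpm (S, M, 5, D, L, U, G, T, and P on newer versions). explain_verify is our own hypothetical helper, and it assumes file paths without embedded spaces.

```shell
#!/bin/sh
# explain_verify: read `rpm -V` output lines on stdin and print
# "path: what differs" for each one. Lines with fewer than two fields
# (package names, blank separators) are passed over silently.
explain_verify() {
    awk '
    BEGIN {
        m["S"] = "size";   m["M"] = "mode";    m["5"] = "digest"
        m["D"] = "device"; m["L"] = "symlink"; m["U"] = "user"
        m["G"] = "group";  m["T"] = "mtime";   m["P"] = "capabilities"
    }
    $1 == "missing" { print $NF ": missing"; next }
    NF >= 2 {
        out = ""
        for (i = 1; i <= length($1); i++) {
            c = substr($1, i, 1)
            if (c in m) out = out (out == "" ? "" : ",") m[c]
        }
        print $NF ": " (out == "" ? "no decoded flags" : out)
    }'
}
```

Run it as: rpm -V yum | explain_verify, or feed it the verification file from the script above. A changed mtime on a configuration file is usually just an admin at work; a changed digest on a binary deserves a closer look.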

We make this available for non-commercial and individual use. Please respect our copyright, and consider contacting us for all your Open Source and *nix design, architect / systems analysis, and administration needs.

Copyright (C) 2006 R P Herrold
   My words are not deathless prose,
      but they are mine.

       Owl River Company
   "The World is Open to Linux (tm)"
   ... Open Source LINUX solutions ...
         Columbus, OH


© 2008 Owl River Company
All rights reserved.

Last modified: Tue, 10 Feb 2009 09:24:36 -0500