Owl River Company
Broken System quick diagnosis process
http://www.owlriver.com/tips/broken-system/
We get a lot of inquiries (well, really, we get complaints)
from new Linux admins using CentOS. They are surprised that a casual
approach toward administering their shiny new Linux system has not turned
out too well. It may be that they are wholly new to the responsibilities of
being accountable for their box, and untutored in the ways of the wily BOFH;
they may come from a non-RPM or non-Red Hat derived distribution
background; they may come from big iron Unix. In the interest of
documenting a set of diagnostic steps for self-help (and so we have a
writeup to point them at), we add this tip.
Common Complaints:
- I built something as root on my host, and now my system is
acting funny [Next time: Do not build as root, ever; see: Building as
non-root; Set up an end user build directory]
- A co-worker with whom I co-administer a host did something, and
I cannot tell what is broken [Next time: Use sudo, and don't
co-administer with people who are not willing to 'play fair' and use
it as well, to leave a trail of what was done]
- I built a package from a tarball for my system, instead of using RPM;
now I want to uninstall it, but I do not know what has been
changed [Next time: Use rpm. Package and build it as non-root; see:
Creating Quality RPMs]
Reality check time
The root cause underlying this kind of 'muck up' (broken packages or
files) is easier to diagnose quickly than it was in years past; good old
rpm, and the newer yum utilities provide the tools we need.
Also, please note that there is just no real substitute for periodic
level zero backups, duly rotated, and possibly supplemented by incrementals,
maintained off the unit in question, against eventual need.
Limitation: Please note that the use of the tools outlined herein is not a
substitute for careful forensic level analysis, with trip-wire type
checksums, when one has suspicions that a host has been 'cracked'. When
one has a 'cracked' system, there is no safe way to recover other than a
'wipe and reinstall'. Aren't you glad you have those backups? ;)
Threshold matters - yum seems to not be working
yum is a wonderful tool for those of us who remember all the
complaints about "rpm dependency hell", because with consistent upstream
archives, one can usually work a system out of a funk. But yum
itself sometimes gets into an inconsistent state, and needs a little
clean-out, at the expense of having to re-download package header
information. Clearing yum's cached data gives it that assist, and it will
then be happier, sometimes. Also, the -d and -e debugging and error
reporting levels can sometimes be dialed up (e.g.,
yum -y -d 6 -e 6 update) when there is an issue, to see why yum is not
working as one expects.
Unfortunately, as yum delegates much to subordinate python modules, the
results here are uneven: one cannot 'see' broken http proxy settings, or
errors induced by a stale transparent cache, even using the -d and -e
reporting levels.
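The text does not spell out the clean-out command itself. A minimal sketch follows, assuming the stock "yum clean all" is what is meant, and then retrying with the raised -d and -e levels mentioned above. The helper name yum_refresh is hypothetical, not part of yum.

```shell
# Sketch only: "yum clean all" is an assumed reading of the clean-out step;
# yum_refresh is a hypothetical helper name.
yum_refresh() {
    if ! command -v yum >/dev/null 2>&1; then
        echo "yum not found in PATH" >&2
        return 127
    fi
    # Drop yum's cached data, then retry the update with verbose
    # debugging (-d 6) and error reporting (-e 6).
    sudo yum clean all &&
        sudo yum -y -d 6 -e 6 update
}
```

Call it as yum_refresh; expect the next run to be slower while yum re-downloads what was cleaned.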
Sometimes, people think they are running with 'stock' yum settings, but
someone has altered them (see Common Complaint, #2, above).
Rule out this possibility thus:
$ sudo rpm -V yum centos-release
Sometimes people think yum is broken, when nothing (besides their
expectation as to the tool's behavior) is wrong at all.
Removing, reinstalling, and then optionally re-removing a small package lets
us see whether yum is working, and whether we are simply not paying close
enough attention:
$ sudo rpm -e joe && sudo yum -y install joe && sudo rpm -e joe
There are other subtle error-causing matters (full or RO partitions,
no free inodes, and such), but these are general and beyond the scope of
this discussion.
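Although a full treatment is out of scope here, a quick look costs little. The following is a sketch under assumptions: ro_mounts is a hypothetical helper name, and it assumes a Linux /proc/mounts layout (device, mount point, type, options).

```shell
# Print mount points whose option list includes "ro".
# $1: a file in /proc/mounts format (defaults to /proc/mounts itself).
ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

Run ro_mounts for read-only mounts, df -h for full partitions, and df -i for inode exhaustion. Some pseudo-filesystems are legitimately mounted read-only, so read the list with that in mind.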
Express method: yum tool
First, let's make sure the system is in a consistent install state, as to
its component packages as known to the RPM database. This was formerly
rather laborious and time-consuming to run, requiring rather elaborate rpm
shell scripts. Now an adjunct tool to yum makes it simpler. Run:
$ sudo package-cleanup --problems
If the system returns a message like this:
-bash: package-cleanup: command not found
this means that the tool is not yet installed. To install it, run:
$ sudo yum -y install yum-utils
Then retry the command, and if the install succeeded the package-cleanup command should now work.
If not available for your distribution, it can trivially be
built, non-root, with the tarball from CVS.
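The check, install, and retry steps above can be folded into one helper. This is a sketch, not a definitive recipe: check_problems is a hypothetical name, and it assumes yum-utils is available from the configured repositories.

```shell
# check_problems is a hypothetical helper name for this sketch.
check_problems() {
    if ! command -v package-cleanup >/dev/null 2>&1; then
        # package-cleanup ships in the yum-utils package
        sudo yum -y install yum-utils || return 1
    fi
    sudo package-cleanup --problems
}
```

If the yum-utils install fails, the helper returns 1 without running the check.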
The tool's output lists those missing files, packages, and so forth,
which it can determine are needed to bring the system's packages into a
complete and consistent state within the RPM database's view of
Requirements and Dependencies.
People who have manually compiled, copied, or otherwise created files
unknown to the RPM database often wonder why the system is unhappy, because
they 'know' that all is all right. The packaging and install discipline of
RPM requires a person making such an assertion to demonstrate the
correctness of their assumptions, and to let the rpm-build package's
'rpmbuild' tools determine more subtle dependencies such as library
version and SONAME issues.
The solution is simple: Have yum install the missing dependencies
it can, and see whether it can solve them all from the repositories known
to it. If it can, great; all done.
If not, one has to mount a campaign to partition the unsolved dependencies
into solvable parts, and either to locate archives proper to the
distribution level you are running, with consistent packages already
built in binary form, or to get busy building from sources. The latter may
need some SRPM build steps -- locating and verifying sources; setting up a
proper build environment; preparing a .spec file; and preparing
patches. We address this process in more detail elsewhere.
Slow but steady: rpm -V
The other way to address the matter is to have RPM walk through all of its
packages, and Verify that the component elements have not been tampered
with. A scriptlet like this will produce useful output. The problem is
that it checks so much that it is dog slow. We use tee in an appending
mode, so that an impatient admin sees progress and does not give up in the
process. Run this as root, using sudo.
#!/bin/sh
# run this: sudo ./test.sh assuming it is named 'test.sh'
# Copyright (c) 2006 Owl River Company
# reports to: info@owlriver.com
# License: GPL, v. 2
#
# date-stamped report file; truncate any earlier run from the same day
YMD=`date +%Y%m%d`
OUT="/tmp/verification-${YMD}.txt"
> "$OUT"
# verify each installed package in sorted order, echoing progress to
# the terminal while appending the same output to the report
for i in `rpm -qa --qf '%{name}-%{version}-%{release}\n' | sort`; do
    echo "$i" | tee -a "$OUT"
    rpm -V "$i" | tee -a "$OUT"
    echo " " | tee -a "$OUT"
done
Perhaps obviously, there are some 'backticked' sub-shell commands in that
example; these are backticks, not single quotes. To get this onto a hurt
machine, we usually scrape the contents of the preceding script into a
copy buffer, and start a vi session on a new, empty file (test.sh) on the
sick box. We then type the letter: A (to Append to the empty file at the
EOL of the current insertion point) and paste the copy buffer. Then save
the file and leave the editor, make the script executable, and run it as
outlined in the body of the script.
The script date-stamps its output file, so it is friendly to producing
versioned verifications. Some admins mount /tmp in the older style, where
it is emptied at each reboot -- save the output elsewhere if you want
a time series to review.
The mere appearance of a file in this 'verification' listing is
not necessarily a bad sign, and a study of the 'flag' fields in
man rpm is in order, to understand what is being verified.
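As a starting point for reading the listing: rpm -V prints a flag string per file (S size, M mode, 5 digest, D device, L symlink, U user, G group, T mtime, P capabilities), optionally followed by an attribute marker such as c for a config file, and then the path. The sketch below narrows a report down to files whose contents changed and which carry no attribute marker, since edited config files are usually expected. The helper name digest_changed is hypothetical, and paths containing spaces are not handled.

```shell
# List files whose digest differs ("5" in the flag string) and which carry
# no attribute marker (c, d, g, l, r) between the flags and the path.
# $1: a report file as written by the verification script above.
digest_changed() {
    awk 'NF == 2 && $1 ~ /^[SM5DLUGTP.?]+$/ && index($1, "5") > 0 && $2 ~ /^\// {
        print $2
    }' "$1"
}
```

For example, digest_changed /tmp/verification-20071017.txt (the date part is only illustrative) prints one path per line for closer inspection.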
We make this available for non-commercial and individual use.
Please respect our copyright, and consider contacting us for
all your Open Source and *nix design, architect / systems analysis, and
administration needs.
Copyright (C) 2006 R P Herrold
herrold@owlriver.com
My words are not deathless prose,
but they are mine.
Owl River Company
"The World is Open to Linux (tm)"
... Open Source LINUX solutions ...
info@owlriver.com
Columbus, OH
Last modified: Wed, 17 Oct 2007 22:11:43 -0400
http://www.owlriver.com/tips/broken-system/index.php