COLUMBIA
ACCIDENT INVESTIGATION BOARD
PUBLIC HEARING
WEDNESDAY, APRIL 23, 2003
1:00 p.m.
Hilton Hotel
3000 NASA Road 1
Houston, Texas
BOARD MEMBERS PRESENT:
Admiral Hal Gehman
Rear Admiral Stephen Turcotte
Major General John Barry
Major General Ken Hess
Dr. John Logsdon
Dr. Sheila Widnall
Mr. G. Scott Hubbard
Mr. Steven Wallace
WITNESSES TESTIFYING:
Dr. Jean Gebman
Mr. Robert P. Ernst
Dr. Diane Vaughan
ADM. GEHMAN: Good afternoon. The afternoon session
of the Columbia Accident Investigation Board public hearing
is in session. This afternoon we're going to hear from two
experts on the subject of aircraft aging, which is another
risk element in the shuttle program which wasn't originally
foreseen -- at least I don't think it was. The shuttles
were originally designed to last ten years and now we're
passing 20 and headed toward 30 and the shuttle vehicle
then is facing issues which need to be looked at to determine
whether or not the shuttle can operate safely. We're very
pleased to have you two gentlemen join us.
Dr. Jean Gebman is a senior engineer at the Rand Corporation;
and Mr. Robert Ernst is the head of the Aging Aircraft Program
at the Naval Air Systems Command, Patuxent River. We're
glad to have you both with us.
I would invite you to introduce yourselves and say a little
bit about your present job and your background; and then
if you have an opening statement or a presentation, please
go ahead and proceed. Why don't you both introduce yourselves
first, and then we'll go ahead with the presentation.
JEAN GEBMAN and ROBERT ERNST testified as follows:
DR. GEBMAN: I'm Jean Gebman, senior engineer at Rand,
working on the Aging Aircraft Project. My educational background
is in aerospace. My doctoral work majored in structural
dynamics with minors in fluids and control engineering.
MR. ERNST: I'm Bob Ernst, the head of the NavAir
Aging Aircraft Program and also representing the Joint Council
on Aging Aircraft which is a DOD, FAA, NASA, and industry
consortium trying to work on age issues. I don't have the
storied credentials and degrees that my counterpart has,
but I've got a lot of years of experience working on old
platforms and rust and corrosion and obsolescence and those
sorts of things.
ADM. GEHMAN: Thank you very much. Go ahead and proceed.
DR. GEBMAN: Thank you, Mr. Chair. Bob and I are going
to present two briefings that are very complementary. I'm
going to talk about some technical details to give you a
somewhat hurried landscape technically, and then Bob's presentation
is going to deal with some of the cultural and programmatic
matters.
Next chart, please. This is simply a bit of background.
In the interest of time, we'll just press on ahead. Next
chart, please.
The examples that I've selected do have a methodology behind
them, and this chart is an attempt to try to capture the
essence of that. We're going to focus on the top set of
items, although aging aircraft do involve all of the functional
areas that are listed on the left-hand side of the chart.
Next chart, please. So this is going to be the focus.
Next chart. Whether or not this focus proves helpful to
you is, of course, a matter to be determined as your investigation
moves forward. So my purpose here today is more to share
with you some areas where the aging aircraft experience
might prove helpful as you move down the road.
Next chart, please. You all have seen the various diagrams
of the shuttle. I'm going to focus on the left side.
Next chart. And simply make a couple of points. We have
four main spars that go through; and when we talk about
structures and structural dynamics, one of the things we
often quickly look at is the wing root where the spars
go through. That's just simply one area that one is always
interested in.
Next chart. Another area that's of interest and will be
touched on by one of my examples subsequently has to do
with the aluminum honeycomb. This is simply a cross-section
showing at the top there the interior face sheet, which
is aluminum; the corrugation, which is aluminum; and the
piece of bond between the corrugation and the exterior face
sheet; and then, of course, the thermal protection system
underneath. A very sophisticated system. And one of the
things we will be talking about later is the matter of adhesion
as a method of joining structural materials together.
Next chart, please. This is a list of the samplers. Let's
get right to it.
Next chart. B-52 is a very interesting story. This often
is pointed to as here is why it is possible to maintain
a fleet for a very long period of time. We need, though,
to be cautious and acknowledge how it was we got to that
situation, because you may note that the G model and the
D model have long since gone to the boneyard. Corrosion
was the principal culprit. The basing at Guam was about
the worst base you could be at for an Air Force aircraft
from a corrosion standpoint.
Next chart. Even the H model, to get it to where it is today,
has been significantly rebuilt in many areas, as these various
shaded areas demonstrate. Moreover, it has been based at
a location that is relatively benign from a corrosion hazard
standpoint and the maintenance people learned a good lesson
from the experience of the G model and there has literally
been a zero tolerance for corrosion. If they see corrosion,
it must be removed.
When we visited the depot about six years ago, we looked
at the B-52s and the KC-135s. I was challenging the technicians
on the B-52, "Show me the corrosion."
They said, "Dr. Gebman, there is none."
I said, "Folks, it's an old airplane. We know there must
be corrosion."
Finally, they were able to show me a detail at the back
of the airplane and they acknowledged, well, we ground out
a little bit back here but this is not even significant.
This airplane is very different from the 135. Next chart,
please.
ADM. GEHMAN: Could I ask you to go back a second.
In that first bullet, what is a full-scale fatigue test,
what's a damage tolerance analysis, and what's a tear-down
inspection?
THE WITNESS: The full-scale fatigue test is where
you take an article that could be flown in flight and, instead
of doing that, you set it up to be loaded cyclically by
attaching various jacks and an enormous hydraulic contraption
and typically you will try to simulate two -- in the old
days, four -- equivalent lifetimes that identify where the
fatigue vulnerabilities are so that they can be addressed
during production and/or during maintenance.
ADM. GEHMAN: And I assume also recognize -- I mean,
in other words if you have a fatigue indicator like a crack
or something like that, the idea is that you would then
be able to recognize that if that were to happen in a service
vehicle.
DR. GEBMAN: One of the most important things you
learn from the test is where the cracks are taking place
and so that you can set up a maintenance program or do a
modification so you don't have to set up a maintenance program.
The damage tolerance analysis is a method of studying the
growth of fatigue cracks and their significance, giving
you further information that you use for fleet management
and modification purposes.
The tear-down inspection took place in the 1990s, largely
to identify places where corrosion was going on in areas
that could not otherwise be seen. When we do heavy maintenance,
we don't take the airplane totally apart. The notion of
a tear-down inspection is to take a high-time airplane which
you're prepared to sacrifice and literally take every part,
open it up, and see where you have challenges.
MR. WALLACE: Is the concept of damage tolerance that
you will be able to detect cracks and things and also make
predictions as to their growth rates in such a way that
you can easily detect them before they become critical?
DR. GEBMAN: Yes. And I would encourage, if I might,
that we try to speed through the examples because you will
have an opportunity to see illustrations of some of these
specific points.
With the board's permission. Next chart, please. Moving
on to the 135, corrosion is the principal challenge with
that fleet.
Next chart. This is an example of a tear-down inspection.
What you're looking at is a drawing of the top view of the
full fuselage. Each square is an area where they took the
structure apart, opened it up, looked at it sometimes under
a microscope. If you see color in the square, it means they
found at least light corrosion present. Just about every
square that they did a detailed examination of, they found
some indications of corrosion with that fleet. That is a
result of the materials that were selected, the environment
in which it is operated, and the maintenance program which
it had through its lifetime.
Next chart, please. Similar view. This time it's the wing
structure.
Next chart, please. As a consequence, over time when these
airplanes go in for heavy maintenance now on a five-year
cycle, it can take a year to do the complete job.
Next chart, please. This chart shows the climbing labor hours
required. We are now at a point where the labor hours to
do that heavy work are eight times what they were the first
time it was done, when the airplane was about eight years
old.
Next chart, please. Until very recently it was the Air Force's
intent to keep all KC-135s to the year 2040 or thereabouts,
at which point the fleet would be 80 years of age. Recently
the senior leadership has decided that the older airplanes,
the E models of which there are somewhat more than 100,
need to be retired sooner than that; and they are now looking
at perhaps leasing 767s to fill this particular function.
So one's perspective about life can change significantly
as you learn more and more about the growing burdens before
you.
Next chart, please. Moving on now to a new decade. Next
chart. I share this example with you because it illustrates some
of the complexities and depth and breadth of endeavor one
can get into when dealing with life issues. Now, the irony
is that this is dealing with the new C-5A in the early Seventies.
It had a very unfortunate experience in its full-scale fatigue
tests. Fatigue cracks throughout the airplane, especially
in the area of the wing.
The Air Force Scientific Advisory Panel convened a study
in 1970 for the Air Force, made some recommendations. The
following year, a major engineering effort was launched. Independent
review team. One hundred people worked for one year, going
through the results of the full scale fatigue tests, looking
at the different options that the Air Force might consider,
analyzing Options A through H, and presenting them to the
leadership. Ultimately Option H, wing redesign and replacement,
was selected. Once you open up the area of structures, the
number of things that you can end up having to examine can
be considerable. That's the lesson from this particular
example.
Next chart, please. This example is a little bit different.
We're focusing on a specific technical issue. It's honeycomb
composite material, and it proved, in those few areas where
it's used on the F-15, to be quite challenging.
Next chart. These are some of the methods in which the water
and the corrosion and cracking and durability issues arose
with that particular fleet. To the extent that this proves
of interest, the area of honeycomb composites, this particular
fleet -- and there are some other examples -- might be worth
looking at.
GEN. BARRY: One comment on that. This is also the
leading edge of a lot of the wing forms in the F-15s, particularly
in the tail, as a point. So it might be of interest to the board.
DR. GEBMAN: Yes, sir. Thank you.
Next chart. Moving on to the Seventies, here we have two
examples dealing with the loads that actually occurred,
exceeding what the designers thought they would be.
Next chart. This is a classic. The F-16 was designed for
both air-to-air and air-to-ground work; and it turned out
that in the air-to-ground mission area, the loads that the
structure encountered very quickly exceeded the capacity
of the structure as it was designed. This illustrates the
importance of really monitoring your loads through your
life cycle so that you take that load information and update
your expectations as regards fatigue cracking.
Next chart, please. This is the process. This is the durability
and damage tolerance analysis process and I'm certainly
not going to lecture on this today, but this is a summary
that you might find useful as your work moves forward. When
I look at this, I look at it from not only a structures
viewpoint but also from a systems viewpoint. You can literally
go through that chart and change its orientation from fatigue,
which it was designed for, to corrosion or other kinds of
things that affect an aircraft as it ages. Indeed, today
people are working on the development of what's called a
functional integrity program approach, which mirrors this
aircraft structural integrity kind of program.
Next chart, please. The B-1 example is a little bit different.
Here we were dealing with acoustic fatigue, which is a dynamic
phenomenon and it's a bit like the tuning fork. If you hit
the tuning fork, it will vibrate at a natural frequency.
Well, aircraft structures, if excited at their natural frequency,
will engage in vibration; and this can greatly accelerate
the propagation of fatigue cracks. That's the essence of
that particular story. It's an interesting one from you
all's perspective to an extent because it involved thermal,
aerodynamic, and structural dynamics. It turns
out that the designers deliberately had hot exhaust from
the engines going over the control systems at low-speed
flight to increase the control authority of the control
surfaces.
Next chart, please. Now for our final example. Next chart.
This is an airplane that served quite long in terms of landings.
It was designed for 75,000, and in flight hours it was not
all that high. It was designed actually for 50,000. This
example illustrates the three things listed on the chart.
Next chart, please. Imagine yourself flying over the Pacific
in this particular airplane. You're in Row No. 5. You have
the seat next to the window, and over your left-hand shoulder
there's a fatigue crack. From the NTSB's excellent work,
it appears that the sequence we're going to talk about started
at the fastener hole indicated here. What's important to
focus on here is the length of the fatigue crack. The blue
is supposed to depict the sky. From the outside of the airplane
that crack was only a tenth of an inch long, and yet it
contributed to a sequence of events that we're going to
look through in the subsequent charts.
Next chart, please. Part of the problem is that it wasn't
just one crack at that fastener. There was one on the opposite
side, as well. It was only .11 inches. So from a detection
standpoint, this would have been a bit of a challenge to
be detected visually just from a casual walk-around kind
of inspection. From a fracture mechanics standpoint, though,
the crack is really a half inch long because when you look
at the stress intensity at the tip of the crack, what it
depends upon is that total length, that .53 inches. And
fatigue cracks, we now know, grow at a rate that is a function
of how long they are. So the longer the crack, the more
rapidly it will grow as that part of the structure goes
through its next cycle of loading up and down.
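The growth behavior just described, a rate that increases with crack length, is conventionally captured by the Paris law of fracture mechanics. The sketch below is general, with the constants C and m as material properties, not values drawn from this testimony:

\frac{da}{dN} = C\,(\Delta K)^{m}, \qquad \Delta K = \Delta\sigma\,\sqrt{\pi a}

Because the stress-intensity range \Delta K scales with the square root of the total crack length a (the .53 inches in this case, not the visible tenth of an inch), each cycle of loading extends a longer crack faster than a shorter one.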
Next chart, please. Not only was Fastener Hole 5 cracked
on both sides but there were also adjoining fastener holes
numbered 3 through 9 that also had these kinds of cracks.
Next chart, please. Consequently, Fastener Holes 3 through
9 simply zipped across one afternoon when the loads hit
a particular level; and this particular sheet of metal
separated from its counterpart.
Next chart, please. The problem is -- and I must apologize,
this chart didn't quite make the translation from Macintosh
to PC the way I had hoped -- this chart is intended to illustrate
two pieces of skin with an adhesive material between the
skins. You see, the fasteners were never designed to carry
the load. The load was supposed to be carried by the adhesive.
The adhesive broke down. There was corrosion that took place.
So we have a combination of adhesion failure and corrosion
going on, destroying the primary joining mechanism. The
fasteners picked up the load, but cracks developed very
quickly because they really weren't intended to carry the
load for very long.
Next chart, please. The failure next was supposed to be
stopped by what's called a fail-safe strap. These are spaced
every couple of feet; but it also was glued, if you will,
to this skin. The glue had eroded over time. Corrosion was
taking place. So when the load came zipping down to
the fail-safe strap, it too broke.
Next chart, please. Indeed, all of the fail-safe straps
broke between the two major bulkheads that define the boundaries
of this particular failure. Fortunately, there was only
one fatality, although there were a number of other injuries.
The silver lining to this particular cloud is it caught
the attention of the aerospace community, and since then
there have been a whole series of efforts that really were
stimulated by this and some subsequent events.
Next chart, please. One of the matters you all will be talking
about later, I think, might be somewhat related to this
chart. This was not a matter that was brand new in 1988.
The first signs of it were back in 1970, and the bullets
in this chart sort of trace some of that history.
Next chart, please. So in closing, two more charts. Next
chart. In looking back at the life cycle management of fleets
over time, there are some things that seem to serve us well,
and they're highlighted here. We talked about the durability
and damage tolerance analysis, the full-scale fatigue tests,
tear-down inspections, updating the damage tolerance analysis
with new loads data because loading environments change
over time with flight vehicles, and maintaining high levels
of system integrity.
Next chart, please. In closing, many fleets have flown way
beyond the traditional points of retirement. In studying
these flights, each seems to have its own unique story in
terms of the challenges it had and how those challenges
were dealt with. We hope, we at Rand on the Aging Aircraft
Team, that this quick survey of the landscape may prove
of some aid to the board as you continue your important
work.
Thank you.
ADM. GEHMAN: Thank you very much.
MR. ERNST: I'm hoping to see a slide here in a minute
that comes up.
I want to thank you for the opportunity to talk to you a
bit more about the cultural issues. Dr. Gebman and I compared
slides for the first time about two hours ago, and you'll
see some tie-ins to his slides that are more by coincidence
and our mutual experience than by preplanned coordination.
One of the things I want to focus on is cultural, and it
goes back to part of the problems that I saw in Dr. McDonald's
Shuttle Independent Assessment Team back in 1999 and some
changes that I think need to be made in the aerospace industry.
Next slide, please. I also want to offer the apologies of
Colonel Mike Carpenter, my counterpart in the Air Force
Aging Aircraft Program, who was still stuck at Wright-Patterson.
You'll see these slides we kind of do interchangeably on
here. This one's a little dated, but it shows the growth
of the age, the average age of our fleets over the last
10 or 12 years, most of it on the DOD side from a procurement
holiday. When you're talking about an aircraft reaching
20 years of age, that's an average age. You've got some
like the B-52 and the KC-135, H-46, they're getting up in
the late 30s.
We are in unprecedented areas in dealing with aging aircraft.
It's not like we can go back and find the predecessor of
the B-52 and see how it did in its forty-fifth year. There
isn't that data. As you can see from Dr. Gebman's presentation,
there are a lot of complex issues. I use the phrase, "This
isn't rocket science," but it really is a complex issue,
an age type of rocket science in there. Even though we have
a lot of very, very talented individuals working on these
issues, we're kind of a 1-of-1 type of scenario. We're out
in new areas in there.
I also want to show that even systems that are old
can still be effective. I think all we
have to do is look to the recent aircraft performance in
Operation Iraqi Freedom to see that our legacy platforms,
when they're put in the hands of qualified operators and
maintainers that are dedicated to their jobs, can do a tremendous
job and deliver a great performance. But sometimes those aircraft,
when they get up in age, we have new issues that we have
to handle in there.
The challenge is to balance when we can recapitalize.
There's no idiot light that just sits here and goes, ding,
"Replace this aircraft and buy new aircraft." We have to
look at a variety of factors, things such as fatigue tests,
tear-down inspections, load surveys, complex issues. And
frankly, they aren't very sexy. When you talk about I want
you to go study corrosion and rust propagation in aircraft,
that's not the thing that the young kid out of school necessarily
wants to focus on. So there's some challenges there.
Next slide, please. One of my other hats that I put on to
cover my bald head is part of the Joint Council on Aging
Aircraft. I wanted to explain a little bit about this. This
was a grassroots group that got together a little less than
two years ago because we all realized in the Air Force and
the Navy and the Army and Coast Guard and DLA and NASA that
we did not have enough resources. You can read resources
as people, money, and time to be able to handle all the
issues adequately but we said, you know, we're taxpayers
and every April 15th I look at my tax statement and say,
gee, I'd like to see if I can reduce that tax burden somehow.
So we decided to cooperate and graduate and see if we could
share things together and work together on certain issues
in here. When this group met in about August of 2001, the Joint
Aeronautical Commanders Group said, "Hey, what are you doing
on aging? Let's get together and formally charter this group."
Next slide, please. So if you know anything about the Joint
Aeronautical Commanders Group, it's the service three-stars
at the systems commands; they report to the Joint Logistics
Commanders Group in there. They have a series of boards,
and we were adopted by them and became one of their boards.
Click it again for me and bring up my next pretty picture.
There's the people we have from the leadership of the different
aging aircraft communities. And we are a board, and what
we're trying to do now is to bring aging aircraft issues
to the attention of the other members of the board and
to try to get things changed.
For example, training. We went around and we found out that
sometimes our maintenance training wasn't up to snuff in
some areas. So we went back and said, "Hey, how does that
training curriculum that was done when the S-3 that Admiral
Turcotte flew was delivered in 1974, how should that change?"
And we went through and looked at seeing some of those things
because aging is going to change some of your core functions
and logistics and engineering and supply support and those
issues and our job is to bring focus to those.
Click it again for me, please. Next slide, please. What
is the mission of the JCAA? Twofold really. One is to identify
and investigate issues. But we're not just a think tank.
We're not going to put a pretty little report that says
you really need to go, you know, build this or you need
to do this. We're also serving as program managers that
are fielding products, especially in the transition area,
taking a lot of the new technologies that are out there
and look really good, putting them on aircraft and making
sure in what applications they work. That's our focus. And that's
one of the biggest pitfalls we have on an aging side is
taking all that really neat stuff out there, all those science
fair projects, and putting them on platforms.
Next slide, please. Ironically, I sat in the airplane late
last night and said what are some of the characteristics
of a robust, good successful program; and you'll see a lot
of similarities to what Dr. Gebman presented. The first
thing we have to do is understand how all of the components,
whether it be an O-ring, a structure, an ejection seat in
a fighter aircraft, whatever you need, how does that age.
If you look at the way we classically develop air vehicles,
we spend a lot of time focusing on the development side,
getting it up to initial operational capability, and then
we've qualified all those issues, they're good, we just
kind of do some monitoring of our data but we really don't
know all the interdependencies of all those different materials
and how they age as a function of time, how they age as
a function of changes in environmental regulations, how
the load changes, the pilots are going to fly the airplanes
differently. We have mission changes on there and we now
want to be able to do this or do this or drop this bomb.
You can look at all the views of the airplane over time
and see the mission changes. So we have to understand how
each of those subsystems is affected in the system of
systems.
The next thing is monitoring our fleet usage data. You give
a pilot an aircraft, and he's going to find unique ways
to be able to fly that airplane in an environment, especially
with new mission growths that we've got to counter. The
way you do a fatigue test is you go and you estimate how
many 1G, 1 1/2, 2G maneuvers, how many landings, how many
takeoffs, how many pressurization cycles, and you put it
all in there and you literally, you know, bend this thing
like it's a piece of silly putty to see where it cracks
going in. But you're guessing how that airplane is going
to be used 20 and 25 years in advance. And one of the changes
that we've seen is we need to go and monitor that fleet
usage, collect that data, and then update that fatigue testing
because, you know, I guarantee things are going to be different
ten years from now, just as they were ten years ago.
You need to utilize that fleet data to go back and not just
collect it in some big data morgue but go back and say:
How's your original calculations? Are you using up your
service life earlier? You know, the Navy went and bought
some F-16s for their adversary squadrons, and we used them
up in about four years because they were all doing the shooting
down their watch type of stuff very quickly in there. The
mission changes, the requirements change, and we have to
be able to make sure our original predictions -- they're
not wrong, but they've got to be validated. It's kind of
like me taking my two thumbs and going like this and saying,
yeah, I can figure out and calculate how I'm going to go
to the moon. You've got a lot of mid-course corrections
you have to do.
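The revalidation Mr. Ernst describes, checking monitored fleet usage against the original service-life calculation, can be sketched with the Palmgren-Miner cumulative damage rule, the standard bookkeeping for fatigue life. This is a minimal illustration only; the load spectra and cycle limits below are invented for the example, not data from any fleet mentioned in this hearing.

```python
# Palmgren-Miner rule: damage D = sum(n_i / N_i) over load-severity bins,
# where n_i is the cycles flown in bin i and N_i is the cycles-to-failure
# that the fatigue test established for that severity. D = 1.0 means the
# demonstrated service life has been consumed.

def miner_damage(spectrum):
    """spectrum: list of (cycles_flown, cycles_to_failure) pairs."""
    return sum(n / N for n, N in spectrum)

# Hypothetical design assumption for one year of service: mostly benign cycles.
design_year = [(10_000, 100_000), (2_000, 40_000)]

# Hypothetical monitored usage: the severe bin flown three times as often,
# the kind of mission change described above.
actual_year = [(10_000, 100_000), (6_000, 40_000)]

d_design = miner_damage(design_year)   # 0.15 of a life consumed per year
d_actual = miner_damage(actual_year)   # 0.25 of a life consumed per year

# Life is being used up faster than predicted, so the retirement
# point moves correspondingly earlier.
ratio = d_actual / d_design            # about 1.67x the predicted rate
```

The point is the process, not the numbers: monitored usage feeds back into the damage sum, so the original calculations get revalidated rather than trusted for 20 years.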
The last issue which was brought up before, I found it amusing
to hear the previous panel talk about the daily report systems
in PRACA. We need to collect good data, but we need to have
that data resident at the subject matter expert's fingertips,
not in some type of huge database in the sky that nobody
can get to. And all those elements need to be in there.
It's more than just neat technology. You have to have all
these elements and, folks, this ain't sexy but this is the
core that allows you to manage a fleet effectively.
Next slide, please. The Joint Council on Aging Aircraft,
whose members work together, run their own programs, and share
this data, is trying to make process recommendations
and not just field issues. Microcircuit obsolescence was
brought up today. What data do we need to buy in our acquisition
programs to make sure that we can support the rapid changeover
in technology, because we're not going to drive it in Department
of Defense or NASA anymore. We have to get with the
industry and figure out what data we need, what's the best
approach; that's going to require some acquisition changes,
some process changes -- again, not just technology -- but
yet we will take those technologies, evaluate them, and
say these are the ones we need to select.
I once told a group that I was walking along the beach and
picked up a pretty seashell and out ran three guys selling
corrosion solutions. I mean, there literally are hundreds
of technologies; and I think I broke my corrosion lead's
pencil when he got up to about 84 different areas. I said
let's get six out there and be successful. We like good
ideas. That's what fuels the reduction of our problems with
aging aircraft, but we need to also make sure that we are
pushing not all of them but we are pushing the top couple
of them.
We are facilitating the transition, making sure that we
are prototyping them on the aircraft. We do not fly what
we have not tested; and I can show you story after story
after story when it approached that test, something else
happened, either we had a sealant or we had a compound,
or wash cleaning fluid that interacted and we need to be
able to evaluate those issues.
Of course, we're promoting knowledge management. What is
the cost of aging? Where is that big idiot light that says:
"Buy more F/A-18E/Fs and retire S-3s for tankers"? Where is
that point that we can make the right economical decision?
And there's a paucity of data on those issues and it's kind
of like everybody has their own way of calculating it and
we're working with Rand, trying to get all those groups
together.
So we're working together on a variety of issues from process
to technology to acquisition to knowledge management type
of solutions.
Next slide, please. That's what I do on my part-time job.
We've been tasked by the Aeronautical Commanders Group to
try to foster a national strategy, working DOD, NASA, FAA,
and industry. What do we need to do? A lot of our effort,
about 80 percent of our time, is on what I call tactical
initiatives, what is the best way of inspecting wire, what
is the best corrosion compound, yada, yada. About 15 percent
of our time or more is strategic areas. What do we need
to do to handle diminishing manufacturing sources and obsolescence?
About 5 percent of our time is on things like what is the
right amount of sustaining engineering that we need to have
on our team. How much emphasis do we need to have on our
data systems? What data do we need to collect?
We just recently partnered with NDIA and AIA, two industry
consortiums, so that we can get feedback from industry,
because I'm not going to say that I'm clairvoyant and have
all the answers. I've made enough mistakes, I have nine
lives based on my mistakes, but I want to get from industry
that partnership of where do they think we need to change.
Do we need to change our process for buying, for supporting?
What amount of balance is there in the government and industry
team?
Next slide. You purposely can't read this. I don't want
anybody to read this because it's an early version. But
we've actually gone to doing road maps where we've surveyed
-- and this is from wiring -- from both a technology point
of view, an acquisition, a logistics, a training, all those
areas, all the different programs that are out there. When
you see those pretty little red things, well, green is good,
yellow is ehhh, and red is real bad. You see where we need
to build a strategy, and we're trying to make sure that
all of our funding and resources, they're not joint but
they're at least lined up and all pointing in the same direction
and we're pulling in the same way.
Next slide. What are some of the successful models of teams
that we've stood up? Too often we have a hearing like this
and we go in there and Congress passes a new law and we
anoint a new person to be the czar of something and he comes
out, or she, and puts out lots of mandates. And maybe I'm
a cynic -- well, I know I'm a cynic -- maybe I'll admit
it -- but that doesn't always work.
One example I want to point out is what we did with the
JCAA corrosion steering group. The reason it was successful
is we took the materials experts in each of the sites and
married them up with the program teams, put in logistics
people for publications and training, a cross-functional
IDT, and said, "You guys tell us what to do." My role now
becomes less of a messenger and more of a barrier-removal
expert. At least that's what I call myself. They call me
something, other things, but we can't say those in public.
So we need to build those from the bottom up and not just
create something from the top down that puts more unfunded
mandates on us.
Next slide, please. Summary. I think our aging aircraft
problem's a serious threat. I think it's something that
requires an infusion of resources, an infusion of capital,
and a national strategy to be done. At the Joint Council
on Aging Aircraft, we're trying to coordinate those different
areas. You can come back and judge whether we're successful
or not. I think the industry cooperation is critical. We're
not going to say that this is a government-only issue, but
we're learning from the best practices. I will steal from
anybody and any group and, as Winston Churchill said, he
would even say a kind word for the devil in the House of
Commons if he would help him against the Nazis. I'll even
partner with the devil if he'll help us with our aging aircraft
strategies, and I think we need a strategic process that
requires that collaboration. And the last time I checked,
we need NASA's involvement in there. Their involvement's
increasing, but we need to remind NASA that one of those
A's stands for aeronautics and we need them and their expertise.
ADM. GEHMAN: Thank you very much.
MR. WALLACE: I think the focus has been mostly on
structures, although Mr. Ernst did talk about avionics and
wiring. I know that in the civil sector where I came from,
after Aloha we launched, of course, a very extensive aging
airplane program. I feel like the structural part, at least
perhaps in the less challenging field of civil aircraft
operations, is reasonably well handled or at least that
we currently feel that the aging systems challenge is greater
-- and wiring in particular.
I wondered if you have any sort of conceptual thoughts on
aging systems, wiring, and whether or not there's a different
approach. You talked about the need for accurate reporting
and that sort of thing. But in many respects those seem
to be some of the more difficult challenges.
MR. ERNST: You could pick any subsystem that you
want and the process that was set in place -- from analysis,
technology, investments, prototyping, data collection --
that Dr. Gebman showed, needs to be followed through. And
I believe that the FAA's wiring non-structural program follows
some of those classic issues. In having been part of it
and actually teaming with the FAA on some of those areas
in wiring, you can see that it follows the same type of
elements in there.
Wiring is a major issue. We made some mistakes when we selected
the wire types in some of our vehicles in the Eighties.
We did some qualification testing on it, and it had some
very adverse characteristics. I'm trying to be nice. We
now need to make sure that we're developing things that
are not just saying, yeah, throw that one away, build all
new aircraft, but can inspect it to make sure the bad characteristics
-- i.e., the tracking that was associated with aromatic polyimide
insulation -- are not prevalent. But all those elements require
smart people working together, and the success story is -- I'm
not sure you're aware of this, but the FAA has spent a fair
amount of money really investigating the different types
of inspection technologies, whether it be frequency domain
reflectometry, time domain reflectometry, standing wave
ratio, and a whole bunch of things that make my brain hurt.
And the Navy is actually doing some of the transition and
manufacturing of those systems and buying and fielding them
initially in our depots and at our organizational level with the troops.
The Air Force is doing the same thing. We're working together
on these issues and eventually we're going to get products
that the commercial industry can then adopt. So you
see the FAA do the early R&D, the Navy and the Air Force
do some of the tech transition of prototyping and measuring
and quantifying what percentage of wire chafing means the wire is degraded
enough that you have to replace it -- what are those red, yellow,
green thresholds -- and then the commercial aircraft industry
can pick up and procure those things without having to develop
all those issues. The process is pretty much the same, but
we need to make sure we have a robust process in all those areas.
Wiring is in pretty good shape. Corrosion in structures
is in pretty good shape. If you want to talk about helicopters
and all those rotating machinery, it's a pocket of poverty.
MR. WALLACE: Well, following up on one of your points
about the type of detailed inspections required, I mean,
can you speak to the issue which I know was very much discussed
sort of in the post-Aloha inspection implementations of
just sort of numbingly monotonous maintenance tasks and
the human factors associated with that?
MR. ERNST: I like the choice of words. When I got
a chance to sit inside and look at the internal cargo
bay of the Columbia in '99 at Palmdale, there were wiring
issues, and the primary method of inspection
of wiring was the Mark 1 Mod 0 eyeball and a mirror. And
I sat there with the Air Force wire technologist on a team,
George Zelinski, a very detailed, knowledgeable individual,
and I tried to see if I could find those myself because
I'm an engineer. I've been around wiring enough times. I
couldn't see those issues that they were required to pick
up. And we had a system then that was mind-numbing, that
required a lot of expertise and experience and there's technology
out there that can do that better and, more importantly,
can do that as a precursor to failure. You don't have to
wait until you can see through the insulation to say, yes, it's worn through.
What we need to get to is a prognostic system where we can
check non-intrusively, not pulling bundles apart, but we
can check those wiring bundles and say I'm starting to get
some breakdown whether it be due to hydrolysis, whether
it be to chafe, vibration, wear, gremlins, whatever, and
say now I've got 80 percent through. At 20 percent I now
ought to go on a scheduled maintenance procedure and put
that together. And that's where we need to go and that's
part of a holistic wiring strategy that I believe we have
right now. We just have to get it funded and implemented.
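[The prognostic threshold Mr. Ernst describes -- scheduling maintenance at roughly 80 percent breakdown rather than waiting for failure -- can be sketched as a simple check. This is a hypothetical illustration; the function name, bundle readings, and exact thresholds are assumptions, not Navy or NASA practice.]

```python
# Hypothetical sketch of a prognostic wiring check: non-intrusive
# measurements estimate how far insulation breakdown has progressed,
# and a threshold converts an unscheduled failure into scheduled work.

def maintenance_action(degradation_pct: float) -> str:
    """Map an estimated insulation-degradation percentage to an action.

    Thresholds are illustrative only: below 80 percent degraded we keep
    monitoring; at 80 percent (20 percent of life remaining) we schedule
    the repair; at 100 percent the wire has already failed.
    """
    if degradation_pct >= 100:
        return "failed"           # breakdown complete: unscheduled repair
    if degradation_pct >= 80:
        return "schedule repair"  # 20 percent of life left: plan the work
    return "monitor"              # keep checking non-intrusively

readings = {"bundle A": 35.0, "bundle B": 82.5, "bundle C": 100.0}
for bundle, pct in readings.items():
    print(bundle, "->", maintenance_action(pct))
```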
MR. HUBBARD: I have a question for Mr. Ernst. You
made a passing reference to NASA's PRACA problem-reporting
system. Could you characterize for us what you think are
the best characteristics of the kind of accurate problem-reporting
system you referred to in your slides?
MR. ERNST: A system has to be real-time. It cannot
be a system that takes 18 months to collect data. It's got
to be something that is easy for the operator or maintainer
to input. The Navy system, years ago, was a paper system
where the poor guy, after working a lot of hours fixing
the aircraft, would fill out the paperwork and, because
of that, there were inaccuracies once in a while -- not
in Admiral Turcotte's squadron, of course -- but there were
inaccuracies that every once in a while we went back and
looked at those things.
ADM. TURCOTTE: We go back.
MR. WALLACE: Are you trying to sell him something?
MR. ERNST: I could tell stories, but I won't.
It has to be a system that is easy, simple, robust, and
it has to be something that tells you something about the
failure, not bug-in-the-cockpit type of issue and then say,
"I removed the bug." You need to go in there and say, "Hey,
I had some failure issue," and it needs to tie back in from
the operator what his perception of the failure was, because
he's going to describe it, "Hey, this didn't work." He's
not going to say that you had corrosion on Pin 5 of your
connector which stopped your data flow. That's going to
be the engineer, and it has to tie those systems together
with some software that can easily do some trend analysis.
And another point we have to do is we have to keep the data
long enough to do trend analysis. And there has been a push
to throw systems and data away after 18 months and we need
to go back five or six or seven or eight or ten years to
get a statistical sample size. So those are some of the
characteristics, and we're working to get some of those
systems implemented now.
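[The retention point is easy to demonstrate: a trend only emerges from a long record. A minimal sketch, with invented failure counts, of why an 18-month window is too little for the trend analysis Mr. Ernst describes.]

```python
# Illustrative sketch: fit a least-squares slope to yearly failure
# counts. With a decade of data the trend is usable; with only the
# last two points (roughly an 18-month view) it is nearly meaningless.
# The component and the counts are invented for the example.

def trend_slope(counts):
    """Least-squares slope of failure counts per period (pure stdlib)."""
    n = len(counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical connector-failure counts per year over a decade.
failures_per_year = [3, 4, 3, 5, 6, 5, 7, 8, 9, 11]
print(round(trend_slope(failures_per_year), 2))       # full record
print(round(trend_slope(failures_per_year[-2:]), 2))  # short window
```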
MR. HUBBARD: On the report that my predecessor Harry
McDonald did, one of the shortcomings that he found was
that the PRACA system did not appear to have all of these
characteristics you just mentioned.
MR. ERNST: Harry called it the data morgue.
MR. HUBBARD: Data morgue. Yes. One of the things
that you commented on just a few minutes ago was getting
the material to the subject matter experts at their fingertips.
Can you expand on that a little bit?
MR. ERNST: Sure. Let's switch to an avionics box
failure. We need to not only have it so that a data expert
who knows the system can write trend reports but the information
if we get a failure back, let's say, on an INS system that
failed, that individual who's cognizant of that system needs
to go in there and say, "Have I had other failures on this
system? Can I find some trending? Is it just recently or
periodic? Can I go in and find out if memory chips or whatever
type of chip is failing in other systems?" He needs to be
able to do that research, that forensic science at his computer
terminal and a lot of times our data systems will give us
great reports on how many maintenance manhours we spent,
three months late. And when we get a mishap in, when we
get a box that's been failed, we need to understand and
have that information right there at our fingertips.
MR. HUBBARD: It would be as if you only got a report
on your checking account every three or four months.
MR. ERNST: Yes, sir.
MR. HUBBARD: Thank you.
ADM. GEHMAN: Mr. Gebman, in one of your viewgraphs
that you presented on the heavy maintenance work days per
depot for KC-135s and also in the heavy maintenance workload
ratio which showed how much depot-level maintenance is required,
how it's grown over the years, in your experience -- and
I'll ask both of you this -- is that an accurate indicator
that there's something else working below the system that
you need to go look at? Just keeping track of how much depot-level
maintenance is required and how it's growing, how does that
relate to characterization of aging?
DR. GEBMAN: Excellent question.
ADM. GEHMAN: Or is it just interesting?
DR. GEBMAN: Excellent question. We have studied now
all of the Air Force's fleets and have compiled the statistics
for, in particular, the labor hour growth over time; and
it seems that once you get beyond 15 years, you're almost
certainly facing a future of climbing work to be done --
some fleets that will start a bit sooner, the fighters tend
to start sooner, their lives being somewhat shorter than
the larger aircraft. It just seems to be a feature of aging.
It might well be somewhat analogous to people. In the older
years, we find ourselves going to the doctors somewhat more
often than in our teenage years.
So if you want to have a sense of the age of a fleet, one
measure that you might look at is, well, how is the maintenance
workload changing over time. And when you see that steep
part of the curve, like the presidential transport, the
old 707 known as the VC-137 in Air Force nomenclature, that
one literally exploded over a couple-year period and those
airplanes are no longer with us.
So it's certainly something to watch. We've tried regression
analysis, various statistical methods to try to correlate
the rate of rise, the characteristics of fleets. We're making
some progress in that area, but this is an area where there's
a lot that's not known.
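[The workload indicator Dr. Gebman describes -- watching for the steep part of the curve -- can be sketched as a year-over-year growth check. The fleet data and the 10-percent trigger below are invented for illustration, not Rand or Air Force figures.]

```python
# Sketch: track depot labor hours per aircraft by fleet age and flag
# the ages where year-over-year growth exceeds a trigger, i.e. where
# the maintenance-workload curve turns steep.

def steep_years(age_hours, growth_trigger=0.10):
    """Return fleet ages where labor hours grew faster than the trigger."""
    flagged = []
    ages = sorted(age_hours)
    for prev, cur in zip(ages, ages[1:]):
        growth = (age_hours[cur] - age_hours[prev]) / age_hours[prev]
        if growth > growth_trigger:
            flagged.append(cur)
    return flagged

# Depot labor hours per aircraft at each fleet age (hypothetical).
hours = {12: 10000, 14: 10300, 16: 10900, 18: 12200, 20: 14500, 22: 18000}
print(steep_years(hours))
```

As the testimony notes, the flag alone is not diagnostic; it only tells you where to drill down for the underlying mechanism.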
MR. ERNST: You want to mention the cost-of-aging
study?
ADM. GEHMAN: Go ahead.
MR. ERNST: One of the issues is I had seen the Rand
data almost when I started in the aging aircraft program
about four years ago and we've shared back and forth and
just recently the Joint Aeronautical Commanders Group Aviation
Logistics Board has kicked off an effort that we're part
of to look at what are these factors, can we translate the
KC-135 experience to other Navy aircraft and other Air Force
and Army helos and try to understand what are those factors
so we can get a better understanding of what's causing it
and what the trend lines are. Just having information that
says my cost is going up is not sufficient to be able to
correct the problem. You need to then drill down and say,
okay, but why. You know, I think on the KC-135 they have
a pretty good idea of that. But that's what you need to
do is not just look and say, yes, it's going up by 7 percent
but you need to understand why is it going up 7 percent
and what can you do to try to mitigate that curve.
ADM. GEHMAN: So my understanding is that, unlike
the Dow Jones Industrial Average, the fact that older aircraft
require more maintenance is not remarkable in and of itself
and is not an indicator that anything's breaking or anything's
going wrong. You've got to have much, much better indices
at the system, subsystem, and component level in order to determine
it.
MR. ERNST: And it's not just age. I'll give you an
example. We were talking about this cost of aging. I don't remember
the numbers off the top of my head but one of the folks
at Tinker said it's costing them X number of hours to paint
a KC-135 now and it cost them a lot less ten years ago.
And they said we're not adding one more ounce of paint.
The problem is that you've had different changes in environmental
regulations over those years, and you've got to make sure
you're accounting for things properly. I mean, those environmental
regulations aren't bad, but we've decided that this hurts
Bambi and Flipper and those sorts of things and we want
to take them off and it requires different steps and you've
got to factor that in there. A lot of the cost growth you're
seeing is due to things that are not age, either environmental
or fleet usage. Yes, they're going to go up, but they may
go up in a certain time to a manageable point and then where
that curve breaks, that's what we have to figure out.
DR. GEBMAN: I'd just like to basically add that Bob
is absolutely right. You need to look at the underlying
mechanism. If the workload is climbing because you now have
to tend more to corrosion and you're satisfied that you're
able to see the corrosion and tend to it, that's manageable.
In the area of fatigue cracking, you have to be a little
bit more careful. Rising workload may indicate that you're
getting more and more cracks closer and closer together,
and one of the very important assumptions that we make in
managing fatigue cracks is that the neighborhood is healthy.
So as the population density of cracks starts to get too
high, you run into a situation where you might have thought
you were fail-safe but, in point of fact, the neighboring
structure can't carry the load.
DR. WIDNALL: I'm sort of sensitive to this issue
of aging aircraft because I worked on the B-52G when I was
a freshman and I worked on the KC-135 when I was a sophomore.
So my friends are still out there.
What I want to talk about is composite materials. I was
a little sorry that you sort of excluded that from your
chart, but I'd like to get a sense from you about some of
the challenges associated with these composite materials.
How well do we really understand their fatigue properties?
Do we really understand their properties as well as we understand
metals? What about their exposure to UV radiation and high
temperatures and corrosive chemicals and all those sorts
of issues? And I know we're using these more and more in
our aircraft fleets in general and in particular on the
shuttle. They're obviously a key part of it. And it's not
just composite materials but other kinds of brittle materials,
sort of what I would call nonstandard materials.
DR. GEBMAN: Thank you for asking this question because
when I was thinking about what to talk about today, I really
struggled with do I talk about the areas where we have depth
of knowledge that might be useful to your investigation
or do I talk equally across the areas even though the depth
of knowledge is shallow. Clearly, with metals there's a
lot that we know, especially on fatigue, and we're learning
rapidly in corrosion.
In the area of composites, I think that Charlie Harris from
NASA Langley at the conference earlier this month of the
AIAA, American Institute of Aeronautics and Astronautics,
this big gathering, 780 people, 525 papers, Charlie gave
a talk about the progress in composites and he was very
positive and upbeat about all the good technical work going
on. And that was all appropriate. But then he shared with
the group a round robin exercise where they sent problems
around to people, the same problem to work on, and people
came back with different answers. And then they did another
exercise where they even told people what the problem was
and they still came back with different answers in terms
of the methods and the assessments.
So the whole area of composite materials is one that might
be analogous to where we were with metals back in the 1950s.
Back in the 1950s, we had the alloy-of-the-month club; and
that's where the B-52 and the KC-135 came from. The young
engineers were finding out better ways to do the chemistry
to get strength, but they didn't have time to understand
the durability, the fatigue properties, and the corrosion
properties. I'm somewhat sensing that, with composites,
we're still inventing cleverer ways to get strength but
we don't yet understand the long-term durability characteristics.
The science is far more complicated because a metal is homogeneous,
the same material throughout; with composites
you've got fibers and glues or resins, and it's a very complex
interaction to try to model, and we're not good at it yet.
So anything that is made of composite requires even more
circumspection and attention than probably the metals.
DR. WIDNALL: I was afraid of that.
GEN. BARRY: Excellent presentations by both of you
and raises a lot of questions. As you know, the board has
taken a very serious approach to aging spacecraft in this
what we call R&D development test, however you want to call
it, environment.
A couple of comments. Your references to the Air Force,
as obviously I'm familiar with, where we are older than
we've ever been before. We've never been in this era in
the United States Air Force -- as is the Navy. We're approaching
ages where the average age of our 6,000 platforms is
22 years. So even within the data experience base that
we have, flying airplanes, we're approaching new environments.
Now, let's translate that over to spacecraft. We are entering
a new era in spacecraft, with reusable vehicles in an environment
of aging. We've never been there before. So we've got two
parallel efforts going on that certainly can kind of cooperate
and graduate, as we've seen evidenced by the Navy and the
Air Force here.
I've got a couple of quick questions and then a rather larger
one. First question is: Is NASA involved in any of this
Joint Council on Aircraft Aging, as far as you know?
MR. ERNST: Yes, sir. NASA has been involved in the
aging aircraft effort since Aloha, prior to me being in
it. The efforts at Langley in structures and corrosion NDI
have been solid. Just recently, Christmas time frame, before
Columbia, they said, hey, we recognize we need to help you
in that national strategy; and they're getting more involved.
We need even more. I need to fill in gaps.
GEN. BARRY: On your side as well as the space side?
I mean, are they translating lessons learned to both aero
and space?
MR. ERNST: Yes. I'm not going to tell you it's even
and homogeneous throughout, but I know that in wiring, the
shuttle folks at Kennedy are in lock step with my guys and
the FAA and I know the aerospace side and structures are
working real well together. We're trying to see where the
gaps are and plug them in there. We need more involvement,
but they have been involved.
GEN. BARRY: All right. Let me ask this. Two things.
Let's just talk about corrosion and let's talk about fatigue
cracking that, Jean, you mentioned earlier. Right now we
have capabilities within our aircraft to do stress-testing
that you mentioned as an example. We have programs that
are not only based in the United States -- Australia has
an excellent one on how to do this. I think we all who are
in the industry recognize that. What can we do insofar as
spacecraft are concerned because obviously they are larger
and we translate that to our larger aircraft insofar as
dynamic testing is concerned, because I don't think it's
unfair to say that managing aging spacecraft in NASA, for
the large part, is done by inspection. So how do we translate
that, what we've learned in aircraft, over into NASA as
a possible recommendation?
MR. ERNST: I think you need to break it down into
the subsystem component areas. For example, we had this
discussion on the McDonald team three years ago now, on
the SIAT team on wiring, where we had totally different
environments but we could take the Air Force and Navy's
experience with aromatic polyimide insulation and say here's
what we saw under these load conditions. Now, under a probably
higher vibration, higher thermal but shorter duration environment,
how is that going to translate? We know how that fatigue
environment, so to speak, can translate, and we can run a new model
to see what it should do with the shuttle program.
That's the kind of transformation that could be done, but
only if you knew how each of those subsystems and the materials
of those subsystems is going to behave as a function of
time and age and environment over a number of years. The
problem is a lot of times we don't know that information.
So we know how it works here, we know the loads are different,
but we don't know how the age is going to translate as those
factors are translated, if that makes sense to you. I don't
think it's hard to do that, but you have to invest in some
age-related studies and that's not necessarily the top of
the list.
GEN. BARRY: One of the concerns we have is to be
able to analyze how the orbiters have been shaken, rattled,
and rolled over these many years, especially when we take
into consideration that this was a spacecraft that was designed
to be flown 100 times in ten years, and now we're
decades past that and we are still only in the 20s
and 30s. A question then is, you know, how do we maybe translate
some of the lessons learned on how the spacecraft are flown
within spec but, you know, after a while, get some kind
of stress loads on them that can be accumulated over time
and measured. Now, translate, if you could, the lessons
learned that we've developed on aircraft that might be able
to be translated over to NASA.
DR. GEBMAN: Could I have Chart No. 24, please.
MR. ERNST: You guys are going to learn this chart,
because he wanted to show this to you.
DR. WIDNALL: He's ready for you.
DR. GEBMAN: This is a really tough question. Obviously,
with the shuttle we don't have the luxury of a full-scale
fatigue test. Obviously with a tear-down, if this were an
aircraft fleet where we had hundreds or even tens, we'd
consider taking the oldest one, tearing it apart, seeing
what ails it, and then using that to guide future work. When
you're down to three, that's not an option.
So then you ask yourself, well, what might we do? And when
you look at this diagram, on the top row, the matter of
force tracking data and loads analysis, there may be some
things you could do in terms of assuring that NASA has expended
all of the effort that it can, evaluating the strain gauge
recordings and pressure recordings from prior flights, and
that you really have as excellent a record, historically,
of the loads that have been imposed on the structure as
you can possibly get.
The next thing you then could consider doing is, given the
best loads data, to go back and, using more current finite
element analysis methods which have improved greatly over
the decades, to go in and do some spot checks on your stress
computations to make sure that you've got the best that
we can do in terms of estimating stresses from the given
loads and then take it the next step and go in for the fatigue
part to check on the crack growth calculations, the fracture
toughness issues, and to make sure that the engineering
community has really been resourced and tasked to do everything
that we can to understand the health analytically of the
fleet.
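[The crack-growth spot check Dr. Gebman describes could look like the following Paris-law sketch: integrate crack growth per cycle until the stress intensity reaches the fracture toughness. Every number below is an invented round value, not shuttle or real alloy data, and real analyses use far more detailed models.]

```python
import math

# Hedged sketch of a crack-growth calculation: Paris law
# da/dN = C * (dK)^m, integrated numerically from an initial flaw
# to the critical crack size set by fracture toughness K_Ic.

def cycles_to_critical(a0, stress, K_Ic, C, m, Y=1.0, da=1e-5):
    """Cycles for a crack to grow from a0 (m) to critical size.

    stress  cyclic stress range (MPa)
    K_Ic    fracture toughness (MPa*sqrt(m))
    C, m    Paris-law constants; Y is the geometry factor.
    """
    a, n = a0, 0.0
    while True:
        dK = Y * stress * math.sqrt(math.pi * a)
        if dK >= K_Ic:           # crack has reached critical size
            return n
        n += da / (C * dK ** m)  # cycles consumed growing by da
        a += da

# Illustrative numbers only: a 1 mm flaw under a 100 MPa stress range.
n = cycles_to_critical(a0=1e-3, stress=100.0, K_Ic=30.0, C=1e-11, m=3.0)
print(f"{n:,.0f} cycles to critical crack size")
```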
Then the final thing you might consider doing. From the
debris that you do have, in effect, you have already a partial
tear-down circumstance and to go in there at some point
and literally take apart that which is still connected together
and really check for like adhesion on honeycomb, how is
that, that waffle still adhering to the face plates, and
just get as much mileage as you can out of your debris in
terms of understanding what the health of the remaining
fleet may be.
MR. ERNST: Slice up your poles, your joints, rivet
holes, things like that. That's what we do routinely.
To follow on with the chart that Dr. Gebman put up, you'll
notice a couple of things. One, do a mid-life assessment
of the loads. You know, the Columbia originally was kind
of a flight-test bird and I believe had several hundred
pounds of instrumentation and sensors in there to measure
fleet loads. To give you an example, the P-3 and S-3 program
just recently completed mid-life fatigue testing at Lockheed,
and we found drastic changes in the loads on both from what they
were anticipating. The maneuvers were a little different.
The theoretical issues, the early introduction issues slowly
change over time. You know, it's like boiling a pot of water.
It doesn't boil all at once. And I think you need to go
back and really do those load surveys.
You also need to do some type of tear-down. You can't cut
up, you know, the Atlantis and make it a series of razor
blades and fractographic analysis and stuff; but the Columbia,
when they had wiring problems in '99, NASA did go and remove
certain wire segments. You can go in without cutting the
whole thing up and remove certain panels, remove tiles to
see adhesion, remove subsystems. When a part's going through
an overhaul, take this part on overhaul and do those types
of things. So there are things that you can do; but again,
you've got to have a proper program to get that environment
and see how we're doing.
The S-3 example in fatigue tests, we had 12 points that
we considered life-limiting on the aircraft. Four of those
they knew from the original fatigue tests, and the mods were
already out there. We found an additional eight points that were
due to the loads, and due to the tear-downs we saw microscopic
cracks. We were able to go in and cold-work fastener holes
in that aircraft and give it fatigue life back. Real simple
operation, real cheap, and not have the 305-inch wing cracks
we had in the P-3. So you're able to do some of those things
if you invest in the time and the resource and have a robust
program.
DR. GEBMAN: If I might, I'd just like to follow up.
Could I have Chart No. 7, please. There's an important aspect
that I neglected in my answer, and that is that we're dealing
with a spacecraft. And I apologize. Obviously with something
like the shuttle, you have thermodynamics acting as well
as structural dynamics; and in addition to getting a solid
characterization of the historical loads, you also want
to get a solid characterization of the historical thermodynamic
exposure because -- take a spar cap, any one of those four
spar caps that are identified with the arrows. If, in the
course of the history of a particular spar cap, it has been
exposed to temperatures different than the other spar caps,
then the loads in that part of the structure are going to
be different by virtue of the thermal expansion of the material.
So this is a very complex thermal as well as structural
dynamic circumstance.
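[Dr. Gebman's spar-cap point reduces to a standard relation: a fully constrained member heated differently from its neighbors carries a thermally induced stress of sigma = E * alpha * delta_T. The material values below are generic aluminum-like numbers chosen for illustration, not shuttle data.]

```python
# Minimal sketch of thermally induced load: if one spar cap has seen
# a different temperature history than its neighbors, constrained
# thermal expansion adds stress that a purely aerodynamic loads
# analysis would miss.

E = 70e9       # Young's modulus, Pa (generic aluminum-like value)
ALPHA = 23e-6  # coefficient of thermal expansion, 1/K

def constrained_thermal_stress(delta_T):
    """Stress (Pa) in a fully constrained member heated by delta_T kelvin."""
    return E * ALPHA * delta_T

# A 50 K temperature difference between neighboring spar caps:
sigma = constrained_thermal_stress(50.0)
print(f"{sigma / 1e6:.1f} MPa of thermally induced stress")
```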
ADM. GEHMAN: Let me follow up on that before I call
on another board member. Do I understand that you are suggesting
that it's useful in the study of aging aircraft to establish
some measurements of what I would call stress cycles or
something like that? We understand age. We understand landings
and takeoffs. But there are other events which cyclically
stress the aircraft, particularly in the case of the shuttle.
And it's useful to keep track of those, in addition to the
obvious ones like landings and takeoffs and how many months,
hours and all those kind of obvious things.
DR. GEBMAN: These things with aircraft are tracked
routinely. Exceedance curves are developed which are a statistical
way of representing even the small variations. My most recent
comment suggests that we should also construct a thermal
exceedance spectrum, as best we can from the historical
data, so that to the extent that we've got differential
thermal expansion of structure going on, we can factor that
into the loads that the members receive.
You see, there's two load levels. One is the aerodynamic
load and the inertial loads applied to the gross structure.
The other issue of load is, for a particular structure member,
what load does it see over its lifetime; and that can be
driven by thermal expansion issues, just as it can be driven
by the aerodynamics. And given the historical records of
the temperatures, the engineers should be able to construct
and may already have done thermal exceedance curves to go
along with load exceedance curves.
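[An exceedance curve of the kind Dr. Gebman mentions is just a count, at each load (or temperature) level, of how many recorded peaks met or exceeded it. A minimal sketch with invented peak loads; real curves are normalized per flight or per flight hour.]

```python
# Sketch of a load exceedance curve: for each threshold level, count
# the recorded peaks that reached or exceeded it. The same construction
# applies to a thermal exceedance spectrum built from temperature records.

def exceedance_curve(peaks, levels):
    """Return {level: number of recorded peaks >= level}."""
    return {lvl: sum(1 for p in peaks if p >= lvl) for lvl in levels}

# Hypothetical peak loads (in g) recorded across a set of flights.
recorded_peaks = [1.2, 1.5, 1.1, 2.0, 1.8, 1.3, 2.4, 1.6, 1.9, 1.4]
curve = exceedance_curve(recorded_peaks, levels=[1.0, 1.5, 2.0, 2.5])
print(curve)
```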
MR. ERNST: I think you need to look at every environmental
factor and see if there is a similar type of correlation
in there. We've done a good job of fatigue tracking. We're
tracking a lot more parts than we used to. The models are
a hundred times more detailed than they used to be. We can
calculate things a lot finer, but I think you need to be
able to look at all the different loads in environments
that any vehicle goes on and say, okay, what's changing,
what's the effect of that over time.
ADM. TURCOTTE: For both. Kind of the 3Cs in aging
-- you know, Kapton, Koropon, and corrosion -- which go
back a long time in finding problems with Kapton wiring,
with Koropon bonding, de-bonding, heat translation, all
of those things. That's Part 1 of the question. Could you,
kind of both of you, talk a little bit about major lessons
learned from both fleet usage, commercial usage, and your
knowledge to the extent of findings on the shuttle, both,
you know, galvanic or intergranular types of corrosion.
Part 2 question. If you were king for a day with your knowledge
of the PRACA data base, what would you do to improve it?
MR. ERNST: You're going to get me in trouble. I was
very nonpolitically correct about the PRACA data base in
1999. And I have not seen it since then but I think if you
go back and you read the Shuttle Independent Assessment
Team report, you will find that the comments of the group
were less than favorable on PRACA. I'm not saying that the
Navy and the Air Force and the Army's data systems are perfect,
but we're taking steps in the right direction. So I really
can't comment on what they're doing today. I know they made
some improvements, but it was pretty abysmal back in 1999
and, I think, masked some of the issues that feed into your
risk equation that we saw back then. I think that was a
mistake.
As far as handling some of the materials and some of the
issues with Kapton, aromatic polyimide insulation manufactured
under the DuPont trade name Kapton -- get that correct --
we didn't do a good job on establishing realistic life cycle
testing for that material when it was introduced. Kapton
has a lot of good properties. I don't believe I said that,
but it has a lot of good properties. It's very, very tough.
It has some very adverse characteristics that we never tested
for. But I think you can go through several other tests
and I know there's been arguments with the FAA on the flammability
tests, whether that's applicable, and there's lots of different
tests and we didn't do a good job of running a qualification
test and an aging test run over a short period of time
that tries to cover 20 or 40 years. So we made some mistakes
on that.
The other issue is once we had problems with the wiring
insulation, I don't think we developed realistic scenarios.
If you look at the cost of replacing and rewiring a whole
aircraft, it's several million dollars. Well, do I really
need to do it? Do I need to do it in all areas? Which platforms
do I need to do first? And what we have done now is develop
a bouquet of options. Whatever color of flowers you want
and whatever kind of room, it goes together. Because my
wiring options on the F-14 Tomcats, which are going to
be retired in the next four or five years, are totally different
than what I would do on earlier production F-18s or P-3s
that are going to be around a little longer. So you have
to develop options based on risk so that you can do things
quickly, cheaply, easily, and get it done, and not just give
one option.
So I think two issues. One, we didn't do good qualification
testing and we need to continue, just like the life cycle
testing, just like the fatigue tracking where you update
it and you get better; and the second issue is we didn't
develop any options.
DR. GEBMAN: On the matter of wiring, the Air Force
in the case of the KC-135 embarked on a major rewiring program
about five years ago; and that is going to probably continue
for the next four to five years, at which point they will
have substantially replaced the wiring on the 135. The basis
for this was an accumulation of maintenance actions that
was becoming increasingly costly to execute and a concern
for flight safety, and those two factors together seemed
to have driven the train on that fleet.
Unfortunately, our ability to predict life, we don't have
the engineering tools that we have with fatigue cracks,
either with composites yet or, for sure, with wiring, which
makes those areas very difficult to feel comfortable about
with an aging fleet.
ADM. GEHMAN: How comfortable would you feel with
the study of the aging characteristics of a main engine
that's fueled by liquid hydrogen and burns a thousand gallons
a second and produces a million pounds of thrust? How's
our data base on how that baby ages?
DR. GEBMAN: Well, on my chart I did include a line
that said propulsion; and it didn't get extremely high grades
for data or methods or people that really understand life
issues in that area. So you've hit another excellent nail
squarely on the head. For those areas, going back to General
Casey's comments about understanding margins and managing
to margins, you really have to worry that as time goes by,
you're eating into those design margins and at some point
the ice becomes thinner than you're comfortable with.
And that's a technical judgment probably more than an engineering
calculation.
MR. ERNST: Follow up. One of the successful programs
that the Air Force and the Navy have is on aircraft engines.
And they've realized that you've got a lot of moving parts,
a lot of high temperatures, a lot of complex interactions
in there. And they have what they call CIP, Component Improvement
Program, where they go back in and they test and they see
where their problem areas are and they incrementally try
to infuse newer technologies and fixes in the early parts
of the service.
Again, that's one of those that's always fighting to try
to get resources adequately in there, but if we follow what
the commercial industry does, we can really improve the
reliability and we can have a pretty good idea and almost
get to a scheduled maintenance type of inspection so we're
not flying and say, yeah, lost an engine or had a shutdown
but, okay, now at 7, 8 hundred hours I have an 8,000-hour
interval period and know exactly what to replace. So that's
another example where we've taken the methodology that Dr.
Gebman talked about for structures and transferred it over to
the engines, and I think both the commercial and the military
have very good experience in that being successful.
DR. GEBMAN: I certainly wouldn't quarrel with my
distinguished colleague, but I would hasten to add that
the commercial engine and even the military engine circumstance
with aircraft is far different than the circumstance we're
talking about here.
DR. LOGSDON: This is all very far away from the experience
of a Washington policy wonk. So excuse me if these are really
naive questions. What does the fleet size of three do to
the ability to do the sorts of things that you think ought
to be done?
And the second question, I think it's really for Mr. Ernst,
coming out of his independent assessment experience. Is
NASA routinely collecting the kind of data that would feed
into the kinds of trend analyses? You know, outside of faults,
PRACA and that, is there a data base that you could apply
some of these methodologies to?
MR. ERNST: Well, I think all the agencies and commercial
are collecting a fair amount of data.
DR. LOGSDON: On shuttle.
MR. ERNST: On shuttle? I mean, you look at the Navy
programs and Air Force programs. We're collecting 80 percent
of what we need. I still think we need to do more on the
cause of failures.
For example, if I went into the Navy's data base on wiring
chafing, there is no failure code for chafe right now. What's
the primary failure mode for wiring? We're fixing that,
by the way. So I can say that, but that's one of the issues.
I mean, we're not recording the right type of information
in all cases. We're about 80 percent there.
My beef with PRACA at the time was you couldn't go in there
easily and extract anything to make decisions. I at least
can go into some of the services' data bases and pull some
information and get a pretty good idea and then at some
point I have to play archeologist or forensic scientist
and go back through and do some more. But it works out to
that 80 percent. There need to be some other changes; and,
unfortunately, data is the one thing that everybody wants
to cut in the budget crunch. We don't want to pay for that
data.
DR. LOGSDON: If I understand PRACA correctly, you
have to have a problem or perceive a problem to even get
in the system. I'm asking: Is the shuttle even instrumented
to capture the kind of data that you would like to have
to measure various elements of its aging?
MR. ERNST: Not in all cases, but I think you can
probably do some work-arounds with that and be able to check
things. I mean, you don't have to do everything in flight.
You can do engine warmup cycle times and check temperature-wise
in there, check component issues, and test things. Things
like that. You can capture that information if you need
to.
You need the maintenance-reporting information, which PRACA
primarily did. You need the trend analysis: if I get
to this certain load level, this is going to impact my fatigue
life. And then you need to be able to do periodic instrumentation
at times. And it doesn't always mean a full-scale in-flight
test. It means capturing some of the data. And that data
was available. Could you get it? Was it easily, readily
available? No, it wasn't readily available.
DR. GEBMAN: Putting my engineering hat on relative
to your data question, given that the instrumentation and
wiring in the shuttle and the systems were designed in an
earlier era in terms of electronics, it might well be worthwhile
rethinking the matter of what are we interested in observing
during future flights in order that we might create a more
complete record of environment and loads so that we can
better manage the remaining lives of the fleet.
MR. ERNST: Health management, health monitoring for
the system.
DR. GEBMAN: And regarding your observation of the
number three, what does it mean to have three in a fleet?
From an operational perspective, one of the early lessons
I learned at Rand was that whenever you visit a unit, you
always expect -- and Admiral Turcotte will appreciate this
-- you always expect at least the Nth airplane to be a source
of supply for the others, if you're lucky. Sometimes it's
more than just the Nth airplane. So if you have a fleet
of three, from an operational perspective, one of the three
is needed to support the operation of the remaining two.
And to have an operating fleet with just two means that
you only have one backup and that's very thin.
MR. ERNST: And I think that affects your correlation.
A lot of times when you have how many hundred F-15s and
F-16s, you can start looking at the gross number of failures
and say I need to look at something. When you have three,
you can't rely on that. You have to take a little bit different
systems approach to be able to capture your data.
The Navy flies some type model series, you know, that are
12. Twelve EP-3s. And each one of them is a slightly different
configuration. But you can capture that information. It
just requires a little different approach, and sometimes
it's not as robust, predictive, leading edge because you
don't have that significant sample size.
MR. WALLACE: Were you suggesting, Dr. Gebman, that
sort of the fleet leader concept; or were you suggesting
cannibalizing parts? I wasn't entirely clear.
DR. GEBMAN: No matter how good your supply system
is in terms of providing parts, you always end up in a circumstance
where you have a first-time demand for a part and the last
airplane of the unit then becomes the source of that replacement
part. I think that if you talk to the NASA folks regarding
the matter that's referred to commonly as cannibalization,
it's borrowing a part from one aircraft or spacecraft in
order to be able to launch one that's scheduled to go.
MR. WALLACE: Another question. This is jumping subjects
a bit. Should the goal of an aging aircraft program grow
beyond maintaining the aircraft to be as good as new? What
I mean by that is: Should it meld in with sort of obsolescence
issues, issues where the technology has simply gotten to
be so far behind the state of the art that it either makes
sense for economic or safety reasons to upgrade or even
reasons of simply maintainability?
DR. GEBMAN: You're raising the issue of replacement,
fleet replacement; and we have struggled at Rand with the
Air Force long and hard on that matter because, for example,
the tanker fleet. It's a very important fleet. Without the
tankers, the Air Force doesn't go places. They don't have
aircraft carriers to carry their airplanes. So they're very
dependent upon their tankers; and to have almost all of
your tanker fleet wrapped up in one type of aircraft that's
40 years old now and to be planning to do so for another
40 really raises questions.
The first thing we looked at, well, is there a case on economic
grounds for replacing the fleet. There was an economic service
life study done and it shows rising costs, but it doesn't
show the rising costs by themselves being a sufficient basis
for justifying a new fleet, whereupon then you start asking
questions along the line of obsolescence issues, foregone
capability improvements that you can't have without substantial
investment in an aging fleet. So this whole question about
when is it wise to replace a fleet is one for which we still
don't have a good methodology.
MR. WALLACE: I really didn't intend to ask that question
about replacement. Well, it was a good answer. But about
replacing the fleet as opposed to simply upgrading, particularly,
I mean, fleet replacement, you know, lots of smart bean-counters
with spreadsheets do that for the civil aircraft industry
but I think there's a whole set of different issues with
next-generation spacecraft. My question really is more about
upgrades.
MR. ERNST: To address that -- and you picked on obsolescence.
When you get to the microcircuit obsolescence issue, which
has become a science fair, pet rock project of mine over
the last 10 or 12 years, there are lots of different options
and right now we are system-incentivized to find this chip
to put in this box in a lot of cases. We found about a third
of the time that doesn't make sense because not only is
that part obsolete but the three around it are terminal
and the whole board's wearing out because we keep replacing
it so many times because of poor reliability. So it's probably
better at that time to take the whole thing, take the cards
out, and make it a lobster trap somewhere and then put a
new system in there. That really happens about a third of
the time. But we need to again, I think, balance some of
the different pots and stovepipes of money that are available,
especially in DOD, to be able to optimize those issues and
have the best understanding of the age effects, where they're
going to be two years from now, because I may make a replacement
today and I've got three more downstream. I need to look
where I'm going to be three years from now and say this
is time to replace this 1988 Tercel that I had with 189,000
miles and go buy something new because this is just the
tip of the iceberg. And I don't think we're doing a real
good job of that, but that's one of the challenges: not just
maintaining the status quo but looking and saying what capabilities,
what mission growth areas, where am I going on some reliability
issues, and balancing all of those into like a triangle of
a decision matrix.
DR. GEBMAN: There's a fleet that we're looking at
now that has the potential for receiving an upgrade to its
aviation electronics to give it capabilities to continue
its military relevance. And there are also a series of mods
being considered to upgrade the engine so that its flight
safety features remain appropriate. And similarly with the
air frame. And as we're going through the arithmetic on
this particular fleet, one of the things that we're seeing
is that by the time you're done making whichever of the
three mods or all three of them to the fleet, the years
remaining becomes very significant to your choice. And when
you go to the operator and you ask the operator, well, how
long do you want to retain this fleet, well, they're really
not sure. So this question is almost as difficult as the
fleet replacement question.
MR. ERNST: And you look at the mission changes in
the Department of Defense in the last couple of years where
we've gone from a Cold War scenario to more of a small conflict
and now global war on terrorism and it changes. We have
planes that, to pick on Admiral Turcotte's S-3, that were
designed to hunt subs that were doing surveillance and tanking
and dropping weapons and doing, you know, partridge in a
pear tree and everything else. And you need to look at those
mission changes as a function of age too and say, you know,
I may be able to keep this aircraft doing what it did five
years ago but you know I need to replace it. I need to go
over here. And we don't always balance all those issues.
I know the Air Force is really trying to look at that decision
and set up a fleet viability board to weigh the aging factors
in these mission scenarios. I'm monitoring that for the
Navy to see what they do; and then after they get all the
kinks worked out, we'll steal it. But that's kind of the
approach. I think the answer is that it's not a simple one,
but that's what needs to be looked at. I think the shuttle
has the same issue: Where does it need to be ten years from
now?
MR. HUBBARD: I heard one of you mention or whisper
the term "vehicle health monitoring," I think. The notion
of a fleet of three. I'd just like you to think out loud
for a minute or two about how vehicle health monitoring
would apply in this case along three lines. One, what would
a systems approach be to that, given that we have a fleet
of three? Second, realtime versus recorded measurements?
Third, what other measurements could you imagine? I mean,
we've got a thermal protection system, for example, that
is pretty unique to the orbiter versus the military aircraft
you mentioned. We've got pressure, strain, and temperature.
Can you imagine, in this kind of systematic approach to
vehicle health monitoring, what one might do?
MR. ERNST: Let me answer in reverse order. I don't
want to bad-mouth technology. And I've talked about some
cultural issues but there's some real technology advancements.
I know some of the DOE labs have now started looking at
electronic signature analysis for failures in motors, predicting
when motors are going to fail. There are all kinds of things.
I mean, you can literally go around to the different areas
and find better ways that people can get precursors to failures
if they measure data and give you good information. That
would help us understand. From an overhaul interval standpoint,
it would let us know if you had a degraded flight mode issue
so that we're not having, yes, that system failed, we have
to do something else. It would really help you manage your
redundancy a lot better, too. So there are a lot of new
technologies beyond the strain gauges that I learned about
in college that we need to use.
I think the realtime versus recorded is something you need
to use a system engineering approach in analyzing. There
are oil analysis systems that I remember we had a vapor
cycle system and by the time you got oil in the filter,
you had basically eaten the whole system; it was too late.
So putting an oil analysis system that you measure it every
ten hours wasn't doing any good. It needed to be realtime.
Not everything needs to be realtime and any information
at all, whether it be on one unit or on three units, is
a lot more than no information and I think that having some
health monitoring systems on any fleet -- shuttle, the F-18,
the S-3, or whatever, F-15 -- gives you information if you
use a good systems engineering approach: not just collecting
data for data's sake but seeing what you are trying to do with
the data and then letting that drive what you need to collect
and what technology best does that. That, I think, is helpful.
DR. GEBMAN: I would like to speak both as a proponent
and also share a word of caution. The engineering in me
would prompt me to want to put strain gauges and instrumentation
in many places. Probably too many. There's a trade-off between
the disease and the cure, and it's possible to overdo a
good thing. We need to remember that, with this instrumentation,
comes wires; and we've already been talking about the vulnerability
that wiring can introduce into the system. So what I would
think might be helpful is to try to understand what are
the critical issues that we're concerned about or we should
be concerned about and then ask, for those critical issues,
what initially at least modest amount of additional instrumentation
might be appropriate and try to really focus on the core
vulnerabilities and not to go too quickly too far overboard.
MR. ERNST: We can't be kids in the candy shop. I
agree.
ADM. GEHMAN: Thank you, sir. I'm going to ask the
last question myself; and, hopefully, it's a brief one.
I think probably, Dr. Gebman, your Chart No. 3 answers this
question; but I want to allow us to listen to it for a second.
Would you list the aircraft aging areas of examination as
to which of them appear to be mature technologies and which
of them appear to be not so mature? Obviously the detection
of corrosion is a big one, and I suspect we probably know
a lot about that.
DR. GEBMAN: Probably the quickest answer to the question
would be to focus on the first column and the last three
columns. In the last three columns, we have my subjective
assessment of where we stand in terms of data, methods,
and people. The metals area for structure, we're in very
good comparative shape to the others.
In corrosion, our data and our methods are still somewhat embryonic,
but now, thanks to the various laboratories really engaging
the last several years in a more aggressive way, we're building
a core of people that are knowledgeable in the area.
The business of adhesion, we haven't paid much attention
to it. And my sense is that our data and methods are below
low and even the number of people really knowledgeable in
that area is not great.
Moving down to the composites, there's a lot of people out
there. There's a fair number of people doing excellent,
promising research; but the fruits of that research in terms
of data and methods is still forthcoming.
In the area of propulsion, the general area strikes me,
especially when we're thinking about shuttle types of applications,
as not particularly high. The whole area of high-cycle fatigue
is still a challenge for the engine community, even for
commercial aircraft.
Then the "Other" category. This is the one that worries
me most because it's oftentimes the one that's not getting
the attention that's the one that bites you the hardest.
Functional systems, pumps, motors. TWA 800 killed more
people than metal structure failures have in recent times, and that may
well have been down in this "Other" category, either the
wiring or the functional systems.
So as the board moves forward with its good work, attention
to all of the technical areas is warranted. And all that I've tried to
accomplish here today is to bring forward that there are
some areas where the aging aircraft community really has
depth. If that proves to be relevant or of interest, the
community is certainly prepared to help. In the others,
it's going to be more challenging.
ADM. GEHMAN: Well, thank you very much. On behalf
of the board, I would like to express our appreciation
for your attendance here today and your complete and helpful
replies to our questions and the information that you've
given. You're obviously great experts and we've learned
a lot and we hope that we can apply it to this problem.
We appreciate your attendance.
We're going to take about a ten-minute break while we seat
the next panel, and we'll be right back.
(Recess taken)
ADM. GEHMAN: All right. We're ready to begin our
last session for the day.
It's a privilege for the board to recognize Dr. Diane Vaughan
from Boston College. Dr. Vaughan has written an influential
and well-read book on the Challenger accident. We are continuing
our look into the business of risk assessment and risk management.
This is one of the classic studies on the Challenger accident.
Most of the board members have at least read parts of your
book, Professor Vaughan; and we're delighted to have you
here.
DR. VAUGHAN: Thank you.
GEN. BARRY: And we're ready for a test.
ADM. GEHMAN: I would like you to please, if you would,
before we get started, introduce yourself by telling us
a little bit about your background; and then if you would
like to say something to get us started, we would be delighted
to hear you.
DIANE VAUGHAN testified as follows:
DR. VAUGHAN: Thank you. I'm a sociologist. I received
all of my education at Ohio State University, getting my
Ph.D. in 1979. After that, I had a post-doctoral fellowship
at Yale; and I began teaching at Boston College in 1984,
where I am currently a full professor.
My research interest is organizations. I'm, in particular,
interested in how organizational systems affect the actions
and understandings of the people who work in them. So it's
what we call, in my trade, making the macro-micro connection,
how do you understand the importance and effect of being
in an organization as it guides the actions of individuals.
My research methods are typically what we could call qualitative,
which are interviews, archival documents, and ethnographic
observations. So using these methods, I have written three
books, the last of which was The Challenger Launch Decision,
which was published in 1996.
ADM. GEHMAN: Thank you very much. You may proceed.
DR. VAUGHAN: All right. I want to start from the
point of view of Sally Ride's now famous statement. She
hears echoes of Challenger in Columbia. The question is:
What do these echoes mean? When you have problems that persist
over time, in spite of the change in personnel, it means
that something systematic is going on in the organizations
where these people work.
This is an O-ring -- not The O-ring, but it is an O-ring.
I want to make the point that, in fact, Challenger was not
just an O-ring failure but it was the failure of the organizational
system. What the echoes mean is that the problems that existed
at the time of Challenger have not been fixed, despite all
the resources and all the insights the presidential commission
found; these problems have still remained.
So one of the things that we need to think about is when
an organizational system creates problems, the strategies
to make the changes have to, in fact, address the causes
in the system. If you don't do that, then the problems repeat;
and I believe that's what happened with Columbia.
What I would like to do is begin by looking at what were
the causes of Challenger and, based on my research, to point
out how the organizational system affected the decisions
that were made, and then make some comparisons with Columbia
and then think about what it might mean, taking that information,
to make changes in an organization to reduce the probability
that this happens.
One of the things that we have learned in organizational
--
ADM. GEHMAN: Excuse me for interrupting. If I may
ask a question while we're still on this subject. On your
first viewgraph, the first bullet, you said when you find
patterns that repeat over time despite changes in personnel,
something systemic is going on in the organization. There
are no negative connotations in that sentence. You didn't
say something wrong is going on in the organization. I assume
the obverse is also true. If patterns repeat over time and
you keep changing people and you keep getting good results
--
DR. VAUGHAN: The system is working. Right. It's the
fact that there is a bad outcome that we're looking at here.
Thank you.
ADM. GEHMAN: Thank you. Sorry for the interruption.
DR. VAUGHAN: I wanted to begin and go back over just
really briefly what happened in Challenger. First, the presidential
commission reported that there was a controversial eve-of-the-launch
teleconference during which worried engineers at Morton
Thiokol, the solid rocket booster contractor in Utah, had
then objected to the launch, given that there was going
to be an unprecedented cold temperature at launch time the
next day.
Marshall management, however, went ahead and launched, overriding
the protests of these engineers. Not only did the commission
discover that, but also that NASA had been flying with known
flaws on the solid rocket boosters' O-rings since early in
the shuttle program, and that these flaws were known to
everybody within the NASA system.
May I have the next slide, please. What happened was what
I called an incremental descent into poor judgment. This
was a design in which there were predicted to be no problems
with the O-rings, no damage. An anomaly occurred early in
flights of the shuttle and they accepted that anomaly and
then they continued to have anomalies and accepted more
and more. This was not just blind acceptance, but they analyzed
them thoroughly and on the basis of their engineering analysis
and their tests, they concluded that it was not a threat
to flight safety. It's important to understand, then, that
this history was the background against which they made decisions
during the eve-of-launch teleconference; and that was one more
step in which they again gradually had expanded the bounds
of acceptable risk.
Next slide, please. One of the things that's critical with
Challenger, and also now, is the fact that we tend to look
at bad outcomes and work backwards and we're able to then
put in line all of the bad decisions and apparently foolish
moves that led up to it. It becomes very important to look
at the problems as they were unfolding and how people saw
them at the time and try to reconstruct their definition
of the situation based on the information they had when
they made their choices.
Next slide, please. The Challenger launch decision was,
in fact, a failure of the organizational system; and I hope,
by going through the explanation, it will show why it was
not groupthink, it was not incompetent engineers, unethical
or incompetent managers.
Next slide, please. So what happened? Richard Feynman called
it Russian roulette, which implies that there is a knowing
risk-taking going on. The result of my research, I called
it something else, the normalization of deviance; and I
want to use the organizational system perspective to explain
how this happened.
The idea of an organizational system is that there are different
levels at which you have to do your investigation. So the
first is the people doing the work, their interactions,
and what they see; the second level is the organization
itself; and the third level has to do with the environment
outside the organization and the other players that affect
what's going on internally.
So let's start with the bottom layer, the people doing the
interaction. First, it's important to know that they were
making decisions against a backdrop where problems were
expected. Because the shuttle was designed to be reusable,
they knew it was going to come back from outer space with
damage; and so there was damage on every mission. So in
an environment like that, simply to have a problem is itself normal.
So what to us in hindsight seemed to be clear signals of
danger that should have been heeded -- that is, the number
of flaws and O-ring erosion that had happened prior to Challenger
-- looked different to them. The next slide will show how
they looked as the problem unfolded.
What we saw as signals of danger, they saw as mixed signals.
They would have a problem flight. It would be followed with
a flight for which there was no problem. They would have
weak signals. Something that in retrospect seemed to us
to be a flight-stopper, to them was interpreted differently
at the time. For example, cold, which was a problem with
the Challenger flight, was not a clear problem and not
clearly caught on an earlier launch. Finally, what we saw
as signals of danger came to be routine. In the year before
Challenger, they were having O-ring erosion on 7 out of
9 flights. At this time it became a routine signal, not
a warning sign.
The next slide, please. That's what's going on on the ground
floor. So the question is then how does the organizational
system in which they're working affect what they're doing
and how they're interpreting this information and how their
decisions move forward. This is what I call the trickle-down
effect. Congress and the White House were major players
in making decisions, and their policy decisions affected
how people were making decisions in the project.
The budget, the problem of Challenger starting out with
insufficient resources, meant that the only way the program
got going was by the shuttle program being responsible
in part for its own livelihood. That is, it
would carry payloads. The number of payloads it would get
paid for annually were expected to contribute to its budget.
So early on, the space shuttle project was converted from
what during the Apollo era had been an R&D organization
into a business. Contracting out and regulation both had
altered the shuttle program so that it was much more bureaucratic.
There were a lot of people who had been in pure engineering
positions whose roles were reversed in the sense that they became more
administrative. They were put in oversight positions, and
they had a lot of desk work to do.
Finally, when the program was announced, it was announced
that it would be routine to fly shuttles into space. It
would operate like a bus. So the expectation that it would
be routine also had an effect in the workplace. The effect
was to transform what had really been a pure R&D culture,
with emphasis only on technological discovery, into one
that had to operate more like a business, in which cost
was a problem and production pressures were a problem.
The notion of bureaucratic accountability made the agency
what some people told me was bureau-pathological. That is,
there were so many rules, there were so many forms to be
filled out that these kinds of tasks deflected attention
from the main job of cutting-edge engineering. It wasn't
that the original technical culture died but that, in fact,
it was harder to follow it through with these other influences
on the shuttle program.
How did these actually play out on the ground? Next slide.
The original technical culture called for rigorous scientific
and quantitative engineering, real solid data in the form
of numbers to back up all engineering arguments; and that
was still true. However, also with the original technical
culture, there was a lot of deference to engineering and
engineering expertise based on the opinions, valued opinions,
of the people who were doing the hands-on work.
The latter was harder to achieve within a bureaucratic organization
where hierarchy dominated. The schedule became a problem,
interfering with decisions by compelling turn-arounds in
time to meet it, so that expected research on hardware
problems sometimes continued past the next launch.
So they were still getting more information while a new
launch was in process.
It also affected them in that the engineers and managers
truly followed all the rules. In the midst of a system that
many people at the time said was about to come down under
its own weight before Challenger, the fact that they followed
all the rules in terms of having the numbers and in terms
of procedures gave them a kind of belief that it was safe
to fly. Engineering concerns had
to be backed up with hard data or there couldn't be money
set aside to do a correction to the program. Hunch and intuition
and concern were not enough.
Next slide, please. The third part is to say, well, there
was a long incubation period here. Why didn't someone notice
the trend that was going on with the solid rocket booster
project in terms of O-ring flaws and intervene? This is
where the organization's structure was at that time a problem.
The safety system had been weakened. One safety unit had
been completely dissolved, and staffing had been cut back.
Top administrators, because of extra work in an expanding
program, were no longer able to maintain what in the
Apollo program was known as the dirty-hands approach --
that is, keeping in touch with the technology, the problems,
and the riskiness of it.
And the anomaly tracking system, which was another way that
you could get warning signs, made it very difficult for
administrators to isolate serious problems. At one time
under their Criticality 1 category, which is the most serious
label that you give to a technical problem, they had 978
items on it. So how, of those, do you sort out which are
the most serious?
Next slide, please. With this as an outline, I'd like to
move to some comparisons, the echoes that Sally Ride talked
about. First, here I'm drawing analogies. I spent nine years
on the Challenger book; I haven't done that for this case,
and your investigation is still underway. So while I'm easily
able to identify the similarities, it's harder to define
the differences; and what we see now as similarities are
yet to be proved. So my goal here is just to point you to
some places to look, not to come to any conclusions.
First, in both circumstances, Columbia and Challenger, a
crisis -- well, let's say it was a crisis of uncertainty.
Circumstances happened for which they had no background
experience. They came to this condition of high uncertainty
with a belief in acceptable risk -- that is, based on all
the Flight Readiness Review decisions that had preceded,
they believed they were flying with a vehicle that did not
have a problem that was related to, in Challenger, O-rings
and, in Columbia, the foam problems. They believed in their
own analysis. That was the background, and they had engineering
reasons for believing that.
Second, in each of those cases, Challenger and Columbia,
there had been an event in the recent past that had some
import for their decision-making that night. For Challenger,
it was STS 51-C, launched the year before, in January. The
condition that the engineers on the eve of the Challenger
launch were concerned about was the cold temperature, which
for the next day was predicted to be at an all-time launch-time
low. STS 51-C, which was launched in January of 1985, was
a launch where cold temperature also mattered, but not at
launch time. The cold had come during the three previous
nights, when the vehicle was sitting on the launch pad and
the temperature was down to 19 to 22 degrees.
Then there was the foam strike on Atlantis. There had been
several foam strikes preceding the Columbia launch. The
Atlantis foam strike, which happened in October of 2002,
was the most recent. The history of the foam strikes was
that they had problems with imagery -- they couldn't see
the location of the strikes very well, and so on. That was
part of the history which led to the fact that, when they
discovered the foam strike, they didn't have good data.
For the cold temperature on 51-C, there was a similar effect.
At the time when they did the analysis, the engineer who
went to the Cape and looked at the vehicle when it was disassembled
and looked at the solid rocket boosters was alarmed because
he saw that in the base of the putty in the groove in which
the O-rings lay, the grease was charred black like charcoal;
and he believed that this was significant. But when they
came forth after that with their analysis of 51B for the
next Flight Readiness Review, their analysis showed them
that it was still safe to fly. They had had damage of the
O-ring, they had serious O-ring erosion, and they had had
for the first time hot gases that had gone beyond the primary
O-ring to its backup, the secondary O-ring, and their analysis
told them that in a worst-case scenario, it would still
work. It would still work.
Where does cold come into this? The engineer who saw the
charcoaled grease had this feeling that, intuitively, this
was bad. So when he argued that cold should be a serious
concern, they had at that point had many things happening
with O-rings. The smallest thing could cause damage. So,
for example, a piece of lint in the bed of putty in which
an O-ring lay could cause erosion. Each time something different
had happened. They believed that there was no generic problem
because they were not having damage on every ring on every
mission. Sometimes they would not have any damage at all,
so he could not prove that cold correlated with the O-ring damage.
They decided at that point that they should get some cold
temperature data; but they didn't scramble to get it, as
this engineer said. The reason they didn't was they believed
it was a unique incident, that the chance of overnight temperatures
that low for three nights running in Florida was, in
his words, the equivalent of having a 100-year storm two
years in a row. So there was no scramble to get temperature
data. They did some resiliency tests, but they did not have
systematic temperature data. So in both circumstances, when
the condition of high uncertainty came up for both Columbia
and Challenger, they did not have a lot of supporting data,
they didn't have the best data available to them and this,
it turned out, mattered.
The third point is that the organization's structure interfered
with good communication, and it interfered in several ways
in which there seem to be parallels across cases. There
were, in this case, missing signals. People had information
that, if it had been relayed up the hierarchy, might have
made a difference. People in the Challenger evening
teleconference were in three different locations, and they
were in telephone communication but not video. People were
in different locations who did not speak up, so their message
didn't get across on the main teleconference line.
Why didn't they speak up? Some people felt that, though it
was their specialization, they hadn't worked on it recently,
and therefore, though they had some input and some
information, they didn't know what the most recent data
was. Some people didn't speak up because it simply wasn't
their specialization. Other people didn't speak up because
they trusted in the hierarchy, they trusted in the key people
who were running the teleconference to guide it in the right
direction, they trusted the engineers at Thiokol to do the
analysis. Those were some of the reasons.
One of the parallels with Columbia comes up in the accounts
of the E-mails that were circulated from approximately the
21st on, the worries of concerned engineers. From the newspaper
accounts and the E-mails themselves, I've been able to conclude
that in a sense they were marginal to the process: they had
not been brought in early on, and this was a conversation
they were having among themselves. They were also specialized
and felt that perhaps they didn't have the same information
that other people had. There was a trust in the hierarchy;
and, as one of them said after a press conference early
in your investigation, "We didn't have the data." That is,
they were concerned they didn't have any hard numbers.
One of the characteristics of the conversion from the Apollo-era
culture to the Challenger-era culture was that intuition
and hunch didn't carry any weight. They carried weight in
everyday decision-making and in batting around ideas,
but when it came to formal decisions like the Flight Readiness
Review, it was hard data, it was numbers that were required.
And in this case it was significant to me that he said, "We
didn't have the data," and therefore, not having the data,
they didn't feel empowered to speak up beyond these E-mails
and carry their concerns farther upward.
There is evidence of production pressure in the Challenger
case that I haven't seen yet in Columbia. In Challenger,
there was a deadline for the engineers to make their preparation
for their eve-of-the-launch teleconference engineering recommendation
about the relationship between the cold temperature and
O-ring erosion and what they expected, what they were recommending
in terms of launch. They scrambled to put their analysis
together, dividing up the work, and began faxing their charts
over the telecon line without having the time to look through
them. If they had taken that time and collectively looked
through the charts, they might have noticed ahead of time
that they didn't have a strong correlational argument. So
as a consequence,
it was a weak argument in terms of the engineering culture
at NASA. The hard numbers didn't hold together. They couldn't
prove that there was a cold temperature correlation with
O-ring damage.
At one point the key engineer said, "You know, I can't prove
it. I just know it's away from goodness in our data base."
But in that culture, that was considered an emotional argument,
a subjective argument, it was not considered a strong quantitative
data argument in keeping with the technical tradition at
that time.
So far there isn't any evidence of engineering concerns
during the history of the foam problem like there was with
Challenger either. Afterwards, some memos had surfaced in
the Challenger case, from the previous year in particular,
as engineers at Thiokol were trying to get through the
bureaucratic rigmarole in order to get the help they needed
to try to analyze the problem; and they were working on a
fix at the time.
The other point I wanted to make was about bureaucratic
accountability. What was obvious with Challenger was that
on the eve of the launch the concerns of the engineers
were not prioritized. It also seems to be the case in the
requests for imagery from Columbia. The concerned engineers
who discovered the foam strike described it as large. There
was nothing in their experience like this. It was the size
of a Coke cooler. This was unique.
They met, a team of approximately 37 engineers, and made
a request for better visuals than the ones that they had
from ground camera; but somebody up the hierarchy canceled
the request. In a condition of high uncertainty. One of
the comments that I read in the newspaper -- and I don't
claim to have all information on this -- was that the request
had not gone through proper channels, which points to me
the significance of rules and hierarchy over deference to
technical expertise in this particular case.
There are many conclusions we can think about from this,
but one of them is that in both of these situations, following
the normal rules and procedures seemed to take precedence;
and we know that, in fact, in conditions of uncertainty,
people do follow habits and routines. However, under these
circumstances where you have something without precedent,
it would seem that this would be a time not for hierarchical
decision-making but for a more collective, collaborative
approach -- what does everybody think, let's open the
floodgates -- not pulling in just the usual people but
especially asking what the concerns of our engineers are,
and also letting up on the idea that you have to have hard
data. Engineering hunches and intuitions are not what you
want to launch a mission with; but when a problem occurs
that's a crisis and you don't have adequate information,
it is the reverse of the pro-launch situation: engineering
hunches and intuitions ought to be enough to raise concerns,
without asking for hard data.
So what's to be done if it turns out in this investigation
that you do, in fact, find a failure of the organizational
system? Could I have the next slide, please.
Typically in the results of an accident investigation two
things happen. One is that the technical culprit is found
and a technical fix is recommended and achieved; second,
that key decision-makers are identified who had important
roles and who might have prevented a bad outcome but
didn't. More typically, the organizational system goes
untouched. It is, in fact, more difficult to identify the
flaws in the organizational system. It's harder to pin it
down and it's more challenging to try to correct it; but,
in fact, there are many people who are experts in how to
build high-reliability systems and in the problems of
organizational systems, and they might help with advice
in circumstances like this.
Next slide, please. Just looking at the model that I put
up earlier where we looked at the trickle-down effect, it
leaves three levels at which you might target changes. First,
the beauty of operator error is that it deflects attention
from key policy decisions made in the past that have affected
a program and affected the daily operations. Policy leaders
need to be concerned and aware of their responsibility with
risky systems and be aware of how their choices affect the
hands-on work. They also are responsible and implicated.
Cultures, for example, are hard to change; but leaders might
try to change them, even if they weren't the ones who created
them. It's important that they remain in touch with the
hazards of the workplace. In the modern NASA it may be more
difficult for administrators to do that, and the dirty-hands
approach cannot be carried out like it was in the time of
Apollo; but it's still important to stay in touch with those
hazards.
For example, prior to Challenger, the shuttle was declared
an operational system. As a result of that and the belief
and expectation that it would be routine, citizens were
allowed to be taken along for the ride. The people at the
top of the organization
apparently believed that it was not a risky technology and
therefore it was safe to take along ordinary citizens. The
engineers who were doing the daily work did not believe
that it was -- I mean, they were aware of all the problems
in the system on a day-to-day basis. They were the ones
who had the dirty hands. They were not the ones who made
the decision to put a teacher on the space shuttle.
Another aspect of concern for top leaders is that changes
are often made in an organization's structure for budgetary
reasons or for better coordination, without thinking about
how that might affect the people who are having to make
decisions at the bottom. What does it mean, for example,
when you have an International Space Station and NASA is
now dividing up the work so that there are two combined
structures and projects in which decisions have to be made?
How are these priorities getting sorted out? Does that affect
what's going on in the program?
Contracting-out had a serious effect on the work of people
making technical risk analyses. We know that hospitals, when
they have mergers, often let people go; institutional memory
is lost, and there are startup costs in people getting going
again. These kinds of changes should not be made without
looking at their implications.
Second. Please, next slide. Target culture. You can't really
make assumptions about your culture. We think we understand
our cultures, but they act invisibly on us, and so we cannot
really identify what their effects are. One of the post-Columbia
comments concerning the E-mails was, "We have a safety
culture and we strongly encourage everyone to speak up at
every opportunity." And I'm sure that they believe that.
But when you look at the chronology of events, even in the
skeletal form in which I'm aware of it, the fact that these
what-ifs didn't percolate up the hierarchy and the fact that
the engineering requests did not get fulfilled indicate that
there are some things acting to suppress information.
It's also significant, I think, in terms of culture to understand
the power of rules. The things that we put in organizations
that do good also can have a dark side. It is really important
at NASA, because of the complexity of the agency and its
projects, to have rules. You couldn't run it without rules.
It's impossible. But then there are times when maybe the
normal rules don't apply. So how do you train people to
recognize circumstances when you have to expedite matters
without going through the hierarchy, and how do you empower
engineers to get their requests filled?
Finally, targeting signals. Missing signals are obvious
in both cases. What does it mean to try to reduce missing
signals? One is to truly create a system in which engineers
have more visibility, their concerns have more visibility
on a formal and informal basis. Second, the safety system.
The parallel with Challenger and the reduction of safety
personnel is also a parallel with Columbia. When you reduce
a safety system, you reduce the possibility that other people
are going to be able to identify something that insiders
have seen and normalized -- the technical deviation. Third,
the slippery slope. When you're working in a situation where
problems are expected, where you have problems every day
and people are busy with daily engineering decisions, it
becomes very difficult to identify trends and stay in touch
with the big picture.
How do you identify the trend so that people are aware when
they are gradually increasing the bounds of acceptable risk?
It is certainly true, based on what we know about organizations
and accidents in the social sciences, that this is a risky
system; and what we know is that the greater the complexity
of the technology, the greater the possibility of a failure.
The same is true of organizations. Organizations are also
complex systems. The greater the complexity of the organizational
system, the greater the possibility of a failure. When
you have a complex organization working a complex technology,
you're never going to be able to completely prevent accidents,
but the idea is to be more fully aware of the connection
between the two so that you can reduce the probability that
a failure would occur.
That's it. Your turn.
ADM. GEHMAN: All right. Well, that's a bucket full.
Since you studied the Challenger decision so carefully,
and even though we're talking about Columbia here, let me
ask a Challenger question, even though it's loaded because
it has Columbia implications. Several things you said struck
me, and they're related to each other. One is that you can't
change the behavior unless you change the organization.
You can change the people, but you're going to get the same
outcome if the organization doesn't change. Yet in another
place up there, you said beware of changing organizations,
because of the law of unintended consequences. You've got
to be real careful when you change organizations.
What do you make of the post-Challenger organizational changes
that took place, particularly in the area of more centralization
and program management oversight? What do you make of all
of that?
DR. VAUGHAN: The changes that I am most familiar
with are the ones related to launch decisions. That is,
immediately following the accident, they put a former astronaut
in charge of the final "go" outcome of the Flight Readiness
Review procedure, and they tried to integrate working engineers
into the flight readiness process more.
I'd say that there is always a problem in organizations
in providing the stability and the centralization needed
to make decisions and make sure information gets to the
top and providing the flexibility to respond to immediate
demands; and without, you know, really studying this, I
would say that what we know about Columbia is that flexibility,
at least in a couple of circumstances, really wasn't there.
That becomes interesting in thinking about the differences
in the pre-launch decision-making structure and post-launch
decision-making structure. That is, the post-launch decision-making
structure is actually designed to create that kind of flexibility
so that you could pull in people as you need them and so on.
What's ironic about it is that it looks as if a direct route
for engineering concerns -- one that shortcut what really
little bureaucracy there seemed to be in that process --
would have helped; it could have circumvented the need for
hierarchical requests for imaging. In terms of the overall
impact on NASA, I really can't say.
ADM. GEHMAN: From my understanding, though, one of
the post-Challenger results has been a much more formal
FRR process. As you are probably aware, no more telephone
calls, it's all face-to-face, it's done at the Cape, and
you've got to be there and they're done in big rooms like
this with hundreds of people in the room with several different
layers, everybody there, and there's a whole lot of signing
that goes on. People at several layers actually sign pieces
of paper that say, of the thousands of things that I'm responsible
for, they've all been done with the exception of A, B, C,
and D, and then they have to be waived or something like
that. Then they go through a many, many hour process of
making sure that everything's been taken care of and every
waiver has been carefully analyzed and in front of lots
of high-level people. So it's very meticulous, it's very
formal, and it's an eyeball-to-eyeball commitment that my
organization has done everything my organization is supposed
to have done.
Is that the kind of an organization in which weak and mixed
signals can emerge? I mean, is that the kind of organization
which would recognize mixed and weak signals and routine
signals? Is that compatible kind of with your -- I'm still
talking Challenger -- with some of the principles you outlined
here?
DR. VAUGHAN: This was pretty much the procedure that
existed at the time of Challenger, where every layer of
Flight Readiness Review had to sign off on it. The criticism
at the time, post Challenger, was that what was happening
was the engineers who were making the analyses and coming
forward at the Level 4, the ground level of Flight Readiness
Review, those were the people who were getting the mixed,
weak, and routine signals; but when they came together,
they had to come up with a consensus position for their
project manager to carry forward. And once they agreed,
then they began gathering the supportive data that this
was an acceptable flight risk. And as their recommendation
worked itself up through the hierarchy, the system was designed
to criticize it, to bring in people with other specializations
who could pick it apart, and the result of that was to make
them go back to the desk and sometimes to do more engineering
analysis. That engineering analysis tended always to support
the initial recommendation. So by the time it came out the
top of the process, something that might have been more
amorphous on a day-to-day basis had become dogma and very
convincing. That is why, with that kind of information as
a backdrop, you have people who believe in acceptable
risk -- a belief based on solid engineering and history --
who need to be convinced by hard data that something different
is happening this time.
The system is designed to review decisions that have been
made; if there is a mistake in the fundamental engineering
analysis, the other layers can criticize it, but they can't
uncover it. That would mean you would need another kind of
system to detect it, such as outsiders
who bring fresh eyes to a project on a regular basis. The
Aerospace Safety Advisory Panel was very effective during
the years of Challenger, with the exception of the fact
that their charter kept them coming for visits perhaps 30
times a year. So it was impossible for them to track all
the problems; and at that point when Challenger happened,
they were not aware of the O-ring erosion and the pattern
that was going on.
ADM. GEHMAN: I'm still trying to understand the principles
here. It seems to me that in a very, very large, complex
organization like NASA is, with a very, very risky mission,
some decisions have to be taken at middle-management levels.
I mean, not every decision and not every problem can be
raised up to the top, and there must be a process by which,
at Level 2, Level 3, and Level 4, decisions are
taken, minority views are listened to, competent engineers
weigh these things, and then they take a deep breath and
say, okay, we've heard you, now we're going to move on.
Then they report up that they've done their due diligence,
you might say.
I'm struggling to find a model, an organizational model
in my head, when you've got literally thousands and thousands
of these decisions to make, that you can keep bumping them
up higher in the organization with the expectation that
people up higher in the organization are better positioned
to make engineering decisions than the engineers. I mean,
you said yourself, "Hindsight is perfect." We've got to
be really careful about hindsight, and I'm trying to figure
out what principles to apply.
We as a board are certainly skittish about making organizational
changes to a very complex organization for fear of invoking
the law of unintended consequences. So I need to understand
the principles, and I'm trying to figure out how I can take
your very useful analysis here and apply it to this case.
The part that I'm hung up on right now is how else you can
resolve literally thousands of engineering issues except
in a hierarchical manner, in which some manager has 125 of
these, has sorted through them, and reports to his boss that
his 125 are under control. I don't know how to do that.
DR. VAUGHAN: Well, two things. First, somehow or
other in the shuttle program, there is a process by which,
when a design doesn't predict an anomaly, the anomaly can
be accepted. That seems to me to be a critical point: if
this is not supposed to be happening, why are we getting
hundreds of debris hits?
It's certainly true that in a program where technical problems
are normal, you have to set priorities; but if there is
no design flaw predicted, then having a problem should itself
be a warning sign, not something that is taken for granted.
The idea is to spot little mistakes so that they don't turn
into big catastrophes, which means spotting them early on.
Two things -- and I'm certain NASA may be very aware of one
of them, maybe both. One is the fact that
engineers' concerns need to be dealt with. I can understand
the requirement for hard data. But what about the more intuitive
kinds of arguments? If people feel disempowered because
they've got a hunch or an intuition and let somebody else
handle it because they feel like they're going to be chastised
for arguing on the basis of what at NASA is considered subjective
information, then they're not going to speak up. So there
need to be channels that assure that they will, even giving
engineers special powers if that's what's necessary.
The other is the idea of giving more clout to the safety
people to surface problems. So, for example, what if the
safety people, instead of just having oversight, were producing
data on their own, tracking problems to the project for
which they're assigned and, in fact, doing a trend analysis
to keep people's eye on the big picture so that the slippery
slope is avoided?
ADM. GEHMAN: Thank you for that.
DR. VAUGHAN: Let me add also that there are other
models of organizations that deal with risky systems, and
social scientists have been studying these. They have been,
you know, analyzing aircraft carrier flight decks and nuclear
operations and coal-mining disasters. There are all kinds
of case studies out there and people who are working in
policy to try to see what works and what doesn't work. Are
there lessons from air traffic control that can be applied
to the space shuttle program? What carries over? Is there
any evidence that NASA has been looking at other models
to see what might work with their own system?
I know that in air traffic control they use an organizational
learning model. What we find out from this comparison between
Columbia and Challenger is that NASA as an organization
did not learn from its previous mistakes and it did not
properly address all of the factors that the presidential
commission identified. So they need to reach out and get
more information and look at other models, as well.
Thinking about how you might restructure the post-launch
decision-making process so that what appears to have happened
in Columbia doesn't happen again, and how that can be made
efficient, may be worth doing -- maybe it needs to look more
like the pre-launch decision process. But is there any evidence that
NASA has really played with alternative models? And my point
about organization structure is as organizations grow and
change, you have to change the structures, but don't do
it without thinking about what the consequences might be
on the ground.
DR. LOGSDON: Just a short follow-up to that. Diane,
your book came out in 1996, I think, right, and was fairly
widely reviewed. We at the board discovered in some of our
briefings from outside folks that the submarine safety program
uses your work as part of the training program for people
that worry about keeping submarines safe. Have you had any
interactions with NASA since the book came out?
DR. VAUGHAN: No.
DR. LOGSDON: Have you ever been invited to talk to
a NASA training program or engage in any of the things that
you just discussed might be brought to bear?
DR. VAUGHAN: No, though, in fact, as you said, the
book did get quite a lot of publicity. I heard from many
organizations that were concerned with reducing risk and
reducing error and mistake. The U.S. Forest Service called,
and I spoke to hotshots and smoke-jumpers. I went to a conference
the physicians held, looking at errors in hospitals. I was
called by people working in nuclear regulatory operations.
Regular businesses, where it wasn't risky in the sense that
human lives were at stake. Everybody called. My high school
boyfriend called. But NASA never called.
(Laughter)
ADM. GEHMAN: Anybody want to comment on that?
GEN. BARRY: What was his name?
ADM. GEHMAN: Let me finish my thought here. Professor
Vaughan, again we're back to this organizational issue which
I'm trying to determine the principles that I can apply
from your analytical work here. If the processes we're talking
about in the case of NASA, if they didn't follow their own
rules, would that alarm you? What I mean is if there were
waivers or in-flight anomalies or systems that didn't work
the way they were supposed to work and, in the fact that
they didn't work the way they were supposed to work, somehow
started migrating its way down lower in the message category
to where it wasn't sending messages anymore and therefore
it was technically violating their own rules because they're
supposed to deal with these things, would that be a significant
alarm for you?
DR. VAUGHAN: Well, I think that one of the things
to think about here is that NASA is a system that operates
by rules; and maybe one of the ways to fix the problem is
to create rules to solve the problems. So what are the rules
when engineers need images, for example? Can't they find
a way where they have their own authority, without seeking
other authority, to get the necessary images? So I think
I read that someplace, where the harmony between the way
the organization operates and thinks in the key aspects
of the culture itself is something that you might want
to build on.
DR. WIDNALL: Actually I'm starting to frame in my
own mind that the problem is that there is, in fact, one
underlying rule and it's a powerful rule and it's not stated
and it's not stated as simply as this question of following
your own procedural rules. But let me sort of get into that.
I've certainly found your framework very helpful because
I've mused over this issue of how an organization that states
that safety is its No. 1 mission can apparently transition
from a situation where it's necessary to prove that it's
safe to fly, to one in which apparently you have to prove
that it's not safe to fly. I think what's happening is,
in fact, that engineers are following the rules but this
underlying rule is that you have to have the numbers.
DR. VAUGHAN: Right.
DR. WIDNALL: That's not the rule you stated, which
was that you should follow the procedures and resolve all
anomalies.
DR. VAUGHAN: This is a norm.
DR. WIDNALL: Those are these kind of rules. I'm talking
about the really basic rule that says you have to have the
numbers. So that basically means that every flight becomes
data and that concern about an anomaly is not data. So a
flight with an anomaly becomes data that says it's safe
to fly. So the accumulation of that data, of those successful
flights, puts the thumb on the scale that says it's safe
to fly; and people who have concerns about situations in
one of these uncertain situations that you talk about, they
don't have the data.
So I think it may be getting at, in some sense, changing
the rule to one that it is not okay to continue to operate
with anomalies, that the underlying rule of just having
data is not sufficient to run an organization that deals
with risky technologies. Because otherwise you're just going
to end up with a pile of data that says it's okay to fly,
and you're not likely to get much data on the other side.
ADM. GEHMAN: Is that a question?
DR. WIDNALL: That's kind of a comment.
DR. VAUGHAN: I completely agree with you. One of
the reasons I emphasized in an earlier slide that you need
to understand your culture is that it works in ways that
we don't really realize. So how many people there understand
the effect of intuition and hunch, which are absolutely
integral to good engineering, and how the kind of emphasis
on numbers suppresses that kind of information in critical
situations?
People are disempowered from speaking up, by the very norms
of the organization. Things like language, though. For example,
the term I've read in the paper, "That's in family." That's
a real friendly way of talking about something that's not
really supposed to be happening in the first place. In nuclear
submarines, they don't talk about it as "in family"; they
talk about it as a degradation of specification requirements,
which has a negative feeling to it. These kinds of languages
which we think of as habits of mind reflect attitudes that
are invisible, but the language really shows.
So the question is, you know, how can you get back in touch
with the importance of engineering intuition and hunch in
formal decision-making. Usually it works in the informal
decision. You know, I think that's why the NASA administrators
believe that they've got a safety culture and that people
are free to express whatever they think; but when it comes
to a formal decision, they fall back into the formal rules
and that expression of concern doesn't get expressed.
Even if you take something as simple as an engineering presentation,
the fact that it's reduced to charts, which are systematic,
gets all the emotion out of it. It begins to look even more
routine. The engineer in Challenger who saw the burned grease,
the black grease, was seriously alarmed. I asked him, you
know, later, "Did they see this? What did they see? Did
they get a photograph?" He said yes. I said, "How did it
look in the photograph?" He said it did not look serious
in the photograph. So emotion is keyed to some kind of a
logic based in engineering experience, and it should be
valued and a way found to express it.
GEN. BARRY: Diane, I'm going to ask you a short question,
and then I'm going to ask a longer question, if I may. First,
the short question, focusing on organizational failure.
The Rogers Commission, did they fall short on institutional
recommendations in the aftermath of Challenger, or were
they good ones and they just weren't followed through by
NASA?
DR. VAUGHAN: The Rogers Commission was very good
at identifying what they called contributing causes and
that I would call system causes. That is, they identified
safety cuts, cuts in safety personnel. They identified the
failure of NASA to respond to recommendations of the Aerospace
Safety Advisory Panel. They identified the history of
the program and the fact that it was a design that was built
on budget compromises in the beginning. They identified
production pressures. They identified all those kinds of
outside sources that had impacted the decision-making and
that were a part of NASA's history.
In the recommendations, they didn't come forward with anything
that said give them more money, change the culture. They
weren't sociologists. They weren't social scientists and
not trained to think about how that might have actually
worked. The way it looked like it worked was in the sense
that there were pressures there and key managers, namely
Lawrence Mulloy, who was the project manager for the solid
rocket booster project at that time, was the operator who
made the error. Once that happened and the key person was
identified and people changed and new people came in, then
the system problems remained.
They fixed the technology. They fixed the decision-making
structure in ways I described earlier. But the organization
didn't respond and neither did -- in keeping with my point
earlier about top leaders being responsible -- the organization
did not respond in terms of getting more money beyond what
it took at that point to fix the technical problem. They
got an initial boost, but they've been under budgetary constraints
all along. The recommendations in the volume of the presidential
commission were related strictly to internal NASA operations.
They were not directed towards policy-making decisions that
might have affected the program.
GEN. BARRY: Okay. Let me build on that a little bit
and just carry it on and see if this resonates with you.
I'm going to list off a bunch of items here and see if this
falls true with what you know to be from Challenger that
might be able to be translated over to Columbia.
First of all, you stated that with Level 4 identifying problems
and being able to try to communicate that up the institution,
the organization kind of stymied that. So I would characterize
that as needing to prove that there is a problem in the
early stages of the FRR or before flight. I think post Challenger,
you know, there has been a fix on that and, remember, the
Flight Readiness Review is supposed to prove not only launch
but also en route and on recovery. So it's the whole flight.
It seems like they've solved the problem of proving there
is a problem pre-launch. Then post-launch, there's,
some would argue, an attitude that you have to prove there
is a problem. So we kind of fix it on the launch side; but
after it's launched, we kind of relegate back to maybe the
way it was prior to Challenger: Prove to me there is a problem.
Now, if we try to look pre and post launch, pre-launch is
very formal, as Admiral Gehman outlined earlier. You've
even alluded to it in the book. Post-launch, it could be
argued, less formal, more decentralization, more delegation
certainly, okay, from what we see at the FRR prior to launch.
Multi-centers are involved prior to launch. I mean, they
all meet and they all sit at the same place, they're all
eyeball to eyeball. Center director is represented, program
managers. Post-launch, again decentralized, it's mostly
a JSC operation. Of course, KSC gets involved within the
landing at Kennedy.
There's a tyranny of analysis pre-launch maybe and that
is because you've got -- well, you have a long-term focus
because you've had time. But post-launch, there's a tyranny
of analysis but it's in real time because you don't have
as many hours and you've got to make decisions quicker and
all that other stuff.
The real question -- if this resonates with you at all --
could it be argued that during Columbia, NASA had a "Prove
there is no problem" prior to launch and post-launch it
was "Prove to me there is a problem" and we have this formal
and informal kind of focus. It seems to me after Challenger
we fixed the prior to launch, certainly with having people
appear in person and no VTCs or no over-the-phone. Everybody
had to be there in person. And we have maybe a problem that
we need to fix post-launch with the MMT and the decentralization
elements and maybe the delegation.
I certainly don't want to relegate it to a headquarters
level, but there are some things that need maybe to be fixed
there. So I would ask really your opinion that is there
some kind of a delineation in your mind, from what you know
to date, pre and post launch, that we might be able to provide
solid recommendations on to improve NASA?
DR. VAUGHAN: I'm wondering if the post-launch flexibility
is such that you can, in fact, have similar things going
on in two different parts of the process in which people
are not in touch. So I understand that video requests really
originated from two different points, working engineers
in two different locations, and that they didn't really
know that the other had originated a request.
It certainly seems that the mentality of proving your point
when you've got a time line like you do and it's an unprecedented
circumstance, as it was with Columbia, is wrong, of course,
in retrospect. The question you're asking is how can we
convert that into a process that prevents this from happening
again.
Now, a famous sociologist once told me when I was beginning
the analysis of the Challenger launch, "It's all these numbers.
It's all these numbers, and there are these debates about
issues. Why don't you do it like they do it in the Air Force?
You just should have a red button for stop and a green button
for go." And there's a lot to be said for simplifying a
complex system, whether it's decentralized or centralized,
so that key people can respond quickly and shortcut the
hierarchy. I don't know if that begins to answer your question.
But there may need to be some more rules created in the
sense that --
GEN. BARRY: And this is really stretching it but
--
DR. VAUGHAN: Maybe it needs to be more formal than
it is and maybe it needs to be more like the pre-launch
procedure in terms of the rigor of numbers of people from
different parts who are looking at problems that crop up
while a mission is in process instead of waiting just --
I mean, some sort of a formalized procedure where there's
a constant ongoing analysis instead of you've got worried
engineers in two different locations who are kind of independently
running around, trying to get recognized and get attention
to the problem.
MR. WALLACE: NASA's taken quite a pounding here today
but I'm wondering what we can --
DR. VAUGHAN: I thought this morning they were coming
off pretty good.
MR. WALLACE: I would just like to talk about what
we can sort of learn about what they do well -- in other
words, areas where we don't seem to have this normalization
of deviance or success-based optimism. Like BSTRA balls
and the flow liner cracks and some of those fairly recent
examples where there were serious problems detected with
the equipment, in some cases detected because of extreme
diligence by individual inspectors and really very aggressively
and thoroughly fixed.
It seems to me that part of the problem of normalization
of deviance is sometimes the level of visibility that an
issue gets. How do you sort of bridge that gap between
those things that get enough visibility or sense of urgency
and those that somehow seem to slip below that threshold?
DR. VAUGHAN: Someone said after the book was first
published -- and then again now I've been getting a lot
of E-mails. Someone said at the time the book was published,
"I bet if you took any component part of the shuttle and
traced it back, you would find this same thing going on."
Perhaps doing a backward tracing on other parts of the shuttle
could show you two things. First, what are the circumstances
in which they're able to locate an anomaly early and fix
it so they stop it in its tracks and avoid an incremental
descent into poor judgment? Are there other circumstances
in which the same thing is happening? Can you find circumstances
where you do have the normalization of deviance going on?
It's interesting in the history of the solid rocket booster
project that there was a point at which they stood down
for maybe two months to fix a problem. How is that problem
identified? What are its characteristics? I would bet that
the more uncertain, the more complex the part and the more
amorphous the indications, the more likely it is to progress
into a normalization-of-deviance problem, given the existing
culture where flying with flaws is okay in the first place.
MR. WALLACE: Well, sort of following on. Earlier
you said -- and good advice for this board -- that we should
try to see problems as they saw them at the time and not
engage in the hindsight fallacy or whatever that's called.
I mean, I'm not sure you said this; but my assumption is
that that's almost the only way you can learn to do better
prospectively. I mean, do you have any other thoughts on
that? In other words, to see the problem as they saw them
at the time, to me, is almost a step toward the discipline
of seeing the next one coming.
DR. VAUGHAN: Right. It's an experimental technology
still; and every time they launch a vehicle, they've made
changes. So they're never launching the same one, even though
it bears the same name. This is a situation in which, like
most engineering concerns where you're working with complex
technologies, you're learning by mistake. So that's why
post-flight analysis is so important. You learn by the things
that go wrong. Every once in a while you're going to have
a bad mistake.
ADM. GEHMAN: Did I understand the point that you
made both in your book and in your presentation here is
that the answer to perhaps Mr. Wallace's question lies in
the theory of strong signals? In other words, if NASA gets
a strong signal, they act on it. No problem. They very aggressively
shut the program down and go fix it. The problem is in the
weak, routine, or mixed signals. Those are the ones that
seem to bite us. Of course, there are a lot of them; and
they don't quite resonate with the organization. Is that
a good analogy?
DR. VAUGHAN: It is. The idea of a trend analysis
is that it could pick out stronger signals from lesser ones
before it becomes, you know, an enormous problem; but the
recognition of the pattern is important, bringing forth
the pattern so that the people who are making decisions
are constantly in touch with the history of decisions that
they've gone through before.
I have to say with that, though, it's important that they
have quantitative evidence to fly. Maybe the more qualitative
evidence could be brought in in other ways further up the
chain, that whereas in Flight Readiness Review, for example,
they present everything on charts and they ask -- the purpose
of Flight Readiness Review is to clean the hardware and
get it ready to go. The purpose of it is to clear up the
problems as it works its way through the Flight Readiness
Review process. What happens, as I mentioned, is that the
engineering argument tends to get tighter and tighter because
they're constantly doing the work to investigate and respond
to questions and, in a sense, defend what they've said or
find out if there are flaws.
At the time of Challenger, I read thousands of engineering
documents for all the Flight Readiness Reviews that they
had had and I didn't see anyplace in the Flight Readiness
Review process that would allow for the presentation of
simply intuitions, hunches, and concerns, where qualitative
evidence might be presented, like a clear image or even
a vague image of a piece of debris the size of a Coke cooler,
for example, rather than charts for an engineering analysis,
you know, that there ought to be room in the process for
alarm.
ADM. GEHMAN: In your experience, particularly with
what I'm calling these weak signals or this muttering around
the room that the O-rings can't take freezing temperatures
but we're not really sure whether they can or cannot, I
have in my mind a model that says that it's unfair or not
reasonable to set as a standard for the organization to
act on literally hundreds of these misgivings that the tens
of thousands of people may have and that it's an unfair
standard to require the people who have these doubts to
prove that their doubt could cause the loss of the vehicle
or the crew. But I have in my mind that it's a more reasonable
standard that management should realize that the accumulation
of signals from the process are cutting into their safety
margins and that you can accumulate these things not in
a measurable way but in a subjective way, particularly in
a regime in which you have very thin safety margins to begin
with, that you should be able to reasonably determine that
you're narrowing your safety margins in a way that should
concern management. Is that a reasonable characterization
of the standard or the bar that we set here?
DR. VAUGHAN: I think that shows up in the problem
of lack of data in both of these circumstances, that there
were early warning signs and in neither case had those early
warning signs been pursued to say, "Well, the imagery is
bad. We know this is happening. We can't see exactly where
it's hitting. Why don't we get this now?"
I mean the power of the E-mail exchange was that they really
hadn't thought the possibility of failure through. There
was no plan for what needed to happen if there was, in fact,
a serious tile hit and damage to the wing, what would they
do at re-entry and what would it mean to attempt a wheels-up
landing at the landing site, and that failure to pursue
the trajectory of having a problem that's repeating. Like
if you think about cost maybe in terms of if that's a factor
in making issues a priority at NASA, which obviously it
is anyplace -- you can't fix everything -- think of the
cost if you simply don't have the data you need, which is,
I think, the most stunning thing about the comparison of
the two cases. At the time when conditions were highly uncertain,
in neither case did they have the data; and having that
background data is important.
ADM. GEHMAN: In your review of the Challenger decision,
did you personally come to the conclusion that the launch
decision would have come out differently if the Morton Thiokol
engineers' split decision -- because some of the Morton
Thiokol engineers said it was safe to launch, but they were
split on that -- and if the managers at Marshall had reported
that there was a split decision, that the FRR would have
come out differently? Did you have any evidence of that?
DR. VAUGHAN: The manager at Marshall did not know
that there was a disagreement at Thiokol. That was one of
the problems with them being in three locations. No one
ever thought to poll the delegation. So no one on the teleconference
knew really where anyone else stood. They knew what Thiokol's
final recommendation was and they assumed that Thiokol had
gone back and re-analyzed their data, seen the flaws in
it, and been convinced it was safe to fly. So the fact that
not everyone was heard from was critically important.
By the same token, Thiokol engineers didn't understand that
they had support in the other places, that one of the NASA
managers who was at the Cape was really sitting there making
a list of people to call because he believed that the launch
was going to be stopped. So that was truly a problem.
Now I've lost sight of your question.
ADM. GEHMAN: The question was: In your research about
Marshall, did you come to the personal conclusion from talking
to people that the fact that the cold temperature analysis
at Morton Thiokol was a split decision, that that would
have made any difference at Marshall? I mean, did anybody
say, "If I had known that, I would have changed my mind"?
DR. VAUGHAN: Yes. However, the goal is for unanimity
and here's again where numbers count, that in the instance
where engineering opinion is divided, then they make what's
known as a management risk decision, that the managers take
over and the managers at Thiokol then, who knew that their
engineers were split, made a management decision. In retrospect,
that was the most horrendous example of failing to listen
to your technical people who said, "You know, I can't prove
it, but I know it's away from goodness in our data base."
ADM. GEHMAN: This principle that I'm following up
on here is important because we do have to be careful of
hindsight; and it may be that, even armed with what is admittedly
a minority opinion of a bad outcome, it could be that these
are judgment calls that are made in good faith with people
doing the best they can and they make a mistake. I mean,
they call it wrong. So the question is whether or not we
can indict the system, based on these incidents.
DR. VAUGHAN: I think you have to analyze -- you have
to do a social fault tree analysis and figure out what actually
happened and what went on, how information is relayed. I'm
sure that's work that's ongoing with you.
ADM. GEHMAN: That brings me to my next question --
and pardon me for monopolizing the time here. Another good
writer on this subject, who I think is Nancy Leveson, in
one of her models she suggested that we need to diagram
these decision-making systems because, just as you say,
it's not a person, it's a culture, it's an organization
that's really driving these things. Are you aware that anybody's
ever diagramed the FRR or the waiver, in-flight anomaly
disposition system? Has that ever been diagramed, to your
knowledge?
DR. VAUGHAN: Not that I know of. But what would be
more interesting would be to look at the more informal decision-making
processes because the rules are so strong for how the information
is addressed in Flight Readiness Review that that would
probably turn out the same every time. What you would want
to look at are the more informal processes and try to map
them and understand where the information stopped and why
it stopped.
MR. WALLACE: I'd like your thoughts on the concept
of whether an organization, this one, can sort of become
process-bound. You cannot fault the thoroughness of the
processes. But, I mean, is there a point at which they can
almost subvert other thinking processes, that people become
so confident in the thoroughness of the processes and the
fact that they're tested, they reach a comfort level with
processes where they become the be-all and end-all?
DR. VAUGHAN: Well, that's one of my main concerns
about NASA, that the fact that it is a very rule-guided
organization and the fact that they do believe that when
they follow all the rules that they have done their best
and have confidence. That's why the rules tend to carry
such heavy weight. Not only do they aid them with the process
but then they have a cultural effect which builds confidence.
If you're not in touch with the everyday engineering data
itself, you can lose sight of the fact that it is still
an experimental system. So it's the dark side of the organization.
The same kinds of procedures that you implement to make
it work better also can have an unanticipated consequence,
and that's why keeping in touch with all the ambiguities
in the engineering decision-making would be important.
Any other doubts and concerns? You know, by the time you
get to the top of the Flight Readiness Review process, nobody's
going to say that. One of the proposals from the presidential
commission was that an engineer accompany his project manager
at each level of the Flight Readiness Review tier, the feeling
that because engineering concerns did not get carried up
to the top prior to Challenger and in the eve-of-launch
teleconference, they thought that would be a good idea.
Rather than the engineers at Level 4 turning over all their
information to their project manager and then the project
manager carries it forward, let's integrate engineers into
the process. But can you imagine some engineer in the top
Level 1 Flight Readiness Review with 150 people, after all
that's gone on, standing up and saying, "I don't feel good
about this one"?
ADM. GEHMAN: Well, I agree with you. I agree with
you. But I would compound that with an organizational scheme
in which even though that engineer works in the engineering
department and technically doesn't work in the program office
but his position and his salary is funded by the program
office and he wouldn't exist if the program office didn't
pay him. In other words, we've wickered this thing to where
the money flows down through the projects and they send
money over to the engineering office to hire people. So
now put yourself in the position of this guy who's going
to contradict the officer who's paying his salary, and you
don't have a very comfortable formula.
DR. VAUGHAN: I understand that. I think there's a
parallel situation with safety people.
ADM. GEHMAN: Well, yes and no. There is a safety
organization in the programs and in the projects and their
positions depend upon the largesse of the project managers,
but there's also an independent safety organization.
DR. VAUGHAN: I meant in terms of rank. Like independent
authority and power based by where they come in the GS ranking
system.
ADM. GEHMAN: Absolutely. That's a question I'm going
to ask you after General Barry and Dr. Logsdon have a chance.
DR. LOGSDON: I have a comment that's as much directed
at the board as it is at Professor Vaughan. It's just that
this discussion made me think of this line of reasoning.
We've been talking about the rigor of the pre-flight process
for readiness review, compared to a different structure
for what goes on during a mission. There's almost a symbolic
element here. The management of the launch is a Kennedy
Space Center responsibility; and the moment that the shuttle
clears the launch tower, the control over the mission shifts
to Johnson. Sean O'Keefe is trying to say that NASA is a
single organization, but he's got a long way to go to achieve
that goal. These are very proud organizations and, of those,
Johnson is the very proudest of the proud because it's one
of the only two places in the world that knows how to manage
a space flight. There are now -- what's it, '61 -- so 42
years of experience of managing humans in space.
So we're beginning to talk about maybe we can examine the
process of mission management and see whether it measures
up to some standard of high-performance organizations, and
I think that's what we have to do. But there's a lot of
received wisdom and maybe it's ossified wisdom by this point
in the process. So as we go towards that, I think we have
to make sure that we don't have unintended consequences.
So, I say, that's just a comment, not a question.
ADM. GEHMAN: Would you like to comment on his comment?
DR. VAUGHAN: Well, he directed that to the board,
as well.
ADM. GEHMAN: In the interest of time, I'll go on
to General Barry.
GEN. BARRY: I'd just like to add one more thing to
your parallel kind of discussion between Challenger and
Columbia. Could you just see if there's anything you know
of that you could add to this kind of construct? You know,
there was a lot of organizational changes here in the last
couple of years. We moved Palmdale to Kennedy. We moved
the Huntington Beach engineering support mostly to JSC but
some to KSC. And, of course, we've got the International
Space Station support going on. So there's some organizational
elements that are unique to Columbia this time; but there
are some Challenger organizational elements, too. You know,
the JSC leadership was being shared at the time by Jesse
Moore, who was also running the space flight program as
associate administrator. Also, we had an interim
administrator at the time during Challenger. Are there any
parallels that you're seeing between the organizational
aspects between Columbia and Challenger?
DR. VAUGHAN: At the administrative level?
GEN. BARRY: Well, just organizational elements that
we might be able to draw from.
DR. VAUGHAN: One, but it's cultural. It seems like
there is a gap between perceptions of risk between working
engineers and top administrators. So at the time of Challenger,
engineers were very concerned with every launch, even though
they had gone through all the rigors of the procedure; but
at the same time, the people at the top thought it was an
operational system. The parallel I see is, you know, working
engineers really familiar with what's going on and having
concerns, but decisions made that really do echo the period
of Challenger where it's okay to take citizens along for
a ride, which suggests that top-level administrators have
rather lost touch with the fact that it is an experimental
system, a message that they clearly understood post Challenger.
John mentioned symbolic meanings, and they can be really
important. It's hard to judge exactly what the effect is
of a top administrator believing that it's again safe enough
to fly people who are not trained as astronauts. Subtle
things like "faster, cheaper, better" can have an effect
on a culture, even at the same time that you're doing everything
possible to encourage safety.
Certain actions have symbolic meaning. The fact that you
have a safety representative sitting in on a Mission Management
Team or in a particular wherever they're assigned can have
symbolic meaning. Signs posted that say safety, safety,
safety can convince you that you have a safety culture; and
yet when you look at the way the organization works, you
may not have as strong a safety culture as you wished. The
safety person who is assigned to Mission Management Team
decisions, if that is the case, is in a position of not
having hands-on information, reviewing the team's decisions
while being, in a sense, dependent upon them because they have
the leadership responsibility. So what kind of weight, you
would want to know, is that person really bringing to that
situation? Do they have the influence that they are listened
to? Do they have the data to really do anything more than
oversight at that point? How do you really put them in a
position where they can recognize a warning sign and talk
with people who are higher ranked than they are, in a definitive
way, that is convincing in a crisis situation?
ADM. GEHMAN: That leads to my question. That is,
would you be content -- let me just outline this in rough
form -- with a process to satisfy that issue. That is, that
senior management, the management who's got the ultimate
responsibility in these decisions, that they would kind
of be forced to listen to these engineering doubts because
of an organization in which you had checks and balances
among essentially coequal branches of some kind. In other
words, that the engineers were organizationally and culturally
equal to the project managers and the safety and mission
assurance people were not only -- I agree with you. I understand
exactly what you're saying. It's not good enough to just
sit at the table. You have to come to the table with some
clout and usually that clout's in the form of analysis or
data or research or else I won't sign your chit for your
money or something like that. You've got to come with something.
And my model suggests that if you did that, you would be
creating some degree of managerial chaos but, on the other
hand, you would be making sure that engineering reservations
and engineering concerns were well researched and got surfaced
independently at the right level. So you've kind of got
this trade-off: on one side, a little bit of managerial chaos
and the danger of the organization not speaking
with one voice and all those kinds of things but, on the
other hand, you would satisfy the requirement that signals
would get heard.
DR. VAUGHAN: Surfaced.
ADM. GEHMAN: Does that sound reasonable?
DR. VAUGHAN: It does sound reasonable. Someone said
that if every engineer's every concern had to be resolved, you
would probably never launch a mission; and that's probably true.
ADM. GEHMAN: Probably true.
DR. VAUGHAN: It seems in post-launch conditions where
the clock is ticking, in line with Dr. Barry's suggestion
about how could we restructure the post-launch decision
process, that it would be especially important, then, to
create that kind of an open process.
ADM. GEHMAN: Okay. Well, thank you very much, Dr.
Vaughan. You've been very patient with us. We hope we haven't
tried your patience too much as we try to understand the
very sound principles that you have exposed us to, both
in your book and in your briefing here today.
The board is sensitive about the law of unintended consequences,
and we want to be very careful that we understand more about
these managerial principles before we go writing something
down on a piece of paper that we might regret. But your
study has had an influence on this board and we're indebted
to you for coming and helping us here today.
DR. VAUGHAN: Thank you. Thanks for having me.
(Hearing concluded at 4:38 p.m.)