Columbia Accident Investigation Board



1:00 p.m.
Hilton Hotel
3000 NASA Road 1
Houston, Texas


Admiral Hal Gehman
Rear Admiral Stephen Turcotte
Major General John Barry
Major General Ken Hess
Dr. John Logsdon
Dr. Sheila Widnall
Mr. G. Scott Hubbard
Mr. Steven Wallace


Dr. Jean Gebman
Mr. Robert P. Ernst
Dr. Diane Vaughan

ADM. GEHMAN: Good afternoon. The afternoon session of the Columbia Accident Investigation Board public hearing is in session. This afternoon we're going to hear from two experts on the subject of aircraft aging, which is another risk element in the shuttle program which wasn't originally foreseen -- at least I don't think it was. The shuttles were originally designed to last ten years and now we're passing 20 and headed toward 30 and the shuttle vehicle then is facing issues which need to be looked at to determine whether or not the shuttle can operate safely. We're very pleased to have you two gentlemen join us.

Dr. Jean Gebman is a senior engineer at the Rand Corporation; and Mr. Robert Ernst is the head of the Aging Aircraft Program at the Naval Air Systems Command, Patuxent River. We're glad to have you both with us.

I would invite you to introduce yourselves and say a little bit about your present job and your background; and then if you have an opening statement or a presentation, please go ahead and proceed. Why don't you both introduce yourselves first, and then we'll go ahead with the presentation.

JEAN GEBMAN and ROBERT ERNST testified as follows:

DR. GEBMAN: I'm Jean Gebman, senior engineer at Rand, working on the Aging Aircraft Project. My educational background is in aerospace. My doctoral work majored in structural dynamics with minors in fluids and control engineering.

MR. ERNST: I'm Bob Ernst, the head of the NAVAIR Aging Aircraft Program and also representing the Joint Council on Aging Aircraft, which is a DOD, FAA, NASA, and industry consortium trying to work on age issues. I don't have the storied credentials and degrees that my counterpart has, but I've got a lot of years of experience working on old platforms and rust and corrosion and obsolescence and those sorts of things.

ADM. GEHMAN: Thank you very much. Go ahead and proceed.

DR. GEBMAN: Thank you, Mr. Chair. Bob and I are going to present two briefings that are very complementary. I'm going to talk about some technical details to give you a somewhat hurried landscape technically, and then Bob's presentation is going to deal with some of the cultural and programmatic matters.

Next chart, please. This is simply a bit of background. In the interest of time, we'll just press on ahead. Next chart, please.

The examples that I've selected do have a methodology behind them, and this chart is an attempt to try to capture the essence of that. We're going to focus on the top set of items, although aging aircraft do involve all of the functional areas that are listed on the left-hand side of the chart.

Next chart, please. So this is going to be the focus.

Next chart. Whether or not this focus proves helpful to you is, of course, a matter to be determined as your investigation moves forward. So my purpose here today is more to share with you some areas where the aging aircraft experience might prove helpful as you move down the road.

Next chart, please. You all have seen the various diagrams of the shuttle. I'm going to focus on the left side.

Next chart. And simply make a couple of points. We have four main spars that go through; and when we talk about structures and structural dynamics, one of the things we often quickly look at is the wing root where the spars go through. That's just simply one area that one is always interested in.

Next chart. Another area that's of interest and will be touched on by one of my examples subsequently has to do with the aluminum honeycomb. This is simply a cross-section showing at the top there the interior face sheet, which is aluminum; the corrugation, which is aluminum; the adhesive bond between the corrugation and the exterior face sheet; and then, of course, the thermal protection system underneath. A very sophisticated system. And one of the things we will be talking about later is the matter of adhesion as a method of joining structural materials together.

Next chart, please. This is a list of the samplers. Let's get right to it.

Next chart. B-52 is a very interesting story. This often is pointed to as here is why it is possible to maintain a fleet for a very long period of time. We need, though, to be cautious and acknowledge how it was we got to that situation, because you may note that the G model and the D model have long since gone to the boneyard. Corrosion was the principal culprit. The basing at Guam was about the worst base you could be at for an Air Force aircraft from a corrosion standpoint.

Next chart. Even the H model, to get it to where it is today, has been significantly rebuilt in many areas, as these various shaded areas demonstrate. Moreover, it has been based at a location that is relatively benign from a corrosion hazard standpoint and the maintenance people learned a good lesson from the experience of the G model and there has literally been a zero tolerance for corrosion. If they see corrosion, it must be removed.

When we visited the depot about six years ago, we looked at the B-52 and the KC-135s. I was challenging the technicians on the B-52, "Show me the corrosion."

They said, "Dr. Gebman, there is none."

I said, "Folks, it's an old airplane. We know there must be corrosion."

Finally, they were able to show me a detail at the back of the airplane and they acknowledged, well, we ground out a little bit back here but this is not even significant.

This airplane is very different from the 135. Next chart, please.

ADM. GEHMAN: Could I ask you to go back a second. In that first bullet, what is a full-scale fatigue test, what's a damage tolerance analysis, and what's a tear-down inspection?

THE WITNESS: The full-scale fatigue test is where you take an article that could be flown in flight and, instead of doing that, you set it up to be loaded cyclically by attaching various jacks and an enormous hydraulic contraption; and typically you will try to simulate two -- in the old days, four -- equivalent lifetimes to identify where the fatigue vulnerabilities are so that they can be addressed during production and/or during maintenance.

ADM. GEHMAN: And I assume also recognize -- I mean, in other words if you have a fatigue indicator like a crack or something like that, the idea is that you would then be able to recognize that if that were to happen in a service vehicle.

DR. GEBMAN: One of the most important things you learn from the test is where the cracks are taking place and so that you can set up a maintenance program or do a modification so you don't have to set up a maintenance program. The damage tolerance analysis is a method of studying the growth of fatigue cracks and their significance, giving you further information that you use for fleet management and modification purposes.

The tear-down inspection took place in the 1990s, largely to identify places where corrosion was going on in areas that could not otherwise be seen. When we do heavy maintenance, we don't take the airplane totally apart. The notion of a tear-down inspection is to take a high-time airplane which you're prepared to sacrifice and literally take every part, open it up, and see where you have challenges.

MR. WALLACE: Is the concept of damage tolerance that you will be able to detect cracks and things and also make predictions as to their growth rates in such a way that you can easily detect them before they become critical?

DR. GEBMAN: Yes. And I would encourage, if I might, that we try to speed through the examples because you will have an opportunity to see illustrations of some of these specific points.
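The damage tolerance concept in this exchange can be sketched numerically: integrate a crack-growth law from the smallest reliably detectable crack size up to the critical size, and the cycle count in between bounds the inspection interval. The sketch below uses a textbook Paris-law form; the constants, stress range, and crack sizes are illustrative assumptions, not values for any actual fleet.

```python
import math

# Illustrative damage tolerance sketch (all values assumed, not from any
# real fleet): Paris law da/dN = C * (dK)^m, with dK = dS * sqrt(pi * a).
C = 2.0e-12   # growth coefficient in (m/cycle)/(MPa*sqrt(m))^m (assumed)
m = 3.0       # Paris exponent, a typical order for aluminum alloys
dS = 90.0     # stress range per flight cycle in MPa (assumed)

def growth_rate(a):
    """Crack growth per cycle (m/cycle) at crack half-length a (m)."""
    dK = dS * math.sqrt(math.pi * a)
    return C * dK ** m

def cycles_between(a_detect, a_crit, step=1.0e-5):
    """Integrate the cycles needed to grow from detectable to critical size."""
    n, a = 0.0, a_detect
    while a < a_crit:
        n += step / growth_rate(a)   # dN = da / (da/dN)
        a += step
    return n

# Cycles for a crack to grow from 2 mm (detectable) to 25 mm (critical):
print(f"{cycles_between(0.002, 0.025):,.0f} cycles")
```

An inspection interval is then typically set at a fraction of that cycle count, so that a growing crack is looked at least once before it can reach critical length.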

With the board's permission. Next chart, please. Moving on to the 135, corrosion is the principal challenge with that fleet.

Next chart. This is an example of a tear-down inspection. What you're looking at is a drawing of the top view of the full fuselage. Each square is an area where they took the structure apart, opened it up, and looked at it, sometimes under a microscope. If you see color in the square, it means they found at least light corrosion present. In just about every square where they did a detailed examination, they found some indications of corrosion with that fleet. That is a result of the materials that were selected, the environment in which it is operated, and the maintenance program which it had through its lifetime.

Next chart, please. Similar view. This time it's the wing structure.

Next chart, please. As a consequence, over time when these airplanes go in for heavy maintenance now on a five-year cycle, it can take a year to do the complete job.

Next chart, please. This chart shows the climbing labor hours required. We are now at a point where the labor hours to do that heavy work are eight times what they were the first time it was done, when the airplane was about eight years old.

Next chart, please. Until very recently it was the Air Force's intent to keep all KC-135s to the year 2040 or thereabouts, at which point the fleet would be 80 years of age. Recently the senior leadership has decided that the older airplanes, the E models of which there are somewhat more than 100, need to be retired sooner than that; and they are now looking at leasing perhaps a 767 to fill this particular function. So one's perspective about life can change significantly as you learn more and more about the growing burdens before you.

Next chart, please. Moving on now to a new decade. Next chart. I share this example with you that illustrates some of the complexities and depth and breadth of endeavor one can get into when dealing with life issues. Now, the irony is that this is dealing with the new C-5A in the early Seventies. It had a very unfortunate experience in its full-scale fatigue tests. Fatigue cracks throughout the airplane, especially in the area of the wing.

The Air Force Scientific Advisory Panel convened a study in 1970 for the Air Force and made some recommendations. The following year, a major engineering effort was launched -- an independent review team. One hundred people worked for one year, going through the results of the full-scale fatigue test, looking at the different options that the Air Force might consider, analyzing Options A through H, and presenting them to the leadership. Ultimately Option H, wing redesign and replacement, was selected. Once you open up the area of structures, the number of things that you can end up having to examine can be considerable. That's the lesson from this particular example.

Next chart, please. This example is a little bit different. We're focusing on a specific technical issue. It's honeycomb composite material, and it proved, in those few areas where it's used on the F-15, to be quite challenging.

Next chart. These are some of the ways in which the water, corrosion, cracking, and durability issues arose with that particular fleet. To the extent that the area of honeycomb composites proves of interest, this particular fleet -- and there are some other examples -- might be worth looking at.

GEN. BARRY: One comment on that. This is also the leading edge of a lot of the wing forms in the F-15s, particularly in the tail. So it might be of interest to the board.

DR. GEBMAN: Yes, sir. Thank you.

Next chart. Moving on to the Seventies, here we have two examples dealing with the loads that actually occurred, exceeding what the designers thought they would be.

Next chart. This is a classic. The F-16 was designed for both air-to-air and air-to-ground work; and it turned out that in the air-to-ground mission area, the loads that the structure encountered very quickly exceeded the capacity of the structure as it was designed. This illustrates the importance of really monitoring your loads through your life cycle so that you take that load information and update your expectations as regards fatigue cracking.

Next chart, please. This is the process. This is the durability and damage tolerance analysis process and I'm certainly not going to lecture on this today, but this is a summary that you might find useful as your work moves forward. When I look at this, I look at it from not only a structures viewpoint but also from a systems viewpoint. You can literally go through that chart and change its orientation from fatigue, which it was designed for, to corrosion or other kinds of things that affect an aircraft as it ages. Indeed, today people are working on the development of what's called a functional integrity program approach, which mirrors this aircraft structural integrity kind of program.

Next chart, please. The B-1 example is a little bit different. Here we were dealing with acoustic fatigue, which is a dynamic phenomenon and it's a bit like the tuning fork. If you hit the tuning fork, it will vibrate at a natural frequency. Well, aircraft structures, if excited at their natural frequency, will engage in vibration; and this can greatly accelerate the propagation of fatigue cracks. That's the essence of that particular story. It's an interesting one from you all's perspective to an extent because it involved thermal, aerodynamic, and structural dynamics. It turns out that the designers deliberately had hot exhaust from the engines going over the control systems at low-speed flight to increase the control authority of the control surfaces.

Next chart, please. Now for our final example. Next chart. This is an airplane that served quite long in terms of landings. It was designed for 75,000, and in flight hours it was not all that high. It was designed actually for 50,000. This example illustrates the three things listed on the chart.

Next chart, please. Imagine yourself flying over the Pacific in this particular airplane. You're in Row No. 5. You have the seat next to the window, and over your left-hand shoulder there's a fatigue crack. From the NTSB's excellent work, it appears that the sequence we're going to talk about started at the fastener hole indicated here. What's important to focus on here is the length of the fatigue crack. The blue is supposed to depict the sky. From the outside of the airplane that crack was only a tenth of an inch long, and yet it contributed to a sequence of events that we're going to look through in the subsequent charts.

Next chart, please. Part of the problem is that it wasn't just one crack at that fastener. There was one on the opposite side, as well. It was only .11 inches. So from a detection standpoint, this would have been a bit of a challenge to detect visually, just from a casual walk-around kind of inspection. From a fracture mechanics standpoint, though, the crack is really a half inch long, because when you look at the stress intensity at the tip of the crack, what it depends upon is that total length, that .53 inches. And fatigue cracks, we now know, grow at a rate that is a function of how long they are. So the longer the crack, the more rapidly it will grow as that part of the structure goes through its next cycle of loading up and down.
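The point about total crack length can be illustrated with the standard through-crack relation, in which stress intensity grows with the square root of the total length. The stress value below is an arbitrary assumption; only the ratio matters.

```python
import math

# Why total crack length matters (illustrative stress value): for a
# through-crack, stress intensity K = S * sqrt(pi * a), where a is the
# crack half-length, so two small cracks across a fastener hole load
# their tips like one longer crack.
S = 100.0  # far-field stress in MPa (assumed; only the ratio matters)

def stress_intensity(total_length_inches):
    a = (total_length_inches / 2.0) * 0.0254  # half-length, inches to meters
    return S * math.sqrt(math.pi * a)         # K in MPa*sqrt(m)

k_visible = stress_intensity(0.10)  # the 0.1-inch crack seen from outside
k_actual = stress_intensity(0.53)   # the 0.53-inch effective total crack
print(f"K is {k_actual / k_visible:.2f} times higher")  # sqrt(0.53/0.10)
```

Because growth rate rises steeply with stress intensity (roughly its cube for aluminum under a Paris-law exponent of 3), treating the flaw as the full 0.53-inch crack rather than the visible 0.10-inch crack implies growth an order of magnitude faster.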

Next chart, please. Not only was Fastener Hole 5 cracked on both sides but there were also adjoining fastener holes numbered 3 through 9 that also had these kinds of cracks.

Next chart, please. Consequently, Fastener Holes 3 through 9 simply zipped across one afternoon when the loads hit a particular level; and this particular sheet of metal separated from its counterpart.

Next chart, please. The problem is -- and I must apologize, this chart didn't quite make the translation from Macintosh to PC the way I had hoped -- this chart is intended to illustrate two pieces of skin with an adhesive material between the skins. You see, the fasteners were never designed to carry the load. The load was supposed to be carried by the adhesive. The adhesive broke down. There was corrosion that took place. So we have a combination of adhesion failure and corrosion going on, destroying the primary joining mechanism. The fasteners picked up the load, but cracks developed very quickly because they really weren't intended to carry the load for very long.

Next chart, please. The failure next was supposed to be stopped by what's called a fail-safe strap. These are spaced every couple of feet; but it also was glued, if you will, to the skin. The glue had eroded over time. Corrosion was taking place. So when the load came zipping down to the fail-safe strap, it too broke.

Next chart, please. Indeed, all of the fail-safe straps broke between the two major bulkheads that define the boundaries of this particular failure. Fortunately, there was only one fatality, although there were a number of other injuries. The silver lining to this particular cloud is it caught the attention of the aerospace community, and since then there have been a whole series of efforts that really were stimulated by this and some subsequent events.

Next chart, please. One of the matters you all will be talking about later, I think, might be somewhat related to this chart. This was not a matter that was brand new in 1988. The first signs of it were back in 1970, and the bullets in this chart sort of trace some of that history.

Next chart, please. So in closing, two more charts. Next chart. In looking back at the life cycle management of fleets over time, there are some things that seem to serve us well, and they're highlighted here. We talked about the durability and damage tolerance analysis, the full-scale fatigue tests, tear-down inspections, updating the damage tolerance analysis with new loads data because loading environments change over time with flight vehicles, and maintaining high levels of system integrity.

Next chart, please. In closing, many fleets have flown way beyond the traditional points of retirement. In studying these fleets, each seems to have its own unique story in terms of the challenges it had and how those challenges were dealt with. We hope, we at Rand on the Aging Aircraft Team, that this quick survey of the landscape may prove of some aid to the board as you continue your important work.

Thank you.

ADM. GEHMAN: Thank you very much.

MR. ERNST: I'm hoping to see a slide here in a minute that comes up.

I want to thank you for the opportunity to talk to you a bit more about the cultural issues. Dr. Gebman and I compared slides for the first time about two hours ago, and you'll see some tie-ins to his slides that are more by coincidence and our mutual experience than by preplanned coordination.

One of the things I want to focus on is cultural, and it goes back to part of the problems that I saw in Dr. McDonald's Shuttle Independent Assessment Team back in 1999 and some changes that I think need to be made in the aerospace industry.

Next slide, please. I also want to offer the apologies of Colonel Mike Carpenter, my counterpart in the Air Force Aging Aircraft Program, who was still stuck at Wright-Patterson. You'll see these slides we kind of do interchangeably on here. This one's a little dated, but it shows the growth of the average age of our fleets over the last 10 or 12 years, most of it on the DOD side from a procurement holiday. When you're talking about an aircraft reaching 20 years of age, that's an average age. You've got some, like the B-52, the KC-135, and the H-46, that are getting up in the late 30s.

We are in unprecedented areas in dealing with aging aircraft. It's not like we can go back and find the predecessor of the B-52 and see how it did in its forty-fifth year. There isn't that data. As you can see from Dr. Gebman's presentation, there are a lot of complex issues. I use the phrase, "This isn't rocket science," but it really is a complex issue, an age type of rocket science in there. Even though we have a lot of very, very talented individuals working on these issues, we're kind of a 1-of-1 type of scenario. We're out in new areas in there.

I also want to show that even systems that are old can still be effective. I think all we have to do is look to the recent aircraft performance in Operation Iraqi Freedom to see that our legacy platforms, when they're put in the hands of qualified operators and maintainers that are dedicated to their jobs, can do a tremendous job and turn in a great performance. But sometimes when those aircraft get up in age, we have new issues that we have to handle in there.

The challenge is to balance when we can recapitalize. There's no idiot light that just sits here and goes, ding, "Replace this aircraft and buy new aircraft." We have to look at a variety of factors, things such as fatigue tests, tear-down inspections, load surveys, complex issues. And frankly, they aren't very sexy. When you say, "I want you to go study corrosion and rust propagation in aircraft," that's not the thing that the young kid out of school necessarily wants to focus on. So there's some challenges there.

Next slide, please. One of my other hats that I put on to cover my bald head is part of the Joint Council on Aging Aircraft. I wanted to explain a little bit about this. This was a grassroots group that got together a little less than two years ago because we all realized in the Air Force and the Navy and the Army and Coast Guard and DLA and NASA that we did not have enough resources. You can read resources as people, money, and time to be able to handle all the issues adequately. But we said, you know, we're taxpayers, and every April 15th I look at my tax statement and say, gee, I'd like to see if I can reduce that tax burden somehow. So we decided to cooperate and graduate and see if we could share things together and work together on certain issues in here. This group met, and in about August of 2001 the Joint Aeronautical Commanders Group said, "Hey, what are you doing on aging? Let's get together and formally charter this group."

Next slide, please. So if you know anything about the Joint Aeronautical Commanders Group, it's the service three-stars at the systems commands; they report to the Joint Logistics Commanders Group. They have a series of boards, and we were adopted by them and became one of their boards.

Click it again for me and bring up my next pretty picture. There are the people we have from the leadership of the different aging aircraft communities. We are a board, and what we're trying to do now is bring aging aircraft issues to the attention of the other members of the board and try to get things changed.

For example, training. We went around and we found out that sometimes our maintenance training wasn't up to snuff in some areas. So we went back and said, "Hey, how does that training curriculum that was done when the S-3 that Admiral Turcotte flew was delivered in 1974, how should that change?" And we went through and looked at seeing some of those things because aging is going to change some of your core functions and logistics and engineering and supply support and those issues and our job is to bring focus to those.

Click it again for me, please. Next slide, please. What is the mission of the JCAA? Twofold, really. One is to identify and investigate issues. But we're not just a think tank. We're not going to put out a pretty little report that says you really need to go, you know, build this or do this. We're also serving as program managers that are fielding products, especially in the transition area, taking a lot of the new technologies that are out there and look really good, putting them on aircraft, and making sure in what applications they work. That's our focus. And that's one of the biggest pitfalls we have on the aging side: taking all that really neat stuff out there, all those science fair projects, and putting them on platforms.

Next slide, please. Ironically, I sat in the airplane late last night and asked what are some of the characteristics of a robust, successful program; and you'll see a lot of similarities to what Dr. Gebman presented. The first thing we have to do is understand how all of the components, whether it be an O-ring, a structure, an ejection seat in a fighter aircraft, whatever you need, how does that age. If you look at the way we classically develop air vehicles, we spend a lot of time focusing on the development side, getting it up to initial operational capability; and then, once we've qualified all those issues, they're good, we just kind of do some monitoring of our data. But we really don't know all the interdependencies of all those different materials and how they age as a function of time, how they age as a function of changes in environmental regulations, how the loads change, how the pilots are going to fly the airplanes differently. We have mission changes on there, and we now want to be able to do this or do this or drop this bomb. You can look at all the views of the airplane over time and see the mission changes. So we have to understand how each of those subsystems affects the system of systems.

The next thing is monitoring our fleet usage data. You give a pilot an aircraft, and he's going to find unique ways to be able to fly that airplane in an environment, especially with new mission growths that we've got to counter. The way you do a fatigue test is you go and you estimate how many 1G, 1 1/2, 2G maneuvers, how many landings, how many takeoffs, how many pressurization cycles, and you put it all in there and you literally, you know, bend this thing like it's a piece of silly putty to see where it cracks going in. But you're guessing how that airplane is going to be used 20 and 25 years in advance. And one of the changes that we've seen is we need to go and monitor that fleet usage, collect that data, and then update that fatigue testing because, you know, I guarantee things are going to be different ten years from now, just as they were ten years ago.
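The spectrum-estimating step described here, guessing how many maneuvers, landings, and pressurization cycles the airplane will see, is commonly summed with Miner's linear damage rule: each load case contributes applied cycles over cycles-to-failure, and the tested life is consumed when the sum reaches one. Every number below is an illustrative assumption, not data from any fleet.

```python
# Miner's linear damage rule over an assumed usage spectrum: each load
# case contributes n/N (cycles applied over cycles-to-failure at that
# load), and the tested life is used up when the total reaches 1.0.
# Every figure below is an illustrative assumption, not real fleet data.
spectrum_per_year = {
    # load case: (cycles applied per year, cycles to failure at that load)
    "2g maneuver":    (4000, 2_000_000),
    "landing":        (400, 20_000),
    "pressurization": (500, 30_000),
}

damage_per_year = sum(n / N for n, N in spectrum_per_year.values())
years_to_limit = 1.0 / damage_per_year
print(f"damage/year = {damage_per_year:.4f}, "
      f"predicted life is about {years_to_limit:.0f} years")
```

If fleet monitoring later shows, say, twice the assumed maneuver count, the same sum immediately shows life being consumed faster, which is exactly why the measured usage has to feed back into the fatigue test assumptions.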

You need to utilize that fleet data, not just collect it in some big data morgue, but go back and say: How are your original calculations? Are you using up your service life earlier? You know, the Navy went and bought some F-16s for their adversary squadrons, and we used them up in about four years because they were all doing the shooting-down-their-watch type of stuff very quickly in there. The missions change, the requirements change, and we have to make sure our original predictions -- they're not wrong, but they've got to be validated. It's kind of like me taking my two thumbs and going like this and saying, yeah, I can figure out and calculate how I'm going to go to the moon. You've got a lot of mid-course corrections you have to do.
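The adversary F-16 story reduces to a simple severity ratio: if measured usage consumes fatigue life several times faster than the design spectrum assumed, calendar life shrinks proportionally. The figures below are invented for illustration, not the actual F-16 numbers.

```python
# Comparing assumed vs. measured life-expenditure rates (invented
# numbers): if a fleet burns fatigue life faster than the design
# spectrum assumed, its calendar life shrinks proportionally.
design_life_hours = 8000.0   # certified fatigue life in flight hours (assumed)
design_severity = 1.0        # relative damage per hour assumed at design time
measured_severity = 5.0      # relative damage per hour in harsh use (assumed)

effective_life = design_life_hours * design_severity / measured_severity
print(f"effective life at measured usage: {effective_life:.0f} hours")
# -> 1600 hours: at 400 flight hours a year, a nominal 20-year airplane
#    is consumed in roughly 4 years, the shape of the adversary example.
```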

The last issue which was brought up before, I found it amusing to hear the previous panel talk about the daily report systems in PRACA. We need to collect good data, but we need to have that data resident at the subject matter expert's fingertips, not in some type of huge data base in the sky that nobody can get to. And all those elements need to be in there. It's more than just neat technology. You have to have all these elements and, folks, this ain't sexy but this is the core that allows you to manage a fleet effectively.

Next slide, please. The Joint Council on Aging Aircraft, working together to run our own programs and share this data, is trying to make process recommendations and not just field issues. Microcircuit obsolescence was brought up today. What data do we need to buy in our acquisition programs to make sure that we can support the rapid changeover in technology, because we're not going to drive it in the Department of Defense or NASA anymore? We have to get with industry and figure out what data we need and what's the best approach. That's going to require some acquisition changes, some process changes -- again, not just technology -- but yet we will take those technologies, evaluate them, and say these are the ones we need to select.

I once told a group that I was walking along the beach and picked up a pretty seashell and out ran three guys selling corrosion solutions. I mean, there literally are hundreds of technologies; and I think I broke my corrosion lead's pencil when he got up to about 84 different areas. I said let's get six out there and be successful. We like good ideas. That's what fuels the reduction of our problems with aging aircraft, but we need to also make sure that we are pushing not all of them but we are pushing the top couple of them.

We are facilitating the transition, making sure that we are prototyping them on the aircraft. We do not fly what we have not tested; and I can show you story after story after story when it approached that test, something else happened, either we had a sealant or we had a compound, or wash cleaning fluid that interacted and we need to be able to evaluate those issues.

Of course, we're promoting knowledge management. What is the cost of aging? Where is that big idiot light that says: "Buy more F-18E/Fs and retire S-3s for tankers"? Where is that point where we can make the right economical decision? There's a paucity of data on those issues, and it's kind of like everybody has their own way of calculating it, so we're working with Rand, trying to get all those groups together.

So we're working together on a variety of issues from process to technology to acquisition to knowledge management type of solutions.

Next slide, please. That's what I do on my part-time job.

We've been tasked by the Joint Aeronautical Commanders Group to try to foster a national strategy, working DOD, NASA, FAA, and industry. What do we need to do? A lot of our effort, about 80 percent of our time, is on what I call tactical initiatives, what is the best way of inspecting wire, what is the best corrosion compound, yada, yada. About 15 percent of our time is on strategic areas. What do we need to do to handle diminishing manufacturing sources and obsolescence? About 5 percent of our time is on things like what is the right amount of sustaining engineering that we need to have on our team. How much emphasis do we need to have on our data systems? What data do we need to collect?

We just recently partnered with NDIA and AIA, two industry consortiums, so that we can get feedback from industry, because I'm not going to say that I'm clairvoyant and have all the answers. I've made enough mistakes, I have nine lives based on my mistakes, but I want to get from industry that partnership of where do they think we need to change. Do we need to change our process for buying, for supporting? What amount of balance is there in the government and industry team?

Next slide. You purposely can't read this. I don't want anybody to read this because it's an early version. But we've actually gone to doing road maps where we've surveyed -- and this is from wiring -- from both a technology point of view, an acquisition, a logistics, a training, all those areas, all the different programs that are out there. When you see those pretty little red things, well, green is good, yellow is ehhh, and red is real bad. You see where we need to build a strategy, and we're trying to make sure that all of our funding and resources, they're not joint but they're at least lined up and all pointing in the same direction and we're pulling in the same way.

Next slide. What are some of the successful models of teams that we've stood up? Too often we have a hearing like this, and Congress passes a new law and we anoint a new person to be the czar of something, and he or she comes out and puts out lots of mandates. And maybe I'm a cynic -- well, I know I'm a cynic, I'll admit it -- but that doesn't always work.

One example I want to point out is what we did with the JCAA corrosion steering group. The reason it was successful is we took the materials experts at each of the sites and married them up with the program teams, put in logistics people for publications and training -- a cross-functional IPT -- and said, "You guys tell us what to do." My role then becomes less of a messenger and more of a barrier-removal expert. At least that's what I call myself. They call me other things, but we can't say those in public. So we need to build those from the bottom up and not just create something from the top down that puts more unfunded mandates on us.

Next slide, please. Summary. I think our aging aircraft problem is a serious threat. I think it's something that requires an infusion of resources, an infusion of capital, and a national strategy. At the Joint Council on Aging Aircraft, we're trying to coordinate those different areas. You can come back and judge whether we're successful or not. I think the industry cooperation is critical. We're not going to say that this is a government-only issue; we're listening for the best practices. I will steal from anybody and any group -- as Winston Churchill said, he would even say a kind word for the devil in the House of Commons if he would help him against the Nazis. I'll even partner with the devil if he'll help us with our aging aircraft strategies, and I think we need a strategic process that requires that collaboration. And the last time I checked, we need NASA's involvement in there. Their involvement's increasing, but we need to remind NASA that one of those A's stands for aeronautics, and we need them and their expertise.

ADM. GEHMAN: Thank you very much.

MR. WALLACE: I think the focus has been mostly on structures, although Mr. Ernst did talk about avionics and wiring. I know that in the civil sector where I came from, after Aloha we launched, of course, a very extensive aging airplane program. I feel that the structural part, at least in the perhaps less challenging field of civil aircraft operations, is reasonably well handled, or at least we currently feel that the aging systems challenge is greater -- wiring in particular.

I wondered if you have any sort of conceptual thoughts on aging systems, wiring, and whether or not there's a different approach. You talked about the need for accurate reporting and that sort of thing. But in many respects those seem to be some of the more difficult challenges.

MR. ERNST: You could pick any subsystem that you want, and the process that was set in place -- from analysis, technology investments, prototyping, data collection -- that Dr. Gebman showed needs to be followed through. And I believe that the FAA's non-structural wiring program follows some of those classic elements. Having been part of it and actually teamed with the FAA on some of those areas in wiring, you can see that it follows the same type of elements.

Wiring is a major issue. We made some mistakes when we selected the wire types in some of our vehicles in the Eighties. We did some qualification testing on it, and it had some very adverse characteristics. I'm trying to be nice. We now need to make sure that we're developing things that don't just say, yeah, throw that one away, build all new aircraft, but that can inspect it to make sure the bad characteristics -- i.e., the tracking that was associated with aromatic polyimide insulation -- are not prevalent. But all those elements require smart people working together, and the success story is -- I'm not sure you're aware of this, but the FAA has spent a fair amount of money really investigating the different types of inspection technologies, whether it be frequency domain reflectometry, time domain reflectometry, standing wave ratio, and a whole bunch of things that make my brain hurt. And the Navy is actually doing some of the transition and manufacturing of those systems and buying and fielding them, initially in our depots and at the organizational level with the troops. The Air Force is doing the same thing. We're working together on these issues, and eventually we're going to get products that the commercial industry can take back in. So you see the FAA do the early R&D; the Navy and the Air Force do some of the tech transition -- prototyping and measuring and quantifying what percentage of wire chafing is degraded enough that you have to replace it, what those red, yellow, green thresholds are -- and then the commercial aircraft industry can pick up and procure those things without having to develop all of that themselves. The process is pretty much the same, but we need to make sure we have a robust process in all those areas. Wiring is in pretty good shape. Corrosion in structures is in pretty good shape. If you want to talk about helicopters and all that rotating machinery, it's a pocket of poverty.

MR. WALLACE: Well, following up on one of your points about the type of detailed inspections required, I mean, can you speak to the issue which I know was very much discussed sort of in the post-Aloha inspection implementations of just sort of numbingly monotonous maintenance tasks and the human factors associated with that?

MR. ERNST: I like the choice of words. When I got a chance to sit inside and look at the internal cargo bay of the Columbia in '99 at Palmdale -- and there were wiring issues -- the primary method of inspection of wiring was the Mark 1 Mod 0 eyeball and a mirror. And I sat there with the Air Force wire technologist on the team, George Zelinski, a very detailed, knowledgeable individual, and I tried to see if I could find those defects myself, because I'm an engineer and I've been around wiring enough times. I couldn't see the issues that they were required to pick up. So we had a system that was mind-numbing, that required a lot of expertise and experience; and there's technology out there that can do that better and, more importantly, can do it as a precursor to failure. You don't have to wait until you can see through the insulation to say, yes, it's gone. What we need to get to is a prognostic system where we can check non-intrusively -- not pulling bundles apart -- but check those wiring bundles and say, I'm starting to get some breakdown, whether it be due to hydrolysis, chafe, vibration, wear, gremlins, whatever, and say, now I'm 80 percent through the life. With 20 percent left, I now ought to go to a scheduled maintenance procedure and put that together. That's where we need to go, and that's part of a holistic wiring strategy that I believe we have right now. We just have to get it funded and implemented.
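[The prognostic scheme Mr. Ernst describes -- act at 80 percent degradation rather than waiting for failure -- can be sketched in a few lines. This is purely an illustrative sketch: the 0-to-1 degradation score and the threshold are assumptions for the example, not an actual Navy or NASA system.]

```python
# Illustrative only: threshold-based prognostics for wiring health.
# "degradation" is a hypothetical 0.0-1.0 score from a non-intrusive
# inspection tool (0.0 = like new, 1.0 = failed insulation).

MAINTENANCE_THRESHOLD = 0.8   # 80 percent through the life


def maintenance_due(degradation_history):
    """True once any measurement crosses the threshold, so the repair
    is scheduled while 20 percent of the margin still remains."""
    return max(degradation_history) >= MAINTENANCE_THRESHOLD


print(maintenance_due([0.10, 0.30, 0.55]))  # bundle still healthy
print(maintenance_due([0.20, 0.60, 0.85]))  # schedule maintenance now
```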

MR. HUBBARD: I have a question for Mr. Ernst. You made a passing reference to NASA's PRACA problem-reporting system. Could you characterize for us what you think are the best characteristics of the kind of accurate problem-reporting system you referred to in your slides?

MR. ERNST: A system has to be real-time. It cannot be a system that takes 18 months to collect data. It's got to be something that is easy for the operator or maintainer to input. The Navy system, years ago, was a paper system where the poor guy, after working a lot of hours fixing the aircraft, would fill out the paperwork; and because of that, there were inaccuracies once in a while -- not in Admiral Turcotte's squadron, of course -- that we found every once in a while when we went back and looked at those things.

ADM. TURCOTTE: We go back.

MR. WALLACE: Are you trying to sell him something?

MR. ERNST: I could tell stories, but I won't.

It has to be a system that is easy, simple, robust, and it has to be something that tells you something about the failure -- not a bug-in-the-cockpit type of entry where you just say, "I removed the bug." You need to go in there and say, "Hey, I had some failure issue," and it needs to tie back in from the operator what his perception of the failure was, because he's going to describe it as, "Hey, this didn't work." He's not going to say that you had corrosion on Pin 5 of your connector which stopped your data flow. That's going to be the engineer, and the system has to tie those perspectives together with some software that can easily do trend analysis. The other thing we have to do is keep the data long enough to do trend analysis. There has been a push to throw systems and data away after 18 months, and we need to go back five or six or seven or eight or ten years to get a statistical sample size. So those are some of the characteristics, and we're working to get some of those systems implemented now.
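[Mr. Ernst's point about retention can be made concrete with a toy trend query. The part number, years, and counts below are invented for the illustration; the point is only that an 18-month window leaves too little history to trend against.]

```python
# Illustrative sketch: why discarding maintenance data after 18 months
# destroys trend analysis. Each record is a (year, part_id) failure
# report; the part "INS-42" and the years are invented examples.
from collections import Counter

records = [
    (1994, "INS-42"), (1995, "INS-42"), (1997, "INS-42"),
    (1999, "INS-42"), (2000, "INS-42"), (2000, "INS-42"),
    (2001, "INS-42"), (2001, "INS-42"), (2002, "INS-42"),
]


def failures_per_year(records, part_id):
    """Count failure reports per year for one part."""
    return Counter(year for year, pid in records if pid == part_id)


by_year = failures_per_year(records, "INS-42")
# A multi-year window shows the failure rate climbing...
print(sorted(by_year.items()))
# ...but a short retention policy keeps only the tail, leaving no
# statistical sample to trend against.
recent_only = {y: n for y, n in by_year.items() if y >= 2001}
print(recent_only)
```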

MR. HUBBARD: On the report that my predecessor Harry McDonald did, one of the shortcomings that he found was that the PRACA system did not appear to have all of these characteristics you just mentioned.

MR. ERNST: Harry called it the data morgue.

MR. HUBBARD: Data morgue. Yes. One of the things that you commented on just a few minutes ago was getting the material to the subject matter experts at their fingertips. Can you expand on that a little bit?

MR. ERNST: Sure. Let's switch to an avionics box failure. We need it not only so that a data expert who knows the system can write trend reports, but so that if we get a failure back -- let's say on an INS system -- the individual who's cognizant of that system can go in there and say, "Have I had other failures on this system? Can I find some trending? Is it just recent, or periodic? Can I find out if memory chips, or whatever type of chip, are failing in other systems?" He needs to be able to do that research, that forensic science, at his computer terminal; and a lot of times our data systems will give us great reports on how many maintenance man-hours we spent -- three months late. When we get a mishap in, when we get back a box that's failed, we need to have that information right there at our fingertips.

MR. HUBBARD: It would be as if you only got a report on your checking account every three or four months.

MR. ERNST: Yes, sir.

MR. HUBBARD: Thank you.

ADM. GEHMAN: Dr. Gebman, one of your viewgraphs showed the heavy maintenance work days per depot visit for KC-135s, and another showed the heavy maintenance workload ratio -- how much depot-level maintenance is required and how it's grown over the years. In your experience -- and I'll ask both of you this -- is that an accurate indicator that there's something else at work below the surface that you need to go look at? Just keeping track of how much depot-level maintenance is required and how it's growing -- how does that relate to characterizing aging?

DR. GEBMAN: Excellent question.

ADM. GEHMAN: Or is it just interesting?

DR. GEBMAN: Excellent question. We have now studied all of the Air Force's fleets and have compiled the statistics for, in particular, the labor-hour growth over time; and it seems that once you get beyond 15 years, you're almost certainly facing a future of climbing work to be done. Some fleets start a bit sooner -- the fighters tend to, their lives being somewhat shorter than the larger aircraft's. It just seems to be a feature of aging. It might well be somewhat analogous to people: in our older years, we find ourselves going to the doctors somewhat more often than in our teenage years.

So if you want to have a sense of the age of a fleet, one measure you might look at is how the maintenance workload is changing over time. And when you see that steep part of the curve -- like the presidential transport, the old 707 known as the C-137 in Air Force nomenclature -- that workload literally exploded over a couple-year period, and those airplanes are no longer with us.

So it's certainly something to watch. We've tried regression analysis and various statistical methods to try to correlate the rate of rise with the characteristics of fleets. We're making some progress in that area, but this is an area where there's a lot that's not known.
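[The regression Dr. Gebman alludes to can be illustrated with a minimal ordinary-least-squares fit of depot workload against fleet age. The numbers below are invented for the example, not KC-135 data.]

```python
# Illustrative sketch of regressing maintenance workload on fleet age.
# Invented data: depot labor (thousands of hours per aircraft) rising
# once the fleet is past the ~15-year mark Dr. Gebman mentions.

ages = [16, 18, 20, 22, 24, 26]                # fleet age, years
hours = [10.0, 11.5, 13.5, 16.0, 19.0, 22.5]   # depot hours (thousands)


def least_squares(xs, ys):
    """Ordinary least squares: return (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


slope, intercept = least_squares(ages, hours)
# A positive slope quantifies the "climbing work"; the residuals (not
# shown) are where fleet-to-fleet differences would live.
print(round(slope, 2))  # -> 1.25
```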

MR. ERNST: You want to mention the cost-of-aging study?

ADM. GEHMAN: Go ahead.

MR. ERNST: I had seen the Rand data when I started in the aging aircraft program about four years ago, and we've shared it back and forth. Just recently the Joint Aeronautical Commanders Group Aviation Logistics Board kicked off an effort that we're part of to look at what these factors are -- can we translate the KC-135 experience to other Navy aircraft and to Air Force and Army helos -- and try to understand them, so we can get a better handle on what's causing the growth and what the trend lines are. Just having information that says my cost is going up is not sufficient to correct the problem. You need to drill down and say, okay, but why. I think on the KC-135 they have a pretty good idea of that. But that's what you need to do: not just look and say, yes, it's going up by 7 percent, but understand why it's going up 7 percent and what you can do to mitigate that curve.

ADM. GEHMAN: So my understanding is that, unlike the Dow Jones Industrial Average, the fact that older aircraft require more maintenance is not remarkable in and of itself and is not an indicator that anything's breaking or going wrong. You've got to have much, much better indices at the system, subsystem, and component level in order to determine that.

MR. ERNST: And it's not just age. I'll give you an example. We were talking about this cost of aging. I don't remember the numbers off the top of my head, but one of the folks at Tinker said it's costing them X number of hours to paint a KC-135 now, and it cost them a lot less ten years ago. And they said, we're not adding one more ounce of paint. The problem is that you've had changes in environmental regulations over those years, and you've got to make sure you're accounting for things properly. I mean, those environmental regulations aren't bad -- we've decided that certain materials hurt Bambi and Flipper and those sorts of things, and we want to take them out -- but it requires different steps, and you've got to factor that in. A lot of the cost growth you're seeing is due to things that are not age, whether environmental or fleet usage. Yes, costs are going to go up, but they may go up over a certain time to a manageable point, and then where that curve breaks is what we have to figure out.

DR. GEBMAN: I'd just like to basically add that Bob is absolutely right. You need to look at the underlying mechanism. If the workload is climbing because you now have to tend more to corrosion and you're satisfied that you're able to see the corrosion and tend to it, that's manageable. In the area of fatigue cracking, you have to be a little bit more careful. Rising workload may indicate that you're getting more and more cracks closer and closer together, and one of the very important assumptions that we make in managing fatigue cracks is that the neighborhood is healthy. So as the population density of cracks starts to get too high, you run into a situation where you might have thought you were fail-safe but, in point of fact, the neighboring structure can't carry the load.

DR. WIDNALL: I'm sort of sensitive to this issue of aging aircraft because I worked on the B-52G when I was a freshman and I worked on the KC-135 when I was a sophomore. So my friends are still out there.

What I want to talk about is composite materials. I was a little sorry that you sort of excluded them from your chart, but I'd like to get a sense from you of some of the challenges associated with these composite materials. How well do we really understand their fatigue properties? Do we understand their properties as well as we understand metals'? What about their exposure to UV radiation and high temperatures and corrosive chemicals and all those sorts of issues? I know we're using these more and more in our aircraft fleets in general and in particular on the shuttle, where they're obviously a key part. And it's not just composite materials but other kinds of brittle materials, to use a nonstandard term.

DR. GEBMAN: Thank you for asking this question, because when I was thinking about what to talk about today, I really struggled with whether to talk about the areas where we have depth of knowledge that might be useful to your investigation or to talk equally across all the areas even though in some the depth of knowledge is shallow. Clearly, with metals there's a lot that we know, especially on fatigue, and we're learning rapidly on corrosion.

In the area of composites: at the conference of the AIAA, the American Institute of Aeronautics and Astronautics, earlier this month -- this big gathering, 780 people, 525 papers -- Charlie Harris from NASA Langley gave a talk about the progress in composites, and he was very positive and upbeat about all the good technical work going on. That was all appropriate. But then he shared with the group a round-robin exercise where they sent the same problem around to different people to work on, and people came back with different answers. Then they did another exercise where they even told people what the problem was, and they still came back with different answers in terms of the methods and the assessments.

So the whole area of composite materials may be analogous to where we were with metals back in the 1950s. Back then, we had the alloy-of-the-month club; that's where the B-52 and the KC-135 came from. The young engineers were finding better ways to do the chemistry to get strength, but they didn't have time to understand the durability, the fatigue properties, and the corrosion properties. I'm sensing that, with composites, we're still inventing cleverer ways to get strength but don't yet understand the long-term durability characteristics. The science is far more complicated, because a metal is homogeneous -- it's the same material throughout -- while with composites you've got fibers and glues or resins, and it's a very complex interaction to try to model, and we're not good at it yet. So anything that is made of composite requires even more circumspection and attention than the metals probably do.

DR. WIDNALL: I was afraid of that.

GEN. BARRY: Excellent presentations by both of you, and they raise a lot of questions. As you know, the board has taken a very serious approach to aging spacecraft in this -- call it R&D, development test, however you want to describe it -- environment.

A couple of comments. Your references to the Air Force I'm obviously familiar with: we are older than we've ever been before. We've never been in this era in the United States Air Force -- nor has the Navy. We're approaching ages where the average age of our roughly 6,000 platforms is 22 years. So even within the data and experience base that we have flying airplanes, we're approaching new environments.

Now, let's translate that over to spacecraft. We are entering a new era in spacecraft, with reusable vehicles in an environment of aging. We've never been there before. So we've got two parallel efforts going on that certainly can kind of cooperate and graduate, as we've seen evidenced by the Navy and the Air Force here.

I've got a couple of quick questions and then a rather larger one. First question is: Is NASA involved in any of this Joint Council on Aging Aircraft, as far as you know?

MR. ERNST: Yes, sir. NASA has been involved in the aging aircraft effort since Aloha, prior to my being in it. The efforts at Langley in structures and corrosion NDI have been solid. Just recently -- the Christmas time frame, before Columbia -- they said, hey, we recognize we need to help you with that national strategy; and they're getting more involved. We need even more. I need to fill in gaps.

GEN. BARRY: On your side as well as the space side? I mean, are they translating lessons learned to both aero and space?

MR. ERNST: Yes. I'm not going to tell you it's even and homogeneous throughout, but I know that in wiring, the shuttle folks at Kennedy are in lock step with my guys and the FAA and I know the aerospace side and structures are working real well together. We're trying to see where the gaps are and plug them in there. We need more involvement, but they have been involved.

GEN. BARRY: All right. Let me ask this -- two things. Let's talk about corrosion, and let's talk about the fatigue cracking that, Jean, you mentioned earlier. Right now we have capabilities within our aircraft programs to do the stress-testing that you mentioned as an example. There are programs not only based in the United States -- Australia has an excellent one on how to do this. I think all of us in the industry recognize that. What can we do insofar as spacecraft are concerned, and how do we translate what we do on our larger aircraft insofar as dynamic testing is concerned? Because I don't think it's unfair to say that managing aging spacecraft in NASA is, for the large part, done by inspection. So how do we translate what we've learned on aircraft over into NASA as a possible recommendation?

MR. ERNST: I think you need to break it down into the subsystem and component areas. For example, we had this discussion on the McDonald team -- the SIAT team -- three years ago now on wiring, where we had totally different environments but we could take the Air Force's and Navy's experience with aromatic polyimide insulation and say, here's what we saw under these load conditions. Now, under a probably higher-vibration, higher-thermal but shorter-duration environment, how is that going to translate? We know how that fatigue environment, so to speak, can translate, and we can run a new model to see what it should do in the shuttle program.

That's the kind of translation that could be done, but only if you know how each of those subsystems, and the materials in those subsystems, behave as a function of time and age and environment over a number of years. The problem is that a lot of times we don't have that information. We know how it works here, we know the loads are different, but we don't know how the aging is going to translate as those factors change, if that makes sense to you. I don't think it's hard to do, but you have to invest in some age-related studies, and that's not necessarily at the top of the list.

GEN. BARRY: One of the concerns we have is being able to analyze how the orbiters have been shaken, rattled, and rolled over these many years, especially when we take into consideration that this was a spacecraft designed to be flown 100 times in ten years; now we're decades past that, and the orbiters are still only in the 20s and 30s of flights. The question, then, is how do we translate some of the lessons learned -- the spacecraft are flown within spec but, after a while, accumulate stress loads that ought to be tracked over time and measured. Translate, if you could, the lessons learned that we've developed on aircraft that might be carried over to NASA.

DR. GEBMAN: Could I have Chart No. 24, please.

MR. ERNST: You guys are going to learn this chart, because he wanted to show this to you.

DR. WIDNALL: He's ready for you.

DR. GEBMAN: This is a really tough question. Obviously, with the shuttle we don't have the luxury of a full-scale fatigue test. And a tear-down: if this were an aircraft fleet and we had hundreds or even tens of aircraft, we'd consider taking the oldest one, tearing it apart, seeing what ails it, and then using that to guide future work. When you're down to three, that's not an option.

So then you ask yourself, well, what might we do? When you look at this diagram, on the top row -- the matter of force tracking data and loads analysis -- there may be some things you could do in terms of assuring that NASA has expended all the effort it can evaluating the strain gauge recordings and pressure recordings from prior flights, so that you really have as excellent a record, historically, of the loads that have been imposed on the structure as you can possibly get.

The next thing you could consider doing is, given the best loads data, to go back and, using more current finite element analysis methods -- which have improved greatly over the decades -- do some spot checks on your stress computations to make sure you've got the best that we can do in terms of estimating stresses from the given loads; and then take it the next step and go in, for the fatigue part, to check the crack growth calculations and the fracture toughness issues, and make sure that the engineering community has really been resourced and tasked to do everything we can to understand, analytically, the health of the fleet.
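[The crack growth calculations Dr. Gebman refers to are commonly based on the Paris law, da/dN = C(dK)^m. A minimal sketch of that integration follows, with generic textbook-style constants rather than shuttle material data; the geometry factor and stress range are assumptions for the example.]

```python
# Illustrative crack-growth sketch using the Paris law,
#   da/dN = C * (dK)^m,  dK = Y * d_sigma * sqrt(pi * a).
# C, m, Y, and d_sigma are generic assumed values, not shuttle data.
import math

C, m = 1e-11, 3.0    # Paris constants (units consistent with MPa, m)
Y = 1.12             # geometry factor, assumed edge crack
d_sigma = 100.0      # stress range per cycle, MPa (assumed)


def cycles_to_grow(a0, a_final, da=1e-5):
    """Numerically integrate the Paris law from crack length a0
    to a_final (meters); returns the number of load cycles."""
    a, cycles = a0, 0.0
    while a < a_final:
        dK = Y * d_sigma * math.sqrt(math.pi * a)
        cycles += da / (C * dK ** m)
        a += da
    return cycles


n_small = cycles_to_grow(2e-3, 1e-2)  # 2 mm initial flaw
n_large = cycles_to_grow(4e-3, 1e-2)  # 4 mm initial flaw
# A larger starting flaw leaves fewer cycles of life -- which is why
# the smallest crack an inspection can detect matters so much.
print(n_small > n_large)  # -> True
```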

Then the final thing you might consider doing: from the debris that you do have, you already have, in effect, a partial tear-down circumstance. Go in there at some point and literally take apart that which is still connected together and really check things like adhesion on honeycomb -- how is that waffle still adhering to the face plates -- and just get as much mileage as you can out of your debris in terms of understanding what the health of the remaining fleet may be.

MR. ERNST: Slice up your poles, your joints, rivet holes, things like that. That's what we do routinely.

To follow on with the chart that Dr. Gebman put up, you'll notice a couple of things. One, do a mid-life assessment of the loads. The Columbia originally was kind of a flight-test bird and, I believe, had several hundred pounds of instrumentation and sensors in there to measure flight loads. To give you an example, the P-3 and S-3 programs just recently completed mid-life fatigue testing at Lockheed, and we found drastic changes in the loads from what they were anticipating. The maneuvers were a little different. The theoretical assumptions from early in the introduction slowly change over time. It's like boiling a pot of water -- it doesn't boil all at once. I think you need to go back and really do those load surveys.

You also need to do some type of tear-down. You can't cut up, you know, the Atlantis and turn it into a series of razor blades and fractographic analyses and such; but with the Columbia, when they had wiring problems in '99, NASA did go in and remove certain wire segments. You can go in without cutting the whole thing up: remove certain panels, remove tiles to check adhesion, remove subsystems. When a part's going through an overhaul, take that part and do those types of inspections. So there are things you can do; but again, you've got to have a proper program to capture that environment and see how we're doing.

In the S-3 fatigue test, we had 12 points that we considered life-limiting on the aircraft. Four of those they knew about from the original fatigue test. We found an additional eight points that were due to the loads, and through the tear-downs we saw microscopic cracks. We were able to go in and cold-work fastener holes in that aircraft and give it fatigue life back -- a real simple operation, real cheap -- and not have the 305-inch wing cracks we had on the P-3. So you're able to do some of those things if you invest the time and the resources and have a robust program.

DR. GEBMAN: If I might, I'd just like to follow up. Could I have Chart No. 7, please. There's an important aspect that I neglected in my answer, and that is that we're dealing with a spacecraft -- and I apologize. Obviously with something like the shuttle, you have thermodynamics acting as well as structural dynamics; and in addition to getting a solid characterization of the historical loads, you also want a solid characterization of the historical thermal exposure. Take a spar cap -- any one of those four spar caps identified with the arrows. If, over the history of a particular spar cap, it has been exposed to temperatures different from the other spar caps, then the loads in that part of the structure are going to be different by virtue of the thermal expansion of the material. So this is a very complex thermal as well as structural dynamic circumstance.

ADM. GEHMAN: Let me follow up on that before I call on another board member. Do I understand that you are suggesting that it's useful in the study of aging aircraft to establish some measurements of what I would call stress cycles or something like that? We understand age. We understand landings and takeoffs. But there are other events which cyclically stress the aircraft, particularly in the case of the shuttle. And it's useful to keep track of those, in addition to the obvious ones like landings and takeoffs and how many months, hours and all those kind of obvious things.

DR. GEBMAN: These things are tracked routinely with aircraft. Exceedance curves are developed, which are a statistical way of representing even the small variations. My most recent comment suggests that we should also construct a thermal exceedance spectrum, as best we can from the historical data, so that to the extent we've got differential thermal expansion of the structure going on, we can factor that into the loads the members receive.

You see, there are two load levels. One is the aerodynamic and inertial loads applied to the gross structure. The other is, for a particular structural member, what load it sees over its lifetime; and that can be driven by thermal expansion issues, just as it can be driven by the aerodynamics. Given the historical records of the temperatures, the engineers should be able to construct -- and may already have constructed -- thermal exceedance curves to go along with the load exceedance curves.
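[The exceedance-curve construction Dr. Gebman describes is straightforward: for each load (or temperature) level, count how many recorded peaks exceed it. The sketch below uses invented load-factor peaks purely for illustration; the same construction applies to a thermal exceedance spectrum with temperatures in place of load factors.]

```python
# Illustrative sketch of a load exceedance curve from a recorded
# history of peaks. Peak values (load factor, g) are invented.

peaks = [1.1, 1.3, 1.8, 2.4, 1.2, 3.1, 1.5, 2.0, 1.4, 2.7]


def exceedances(peaks, levels):
    """Return {level: count of recorded peaks strictly above level}."""
    return {lvl: sum(1 for p in peaks if p > lvl) for lvl in levels}


curve = exceedances(peaks, [1.0, 1.5, 2.0, 2.5, 3.0])
# The counts fall off monotonically at higher levels -- the shape of
# that fall-off is what the fatigue analyst feeds into the load spectrum.
print(curve)  # -> {1.0: 10, 1.5: 5, 2.0: 3, 2.5: 2, 3.0: 1}
```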

MR. ERNST: I think you need to look at every environmental factor and see if there is a similar type of correlation. We've done a good job of fatigue tracking. We're tracking a lot more parts than we used to. The models are a hundred times more detailed than they used to be, and we can calculate things a lot more finely. But I think you need to look at all the different loads and environments that any vehicle sees and say, okay, what's changing, and what's the effect of that over time.

ADM. TURCOTTE: For both of you. Kind of the three C's in aging -- you know, Kapton, Koropon, and corrosion -- which go back a long time: finding problems with Kapton wiring, with Koropon bonding and de-bonding, heat translation, all of those things. That's Part 1 of the question. Could both of you talk a little bit about major lessons learned from fleet usage, commercial usage, and your knowledge, to the extent of findings on the shuttle -- both, you know, galvanic and intergranular types of corrosion?

Part 2 question. If you were king for a day with your knowledge of the PRACA data base, what would you do to improve it?

MR. ERNST: You're going to get me in trouble. I was not very politically correct about the PRACA data base in 1999, and I have not seen it since then; but I think if you go back and read the Shuttle Independent Assessment Team report, you will find that the comments of the group were less than favorable on PRACA. I'm not saying that the Navy's and the Air Force's and the Army's data systems are perfect, but we're taking steps in the right direction. So I really can't comment on what they're doing today. I know they've made some improvements, but it was pretty abysmal back in 1999 and, I think, masked some of the issues that feed into your risk equation that we saw back then. I think that was a mistake.

As far as handling some of the materials and some of the issues with Kapton -- aromatic polyimide insulation, manufactured under the DuPont trade name Kapton, to get that correct -- we didn't do a good job of establishing realistic life-cycle testing for that material when it was introduced. Kapton has a lot of good properties. I don't believe I've said that yet, but it has a lot of good properties. It's very, very tough. It also has some very adverse characteristics that we never tested for. I know there have been arguments with the FAA on the flammability tests, whether they're applicable, and there are lots of different tests; and we didn't do a good job of running a qualification test -- an aging test run over a short period of time that tries to cover 20 or 40 years. So we made some mistakes on that.

The other issue is once we had problems with the wiring insulation, I don't think we developed realistic scenarios. If you look at the cost of replacing and rewiring a whole aircraft, it's several million dollars. Well, do I really need to do it? Do I need to do it in all areas? Which platforms do I need to do first? And what we have done now is develop a bouquet of options. Whatever color of flowers you want and whatever kind of room, it goes together. Because my wiring options on the F-14 Tomcats, which are going to be retired in the next four or five years, are totally different than what I would do on earlier production F-18s or P-3s that are going to be around a little longer. So you have to develop options based on risk so that you can do things quickly, cheaply, easily, and get it done, and not just be given one option, is all.

So I think two issues. One, we didn't do good qualification testing and we need to continue, just like the life cycle testing, just like the fatigue tracking, where you update it and you get better; and the second issue is we didn't develop any options.

DR. GEBMAN: On the matter of wiring, the Air Force in the case of the KC-135 embarked on a major rewiring program about five years ago; and that is going to probably continue for the next four to five years, at which point they will have substantially replaced the wiring on the 135. The basis for this was an accumulation of maintenance action that was becoming increasingly costly to exercise and a concern for flight safety, and those two factors together seemed to have driven the train on that fleet.

Unfortunately, when it comes to our ability to predict life, we don't have the engineering tools that we have with fatigue cracks -- not with composites yet and, for sure, not with wiring -- which makes those areas very difficult to feel comfortable about with an aging fleet.

ADM. GEHMAN: How comfortable would you feel with the study of the aging characteristics of a main engine that's fueled by liquid hydrogen and burns a thousand gallons a second and produces a million pounds of thrust? How's our data base on how that baby ages?

DR. GEBMAN: Well, on my chart I did include a line that said propulsion; and it didn't get extremely high grades for data or methods or people that really understand life issues in that area. So you've hit another excellent nail squarely on the head. For those areas, going back to General Casey's comments about understanding margins and managing to margins, you really have to worry that as time goes by, you're eating into those design margins and at some point the ice becomes thinner than what you're comfortable with. And that's a technical judgment probably more than an engineering calculation.

MR. ERNST: To follow up. One of the successful programs that the Air Force and the Navy have is on aircraft engines. They've realized that you've got a lot of moving parts, a lot of high temperatures, a lot of complex interactions in there. And they have what they call CIP, the Component Improvement Program, where they go back in and they test and they see where their problem areas are and they incrementally try to infuse newer technologies and fixes in the early parts of the service.

Again, that's one of those that's always fighting to get resources adequately in there, but if we follow what the commercial industry does, we can really improve the reliability and we can have a pretty good idea and almost get to a scheduled-maintenance type of inspection, so we're not flying and saying, yeah, we lost an engine or had a shutdown but, okay, now at 7, 8 hundred hours I have an 8,000-hour interval period and know exactly what to replace. So that's another example where we've taken the methodology that Dr. Gebman talked about on structures and we've transferred it over to the engines, and I think both the commercial and the military have had very good experience in that being successful.

DR. GEBMAN: I certainly wouldn't quarrel with my distinguished colleague, but I would hasten to add that the commercial engine and even the military engine circumstance with aircraft is far different than the circumstance we're talking about here.

DR. LOGSDON: This is all very far away from the experience of a Washington policy wonk. So excuse me if these are really naive questions. What does the fleet size of three do to the ability to do the sorts of things that you think ought to be done?

And the second question, I think it's really for Mr. Ernst, coming out of his independent assessment experience. Is NASA routinely collecting the kind of data that would feed into the kinds of trend analyses? You know, outside of faults, PRACA and that, is there a data base that you could apply some of these methodologies to?

MR. ERNST: Well, I think all the agencies and commercial are collecting a fair amount of data.

DR. LOGSDON: On shuttle.

MR. ERNST: On shuttle? I mean, you look at the Navy programs and Air Force programs. We're collecting 80 percent of what we need. I still think we need to do more on the causes of failures.

For example, if I went into the Navy's data base on wiring chafing, there is no failure code for chafe right now. What's the primary failure mode for wiring? We're fixing that, by the way. So I can say that, but that's one of the issues. I mean, we're not recording the right type of information in all cases. We're about 80 percent there.

My beef with PRACA at the time was you couldn't go in there easily and extract anything to make decisions. I at least can go into some of the services' data bases and pull some information and get a pretty good idea and then at some point I have to play archeologist or forensic scientist and go back through and do some more. But it works out to that 80 percent. There need to be some other changes; and, unfortunately, data is the one thing that everybody wants to cut in the budget crunch. We don't want to pay for that data.

DR. LOGSDON: If I understand PRACA correctly, you have to have a problem or perceive a problem to even get in the system. I'm saying, is the shuttle even instrumented to capture the kind of data that you would like to have to measure various elements of its aging?

MR. ERNST: Not in all cases, but I think you can probably do some work-arounds with that and be able to check things. I mean, you don't have to do everything in flight. You can do engine warmup cycle times and check temperature-wise in there, check component issues, and test things. Things like that. You can capture that information if you need to.

You need the maintenance-reporting information, which PRACA primarily did. You need the trend analysis -- if I get to this certain load level, this is going to impact my fatigue life. And then you need to be able to do periodic instrumentation at times. And it doesn't always mean a full-scale in-flight test. It means capturing some of the data. And that data was available. Could you get that? Was it easily, readily available? No, it wasn't readily available.

DR. GEBMAN: Putting my engineering hat on relative to your data question, given that the instrumentation and wiring in the shuttle and the systems were designed in an earlier era in terms of electronics, it might well be worthwhile rethinking the matter of what we are interested in observing during future flights in order that we might create a more complete record of environment and loads so that we can better manage the remaining lives of the fleet.

MR. ERNST: Health management, health monitoring for the system.

DR. GEBMAN: And regarding your observation of the number three, what does it mean to have three in a fleet? From an operational perspective, one of the early lessons I learned at Rand was that whenever you visit a unit, you always expect -- and Admiral Turcotte will appreciate this -- you always expect at least the Nth airplane to be a source of supply for the others, if you're lucky. Sometimes it's more than just the Nth airplane. So if you have a fleet of three, from an operational perspective, one of the three is needed to support the operation of the remaining two. And to have an operating fleet with just two means that you only have one backup and that's very thin.

MR. ERNST: And I think it makes your correlation. A lot of times when you have how many hundred F-15s and F-16s, you can start looking at the gross number of failures and say I need to look at something. When you have three, you can't rely on that. You have to take a little bit different systems approach to be able to capture your data.

The Navy flies some type model series, you know, that are 12. Twelve EP-3s. And each one of them is a slightly different configuration. But you can capture that information. It just requires a little different approach, and sometimes it's not as robust, predictive, leading edge because you don't have that significant sample size.

MR. WALLACE: Were you suggesting, Dr. Gebman, that sort of the fleet leader concept; or were you suggesting cannibalizing parts? I wasn't entirely clear.

DR. GEBMAN: No matter how good your supply system is in terms of providing parts, you always end up in a circumstance where you have a first-time demand for a part and the last airplane of the unit then becomes the offeror of that replacement part. I think that if you talk to the NASA folks regarding the matter that's referred to commonly as cannibalization, it's borrowing a part from one aircraft or spacecraft in order to be able to launch one that's scheduled to go.

MR. WALLACE: Another question. This is jumping subjects a bit. Should the goal of an aging aircraft program grow beyond maintaining the aircraft to be as good as new? What I mean by that is: Should it meld in with sort of obsolescence issues, issues where the technology has simply gotten to be so far behind the state of the art that it either makes sense for economic or safety reasons to upgrade or even reasons of simply maintainability?

DR. GEBMAN: You're raising the issue of replacement, fleet replacement; and we have struggled at Rand with the Air Force long and hard on that matter because, for example, the tanker fleet. It's a very important fleet. Without the tankers, the Air Force doesn't go places. They don't have aircraft carriers to carry their airplanes. So they're very dependent upon their tankers; and to have almost all of your tanker fleet wrapped up in one type of aircraft that's 40 years old now and to be planning to do so for another 40 really raises questions.

The first thing we looked at was, well, is there a case on economic grounds for replacing the fleet. There was an economic service life study done and it shows rising costs, but it doesn't show the rising costs by themselves being a sufficient basis for justifying a new fleet, whereupon you start asking questions along the line of obsolescence issues, foregone capability improvements that you can't have without substantial investment in an aging fleet. So this whole question about when it is wise to replace a fleet is one for which we still don't have a good methodology.

MR. WALLACE: I really didn't intend to ask that question about replacement. Well, it was a good answer. But about replacing the fleet as opposed to simply upgrading, particularly, I mean, fleet replacement, you know, lots of smart bean-counters with spreadsheets do that for the civil aircraft industry but I think there's a whole set of different issues with next-generation spacecraft. My question really is more about upgrades.

MR. ERNST: To address that -- and you picked on obsolescence. When you get to the microcircuit obsolescence issue, which has become a science fair, pet rock project of mine over the last 10 or 12 years, there are lots of different options, and right now we are system-incentivized to find this chip to put in this box in a lot of cases. We found about a third of the time that doesn't make sense because not only is that part obsolete but the three around it are terminal and the whole board's wearing out because we keep replacing it so many times because of poor reliability. So it's probably better at that time to take the whole thing, take the cards out, and make it a lobster trap somewhere and then put a new system in there. That really happens about a third of the time.

But we need to again, I think, balance some of the different pots and stovepipes of money that are available, especially in DOD, to be able to optimize those issues and have the best understanding of the age effects, where they're going to be two years from now, because I may make a replacement today and I've got three more downstream. I need to look at where I'm going to be three years from now and say this is the time to replace this 1988 Tercel that I had with 189,000 miles and go buy something new, because this is just the tip of the iceberg. And I don't think we're doing a real good job of that, but it's one of the challenges of not just maintaining the status quo but looking and saying what capabilities, what mission growth areas, where am I going on some reliability issues, and balancing all of those into like a triangle of a decision matrix.

DR. GEBMAN: There's a fleet that we're looking at now that has the potential for receiving an upgrade to its aviation electronics to give it capabilities to continue its military relevance. And there are also a series of mods being considered to upgrade the engine so that its flight safety features remain appropriate. And similarly with the air frame. And as we're going through the arithmetic on this particular fleet, one of the things that we're seeing is that by the time you're done making whichever of the three mods or all three of them to the fleet, the years remaining becomes very significant to your choice. And when you go to the operator and you ask the operator, well, how long do you want to retain this fleet, well, they're really not sure. So this question is almost as difficult as the fleet replacement question.

MR. ERNST: And you look at the mission changes in the Department of Defense in the last couple of years where we've gone from a Cold War scenario to more of a small conflict and now global war on terrorism and it changes. We have planes that, to pick on Admiral Turcotte's S-3, that were designed to hunt subs that were doing surveillance and tanking and dropping weapons and doing, you know, partridge in a pear tree and everything else. And you need to look at those mission changes as a function of age too and say, you know, I may be able to keep this aircraft doing what it did five years ago but you know I need to replace it. I need to go over here. And we don't always balance all those issues.

I know the Air Force is really trying to look at that decision and set up a fleet viability board to weigh the aging factors in these mission scenarios. I'm monitoring that for the Navy to see what they do; and then after they get all the kinks worked out, we'll steal it. But that's kind of the approach. I think that answers that it's not a simple answer but that's what needs to be looked at. I think the shuttle has the same issue: Where does it need to be ten years from now?

MR. HUBBARD: I heard one of you mention or whisper the term "vehicle health monitoring," I think. The notion of a fleet of three. I'd just like you to think out loud for a minute or two about how vehicle health monitoring would apply in this case along three lines. One, what would a systems approach be to that, given that we have a fleet of three? Second, realtime versus recorded measurements? Third, what other measurements could you imagine? I mean, we've got a thermal protection system, for example, that is pretty unique to the orbiter versus the military aircraft you mentioned. We've got pressure, strain, and temperature. Can you imagine, in this kind of systematic approach to vehicle health monitoring, what one might do?

MR. ERNST: Let me answer in reverse order. I don't want to bad-mouth technology. And I've talked about some cultural issues, but there are some real technology advancements. I know some of the DOE labs have now started looking at electronic signature analysis for failures in motors, predicting when motors are going to fail. There are all kinds of things. I mean, you can literally go around to the different areas and find better ways that people can get precursors to failures if they measure data and give you good information. That would help us understand. From an overall view, it would let us know if you had a degraded flight mode issue so that we're not having, yes, that system failed, we have to do something else. It would really help you manage your redundancy a lot better, too. So there are a lot of new technologies beyond the strain gauges that I learned about in college that we need to use.

I think the realtime versus recorded is something you need to use a system engineering approach in analyzing. There are oil analysis systems that I remember we had a vapor cycle system and by the time you got oil in the filter, you had basically eaten the whole system; it was too late. So putting an oil analysis system that you measure it every ten hours wasn't doing any good. It needed to be realtime.

Not everything needs to be realtime, and any information at all, whether it be on one unit or on three units, is a lot more than no information. I think that having some health monitoring systems on any fleet -- shuttle, the F-18, the S-3, the F-15, or whatever -- gives you information if you use a good systems engineering approach: not just collecting data for data's sake but seeing what you are trying to do with the data and then letting that drive what you need to collect and what technology best does that. That, I think, is helpful.

DR. GEBMAN: I would like to speak both as a proponent and also share a word of caution. The engineering in me would prompt me to want to put strain gauges and instrumentation in many places. Probably too many. There's a trade-off between the disease and the cure, and it's possible to overdo a good thing. We need to remember that, with this instrumentation, comes wires; and we've already been talking about the vulnerability that wiring can introduce into the system. So what I would think might be helpful is to try to understand what are the critical issues that we're concerned about or we should be concerned about and then ask, for those critical issues, what initially at least modest amount of additional instrumentation might be appropriate and try to really focus on the core vulnerabilities and not to go too quickly too far overboard.

MR. ERNST: We can't be kids in the candy shop. I agree.

ADM. GEHMAN: Thank you, sir. I'm going to ask the last question myself; and, hopefully, it's a brief one. I think probably, Dr. Gebman, your Chart No. 3 answers this question; but I want to allow us to listen to it for a second. Would you list the aircraft aging areas of examination as to which of them appear to be mature technologies and which of them appear to be not so mature? Obviously the detection of corrosion is a big one, and I suspect we probably know a lot about that.

DR. GEBMAN: Probably the quickest answer to the question would be to focus on the first column and the last three columns. In the last three columns, we have my subjective assessment of where we stand in terms of data, methods, and people. The metals area for structure, we're in very good comparative shape to the others.

In corrosion, our data and our methods are still somewhat embryonic, but now, thanks to the various laboratories really engaging in the last several years in a more aggressive way, we're building a core of people that are knowledgeable in the area.

The business of adhesion, we haven't paid much attention to it. And my sense is that our data and methods are below low and even the number of people really knowledgeable in that area is not great.

Moving down to the composites, there's a lot of people out there. There's a fair number of people doing excellent, promising research; but the fruits of that research in terms of data and methods is still forthcoming.

In the area of propulsion, the general area strikes me, especially when we're thinking about shuttle types of applications, as not particularly high. The whole area of high-cycle fatigue is still a challenge for the engine community, even for commercial aircraft.

Then the "Other" category. This is the one that worries me most because oftentimes it's the one that's not getting the attention that's the one that bites you the hardest. Functional systems, pumps, motors. TWA 800 killed more people than metal structures have in recent times, and that may well have been down in this "Other" category, either the wiring or the functional systems.

So as the board moves forward with its good work, I would urge attention to all of the technical areas. All that I've tried to accomplish here today is to bring forward that there are some areas where the aging aircraft community really has depth. If that proves to be relevant or of interest, the community is certainly prepared to help. In the others, it's going to be more challenging.

ADM. GEHMAN: Well, thank you very much. On behalf of the board, I would like to express our appreciation for your attendance here today and your complete and helpful replies to our questions and the information that you've given. You're obviously great experts and we've learned a lot and we hope that we can apply it to this problem. We appreciate your attendance.

We're going to take about a ten-minute break while we seat the next panel, and we'll be right back.

(Recess taken)

ADM. GEHMAN: All right. We're ready to begin our last session for the day.

It's a privilege for the board to recognize Dr. Diane Vaughan from Boston College. Dr. Vaughan has written an influential and well-read book on the Challenger accident. We are continuing our look into the business of risk assessment and risk management. This is one of the classic studies on the Challenger accident. Most of the board members have at least read parts of your book, Professor Vaughan; and we're delighted to have you here.

DR. VAUGHAN: Thank you.

GEN. BARRY: And we're ready for a test.

ADM. GEHMAN: I would like you to please, if you would, before we get started, introduce yourself by telling us a little bit about your background; and then if you would like to say something to get us started, we would be delighted to hear you.

DIANE VAUGHAN testified as follows:

DR. VAUGHAN: Thank you. I'm a sociologist. I received all of my education at Ohio State University, getting my Ph.D. in 1979. After that, I had a post-doctoral fellowship at Yale; and I began teaching at Boston College in 1984, where I am currently a full professor.

My research interest is organizations. I'm, in particular, interested in how organizational systems affect the actions and understandings of the people who work in them. So it's what we call, in my trade, making the macro-micro connection, how do you understand the importance and effect of being in an organization as it guides the actions of individuals. My research methods are typically what we could call qualitative, which are interviews, archival documents, and ethnographic observations. So using these methods, I have written three books, the last of which was The Challenger Launch Decision, which was published in 1996.

ADM. GEHMAN: Thank you very much. You may proceed.

DR. VAUGHAN: All right. I want to start from the point of view of Sally Ride's now famous statement. She hears echoes of Challenger in Columbia. The question is: What do these echoes mean? When you have problems that persist over time, in spite of the change in personnel, it means that something systematic is going on in the organizations where these people work.

This is an O-ring -- not The O-ring, but it is an O-ring. I want to make the point that, in fact, Challenger was not just an O-ring failure but a failure of the organizational system. What the echoes mean is that the problems that existed at the time of Challenger have not been fixed -- that despite all the resources and all the insights the presidential commission found, these problems have still remained.

So one of the things that we need to think about is when an organizational system creates problems, the strategies to make the changes have to, in fact, address the causes in the system. If you don't do that, then the problems repeat; and I believe that's what happened with Columbia.

What I would like to do is begin by looking at what were the causes of Challenger and, based on my research, to point out how the organizational system affected the decisions that were made, and then make some comparisons with Columbia and then think about what it might mean, taking that information, to make changes in an organization to reduce the probability that this happens.

One of the things that we have learned in organizational --

ADM. GEHMAN: Excuse me for interrupting. If I may ask a question while we're still on this subject. On your first viewgraph, the first bullet, you said when you find patterns that repeat over time despite changes in personnel, something systemic is going on in the organization. There are no negative connotations in that sentence. You didn't say something wrong is going on in the organization. I assume the obverse is also true. If patterns repeat over time and you keep changing people and you keep getting good results --

DR. VAUGHAN: The system is working. Right. It's the fact that there is a bad outcome that we're looking at here. Thank you.

ADM. GEHMAN: Thank you. Sorry for the interruption.

DR. VAUGHAN: I wanted to begin and go back over just really briefly what happened in Challenger. First, the presidential commission reported that there was a controversial eve-of-the-launch teleconference during which worried engineers at Morton Thiokol, the solid rocket booster contractor in Utah, had then objected to the launch, given that there was going to be an unprecedented cold temperature at launch time the next day.

Marshall management, however, went ahead and launched, overriding the protests of these engineers. Not only did the commission discover that, but they also discovered that NASA had been flying with known flaws on the solid rocket boosters' O-rings since early in the shuttle program -- that these flaws were known, and known to everybody within the NASA system.

May I have the next slide, please. What happened was what I called an incremental descent into poor judgment. This was a design in which there were predicted to be no problems with the O-rings, no damage. An anomaly occurred early in flights of the shuttle and they accepted that anomaly, and then they continued to have anomalies and accepted more and more. This was not just blind acceptance; they analyzed the anomalies thoroughly, and on the basis of their engineering analysis and their tests, they concluded that it was not a threat to flight safety. It's important to understand, then, that this history was the background against which they made decisions in the eve-of-launch teleconference; and that was one more step by which they again gradually expanded the bounds of acceptable risk.

Next slide, please. One of the things that's critical with Challenger, and also now, is the fact that we tend to look at bad outcomes and work backwards and we're able to then put in line all of the bad decisions and apparently foolish moves that led up to it. It becomes very important to look at the problems as they were unfolding and how people saw them at the time and try to reconstruct their definition of the situation based on the information they had when they made their choices.

Next slide, please. The Challenger launch decision was, in fact, a failure of the organizational system; and I hope, by going through the explanation, it will show why it was not groupthink, it was not incompetent engineers, unethical or incompetent managers.

Next slide, please. So what happened? Richard Feynman called it Russian roulette, which implies that there is a knowing risk-taking going on. The result of my research, I called it something else, the normalization of deviance; and I want to use the organizational system perspective to explain how this happened.

The idea of an organizational system is that there are different levels at which you have to do your investigation. So the first is the people doing the work, their interactions, and what they see; the second level is the organization itself; and the third level has to do with the environment outside the organization and the other players that affect what's going on internally.

So let's start with the bottom layer, the people doing the interaction. First, it's important to know that they were making decisions against a backdrop where problems were expected. Because the shuttle was designed to be reusable, they knew it was going to come back from outer space with damage; and so there was damage on every mission. So in an environment like that, simply to have a problem is itself normal. So what to us in hindsight seemed to be clear signals of danger that should have been heeded -- that is, the number of flaws and O-ring erosion that had happened prior to Challenger -- looked different to them. The next slide will show how they looked as the problem unfolded.

What we saw as signals of danger, they saw as mixed signals. They would have a problem flight. It would be followed by a flight for which there was no problem. They would have weak signals. Something that in retrospect seemed to us to be a flight-stopper was interpreted differently by them at the time. For example, cold, which was a problem with the Challenger flight, was not a clear problem and was not clearly caught on an earlier launch. Finally, what we saw as signals of danger came to be routine. In the year before Challenger, they were having O-ring erosion on 7 out of 9 flights. At this point it became a routine signal, not a warning sign.

The next slide, please. That's what's going on on the ground floor. So the question is then how does the organizational system in which they're working affect what they're doing and how they're interpreting this information and how their decisions move forward. This is what I call the trickle-down effect. Congress and the White House were major players in making decisions, and their policy decisions affected how people were making decisions in the project.

The budget -- the problem of the shuttle program starting out with insufficient resources -- meant that the only way the program got going was by the shuttle program being responsible in part for its own livelihood. That is, it would carry payloads, and the number of payloads it got paid for annually was expected to contribute to its budget.

So early on, the space shuttle project was converted from what during the Apollo era had been an R&D organization into a business. Contracting out and regulation both had altered the shuttle program so that it was much more bureaucratic. A lot of people who had been in pure engineering positions found their roles reversed in the sense that they became more administrative. They were put in oversight positions, and they had a lot of desk work to do.

Finally, when the program was announced, it was announced that it would be routine to fly shuttles into space. It would operate like a bus. So the expectation that it would be routine also had an effect in the workplace. The effect was to transform really a culture that had been pure R&D, with emphasis only on the technological discovery, into one that had to operate more like a business in that cost was a problem, production pressures were a problem.

The notion of bureaucratic accountability made the agency what some people told me was bureau-pathological. That is, there were so many rules, there were so many forms to be filled out that these kinds of tasks deflected attention from the main job of cutting-edge engineering. It wasn't that the original technical culture died but that, in fact, it was harder to follow it through with these other influences on the shuttle program.

How did these actually play out on the ground? Next slide. The original technical culture called for rigorous scientific and quantitative engineering, real solid data in the form of numbers to back up all engineering arguments; and that was still true. However, also with the original technical culture, there was a lot of deference to engineering and engineering expertise based on the opinions, valued opinions, of the people who were doing the hands-on work.

The latter was harder to achieve within a bureaucratic organization where hierarchy dominated. The schedule became a problem, interfering with decisions by compelling turnarounds in time to meet the schedule, so that research on hardware problems sometimes continued past the next launch -- they were still getting more information while a new launch was in process.

It also affected them in that the engineers and managers truly followed all the rules. In the midst of a system that many people at the time said was about to come down under its own weight before Challenger, the fact that they followed all the rules -- in terms of having the numbers, in terms of procedures -- gave them a kind of belief that it was safe to fly. Engineering concerns had to be backed up with hard data or there couldn't be money set aside to do a correction to the program. Hunch and intuition and concern were not enough.

Next slide, please. The third part is to say, well, there was a long incubation period here. Why didn't someone notice the trend that was going on with the solid rocket booster project in terms of O-ring flaws and intervene? This is where the organization's structure was at that time a problem. The safety system had been weakened. One safety unit had been completely dissolved, and staffing had been cut back. Top administrators, because of extra work in an expanding program, were no longer able to maintain what in the Apollo program was known as the dirty-hands approach -- that is, keeping in touch with the technology, the problems, and the riskiness of it.

And the anomaly tracking system, which was another way that you could get warning signs, made it very difficult for administrators to isolate serious problems. At one time under their Criticality 1 category, which is the most serious label that you give to a technical problem, they had 978 items on it. So how, of those, do you sort out which are the most serious?

Next slide, please. With this as an outline, I'd like to move to some comparisons, the echoes that Sally Ride talked about. First, here I'm drawing analogies. I spent nine years on the Challenger book; I haven't done that on this case, and your investigation is still underway. So while I'm able easily to identify the similarities, it's harder to define the differences; and what we see now as similarities are yet to be proved. So my goal here is just to maybe point you in some ways to look, and not come to any conclusions.

First, in both circumstances, Columbia and Challenger, a crisis -- well, let's say it was a crisis of uncertainty. Circumstances happened for which they had no background experience. They came to this condition of high uncertainty with a belief in acceptable risk -- that is, based on all the Flight Readiness Review decisions that had preceded, they believed they were flying with a vehicle that did not have a problem related to, in Challenger, the O-rings and, in Columbia, the foam. They believed in their own analysis. That was the background, and they had engineering reasons for believing that.

Second, in each of those cases, Challenger and Columbia, there had been an event in the recent past that had some import for their decision-making. For Challenger, it was STS 51-C, launched in January of 1985, the year before. The condition that the engineers on the eve of the Challenger launch were concerned about was cold temperature, which for the next day was predicted to be at an all-time launch-time low. On STS 51-C, cold temperature also mattered, but not at launch time: the cold had come during the three previous nights, when the vehicle was sitting on the launch pad and overnight temperatures were down to 19 to 22 degrees.

For Columbia, it was the foam strike on Atlantis. There had been several foam strikes preceding the Columbia launch; the Atlantis foam strike, which happened in October of 2002, was the most recent. The history of the foam strikes was that they had problems with imagery -- they couldn't see clearly the location of the strikes, and so on. So that was part of the history which led to the fact that, when they discovered the foam strike on Columbia, they didn't have good data.

For the cold temperature on 51-C, there was a similar effect. At the time when they did the analysis, the engineer who went to the Cape, looked at the vehicle when it was disassembled, and looked at the solid rocket boosters was alarmed, because he saw that at the base of the putty in the groove in which the O-rings lay, the grease was charred black like charcoal; and he believed that this was significant. But when they came forth after that with their analysis of 51-C for the next Flight Readiness Review, their analysis showed them that it was still safe to fly. They had had O-ring damage, they had had serious O-ring erosion, and they had had for the first time hot gases that had gone beyond the primary O-ring to its backup, the secondary O-ring; and their analysis told them that in a worst-case scenario, it would still work. It would still work.

Where does cold come into this? The engineer who saw the charcoaled grease had this feeling that, intuitively, this was bad. But when he argued that cold should be a serious concern, they had at that point had many things happening with O-rings. The smallest thing could cause damage -- a piece of lint in the bed of putty in which an O-ring lay, for example, could cause erosion. Each time, something different had happened. They believed that there was no generic problem, because they were not having damage on every ring on every mission; sometimes they would not have any. So he could not prove that cold correlated with the O-ring damage.

They decided at that point that they should get some cold temperature data; but they didn't scramble to get it, as this engineer said. The reason they didn't was that they believed it was a unique incident -- that the chance of overnight temperatures that low for three nights running in Florida was, in his words, the equivalent of having a 100-year storm two years in a row. So there was no scramble to get temperature data. They did some resiliency tests, but they did not have systematic temperature data. So in both circumstances, Columbia and Challenger, when the condition of high uncertainty came up, they did not have a lot of supporting data; they didn't have the best data available to them, and this, it turned out, mattered.

The third point is that the organization's structure interfered with good communication, and it interfered in several ways in which there seem to be parallels across the cases. There were, in this case, missing signals -- people who had information that, if it had been relayed up the hierarchy, might have made a difference. People in the Challenger evening teleconference were in three different locations, and they were in telephone communication but not video. People in different locations did not speak up, so their message didn't get across on the main teleconference line.

Why didn't they speak up? Some people felt that, though it was their specialization, they hadn't worked on it recently, and therefore, though they had some input and some information, they didn't know what the most recent data was. Some people didn't speak up because it simply wasn't their specialization. Other people didn't speak up because they trusted in the hierarchy -- they trusted the key people who were running the teleconference to guide it in the right direction, they trusted the engineers at Thiokol to do the analysis. Those were some of the reasons.

One of the parallels with Columbia comes up in the accounts of the E-mails that were circulated from approximately the 21st on -- the worries of concerned engineers. What I've been able to conclude from the newspaper accounts and the E-mails themselves is that, in a sense, these engineers were marginal to the process; they had not been brought in early on, and this was a conversation they were having among themselves. They were also specialized and felt that perhaps they didn't have the same information that other people had. There was a trust in the hierarchy; and, as one of them said after a press conference early in your investigation, "We didn't have the data." That is, they were concerned that they didn't have any hard numbers.

One of the characteristics of the conversion from the Apollo-era culture to the Challenger-era culture was that intuition and hunch didn't carry any weight. They carried weight in everyday decision-making and in batting around ideas, but when it came to formal decisions like the Flight Readiness Review, it was hard data, it was numbers, that were required. And in this case it was significant to me that he said, "We didn't have the data" -- not having the data, they didn't feel empowered to speak up beyond these E-mails and carry them farther upward.

There is evidence of production pressure in the Challenger case that I haven't seen yet in Columbia. In Challenger, there was a deadline for the engineers to prepare, for the eve-of-the-launch teleconference, their engineering recommendation about the relationship between the cold temperature and O-ring erosion and what they were recommending in terms of launch. They scrambled to put their analysis together, dividing up the work, and began faxing their charts over the telecon line without having the time to look through them. If they had collectively taken that time, they might have noticed ahead of time that they didn't have a strong correlational argument. As a consequence, it was a weak argument in terms of the engineering culture at NASA. The hard numbers didn't hold together. They couldn't prove that there was a cold temperature correlation with O-ring damage.

At one point the key engineer said, "You know, I can't prove it. I just know it's away from goodness in our data base." But in that culture, that was considered an emotional argument, a subjective argument, it was not considered a strong quantitative data argument in keeping with the technical tradition at that time.

So far there isn't any evidence of engineering concerns during the history of the foam problem like there was with Challenger, either. Afterwards, some memos surfaced in the Challenger case, from the previous year in particular, as engineers at Thiokol were trying to get through the bureaucratic rigmarole in order to get the help they needed to analyze the problem; and they were working on a fix at the time.

The other point I wanted to make was about bureaucratic accountability. What was obvious with Challenger was that on the eve of the launch the concerns of the engineers were not prioritized. That also seems to be the case in the requests for the imagery of Columbia. The concerned engineers who discovered the foam strike described it as large -- there was nothing in their experience like this; it was the size of a Coke cooler. This was unique. They met, a team of approximately 37 engineers, and made a request for better visuals than the ones that they had from the ground cameras; but somebody up the hierarchy canceled the request. In a condition of high uncertainty. One of the comments that I read in the newspaper -- and I don't claim to have all the information on this -- was that the request had not gone through proper channels, which points, to me, to the significance of rules and hierarchy over deference to technical expertise in this particular case.

There are many conclusions we can draw from this, but one of them is that in both of these situations, following the normal rules and procedures seemed to take precedence; and we know that, in fact, in conditions of uncertainty, people do follow habits and routines. However, under circumstances where you have something without precedent, it would seem that this would be a time not for hierarchical decision-making but for something more collective and collaborative -- what does everybody think, let's open the floodgates, let's not call on just the usual people but ask especially what the concerns of our engineers are -- and also a time to let up on the idea that you have to have hard data. Engineering hunches and intuitions are not what you want to launch a mission with; but when a problem occurs that's a crisis and you don't have adequate information, that is the reverse of the pro-launch situation, and engineering hunches and intuitions ought to be enough to cause concern, without asking for hard data.

So what's to be done if it turns out in this investigation that you do, in fact, find a failure of the organizational system? Could I have the next slide, please.

Typically, two things happen in the results of an accident investigation. One is that the technical culprit is found and a technical fix is recommended and achieved; second, key decision-makers are identified who had important roles and who might have prevented a bad outcome but didn't. More typically, the organizational system goes untouched. It is, in fact, more difficult to identify the flaws in the organizational system. It's harder to pin down and more challenging to try to correct; but there are many people who are experts in how to build high-reliability systems and in what the problems with such systems are from an organizational standpoint, and they might help with advice in circumstances like this.

Next slide, please. Just looking at the model that I put up earlier where we looked at the trickle-down effect, it leaves three levels at which you might target changes. First, the beauty of operator error is that it deflects attention from key policy decisions made in the past that have affected a program and affected the daily operations. Policy leaders need to be concerned and aware of their responsibility with risky systems and be aware of how their choices affect the hands-on work. They also are responsible and implicated.

Cultures, for example, are hard to change; but leaders might try to change them, even if they weren't the ones who created them. It's important that they remain in touch with the hazards of the workplace. Even if, in the modern NASA, it may be more difficult for administrators to do that, and the dirty-hands approach cannot be carried out like it was in the time of Apollo, still it's important to stay in touch with those hazards.

For example, prior to Challenger, the shuttle was declared an operational system. As a result of that, and the expectation that it would be routine, ordinary citizens were allowed to be taken along for the ride. The people at the top of the organization apparently believed that it was not a risky technology and therefore it was safe to take along ordinary citizens. The engineers who were doing the daily work did not believe that -- they were aware of all the problems in the system on a day-to-day basis. They were the ones who had the dirty hands. They were not the ones who made the decision to put a teacher on the space shuttle.

Another aspect of concern for top leaders is changes are often made in an organization's structure for budgetary reasons, for better coordination, without thinking about how that might affect the people who are having to make decisions at the bottom. What does it mean, for example, when you have an International Space Station and NASA is now dividing up the work so that there are two combined structures and projects in which decisions have to be made? How are these priorities getting sorted out? Does that affect what's going on in the program?

Contracting-out had a serious effect on the work of people making technical risk analyses. We know that hospitals, when they have mergers, often let people go, and they lose institutional memory and there are startup costs in getting people going again. These kinds of changes should not be made without looking at their implications.

Second. Please, next slide. Target culture. You can't really make assumptions about your culture. We think we understand our cultures, but they act invisibly on us, and so we cannot really identify what their effects are. One of the comments post-Columbia concerning the E-mails was, "We have a safety culture and we strongly encourage everyone to speak up at every opportunity." And I'm sure that they believe that. But when you look at the chronology of events, even in the skeletal form in which I'm aware of them, the fact that these what-ifs didn't percolate up the hierarchy, the fact that the engineering requests did not get fulfilled, indicates that there are some things acting to suppress information.

It's also significant, I think, in terms of culture to understand the power of rules. The things that we put in organizations that do good also can have a dark side. It is really important at NASA, because of the complexity of the agency and its projects, to have rules. You couldn't run it without rules. It's impossible. But then there are times when maybe the normal rules don't apply. So how do you train people to recognize circumstances when you have to expedite matters without going through the hierarchy, and how do you empower engineers to get their requests filled?

Finally, targeting signals. Missing signals are obvious in both cases. What does it mean to try to reduce missing signals? One way is to truly create a system in which engineers, and their concerns, have more visibility on both a formal and an informal basis. Second, the safety system. The reduction of safety personnel before Challenger appears to have a parallel with Columbia. When you reduce a safety system, you reduce the possibility that other people are going to be able to identify something that insiders have seen and normalized -- the technical deviation. And the slippery slope: when you're working in a situation where problems are expected, you have problems every day, and people are busy with daily engineering decisions, it becomes very difficult to identify and stay in touch with the big picture.

How do you identify the trend so that people are aware when they are gradually increasing the bounds of acceptable risk? It is certainly true, based on what we know about organizations and accidents in the social sciences, that this is a risky system; and what we know is that the greater the complexity of the technology, the greater the possibility of a failure.

The same is true of organizations. Organizations are also complex systems. The greater the complexity of the organizational system, also the greater the possibility of a failure. When you have a complex organization working a complex technology, you're never going to be able to completely prevent accidents, but the idea is to be more fully aware of the connection between the two so that you can reduce the probability that a failure would occur.

That's it. Your turn.

ADM. GEHMAN: All right. Well, that's a bucket full.

Since you studied the Challenger decision so carefully, and even though we're talking about Columbia here, let me ask a Challenger question, even though it's loaded because it has Columbia implications. Several things you said struck me, and they're related to each other. One is that you can't change the behavior unless you change the organization. You can change the people, but you're going to get the same outcome if the organization doesn't change. Yet in another place up there, you said beware of changing organizations, because of the law of unintended consequences. You've got to be real careful when you change organizations.

What do you make of the post-Challenger organizational changes that took place, particularly in the area of more centralization and program management oversight? What do you make of all of that?

DR. VAUGHAN: The changes that I am most familiar with are the ones related to launch decisions. That is, immediately following, they put a former astronaut in charge of the final "go" outcome of the Flight Readiness Review procedure, and they tried to integrate working engineers into the flight readiness process more. I'd say that there is always a tension in organizations between providing the stability and centralization needed to make decisions and make sure information gets to the top, and providing the flexibility to respond to immediate demands; and without, you know, really studying this, I would say that what we know about Columbia is that the flexibility, at least in a couple of circumstances, really wasn't there. That becomes interesting in thinking about the differences between the pre-launch decision-making structure and the post-launch decision-making structure. That is, the post-launch decision-making structure is actually designed to create that kind of flexibility, so that you can pull in people as you need them and so on.

What's ironic about it is that it looks as if a direct route for engineering concerns -- one that shortcut what really little bureaucracy there seemed to be in that process -- would have helped; it could have circumvented the need for hierarchical requests for imaging. In terms of the overall impact on NASA, I really can't say.

ADM. GEHMAN: From my understanding, though, one of the post-Challenger results has been a much more formal FRR process. As you are probably aware, no more telephone calls, it's all face-to-face, it's done at the Cape, and you've got to be there and they're done in big rooms like this with hundreds of people in the room with several different layers, everybody there, and there's a whole lot of signing that goes on. People at several layers actually sign pieces of paper that say, of the thousands of things that I'm responsible for, they've all been done with the exception of A, B, C, and D, and then they have to be waived or something like that. Then they go through a many, many hour process of making sure that everything's been taken care of and every waiver has been carefully analyzed and in front of lots of high-level people. So it's very meticulous, it's very formal, and it's an eyeball-to-eyeball commitment that my organization has done everything my organization is supposed to have done.

Is that the kind of an organization in which weak and mixed signals can emerge? I mean, is that the kind of organization which would recognize mixed and weak signals and routine signals? Is that compatible kind of with your -- I'm still talking Challenger -- with some of the principles you outlined here?

DR. VAUGHAN: This was fairly much the procedure that existed at the time of Challenger, where every layer of Flight Readiness Review had to sign off. The criticism at the time, post-Challenger, was that the engineers who were making the analyses and coming forward at Level 4, the ground level of Flight Readiness Review, were the people who were getting the mixed, weak, and routine signals; but when they came together, they had to come up with a consensus position for their project manager to carry forward. And once they agreed, they began gathering the supporting data that this was an acceptable flight risk. As their recommendation worked its way up through the hierarchy, the system was designed to criticize it, to bring in people with other specializations who could pick it apart; and the result of that was to make them go back to the desk and sometimes do more engineering analysis. That engineering analysis tended always to support the initial recommendation. So by the time it came out the top of the process, something that might have been more amorphous on a day-to-day basis had become dogma and very convincing. Which is why, with a backdrop of that kind of information, you have people who believe in acceptable risk -- it's based on solid engineering and history -- who need to be convinced by hard data that something different is happening this time.

The system is designed to review decisions that have been made; if there is a mistake in the fundamental engineering analysis, the other layers can criticize it, but they can't uncover it. That would mean you would need another kind of system to detect it, such as outsiders who bring fresh eyes to a project on a regular basis. The Aerospace Safety and Advisory Panel was very effective during the years before Challenger, with the exception that its charter kept it to coming for visits perhaps 30 times a year. So it was impossible for the panel to track all the problems; and at the point when Challenger happened, they were not aware of the O-ring erosion and the pattern that was going on.

ADM. GEHMAN: I'm still trying to understand the principles here. It seems to me that in a very, very large, complex organization like NASA, with a very, very risky mission, some decisions have to be taken at middle-management levels. I mean, not every decision and not every problem can be raised up to the top, and there must be a process at Level 2, Level 3, and Level 4 by which decisions are taken, minority views are listened to, competent engineers weigh these things, and then they take a deep breath and say, okay, we've heard you, now we're going to move on. Then they report up that they've done their due diligence, you might say.

I'm struggling to find a model, an organizational model in my head, when you've got literally thousands and thousands of these decisions to make, that you can keep bumping them up higher in the organization with the expectation that people up higher in the organization are better positioned to make engineering decisions than the engineers. I mean, you said yourself, "Hindsight is perfect." We've got to be really careful about hindsight, and I'm trying to figure out what principles to apply.

We as a board are certainly skittish about making organizational changes to a very complex organization for fear of invoking the law of unintended consequences. So I need to understand the principles and I'm trying to figure out a way that I can apply your very useful analysis here and apply it to find a way to figure out what the principles are we ought to apply to this case. So the part that I'm hung up on right now is how else can you resolve literally thousands of engineering issues except in a hierarchical manner in which some manager, he has 125 of these and he's sorted through them and he reports to his boss that his 125 are under control. I don't know how to do that.

DR. VAUGHAN: Well, two things. First, somehow or other in the shuttle program, there is a process by which, when a design doesn't predict an anomaly, the anomaly can be accepted. That seems to me to be a critical point: if this was not supposed to be happening at all, why are we getting hundreds of debris hits? It's certainly true that in a program where technical problems are normal, you have to set priorities; but if no design flaw was predicted, then having the problem should itself be a warning sign, not something that is taken for granted.

The idea is to spot little mistakes early on, so that they don't turn into big catastrophes. Two things would help, and NASA may be very aware of one of them, maybe both. One is that engineers' concerns need to be dealt with. I can understand the requirement for hard data. But what about the more intuitive kinds of arguments? If people feel disempowered because all they've got is a hunch or an intuition, and they let somebody else handle it because they feel they're going to be chastised for arguing on the basis of what at NASA is considered subjective information, then they're not going to speak up. So there need to be channels that assure that they do -- even giving engineers special powers, if that's what's necessary.

The other is the idea of giving more clout to the safety people to surface problems. So, for example, what if the safety people, instead of just having oversight, were producing data on their own, tracking problems to the project for which they're assigned and, in fact, doing a trend analysis to keep people's eye on the big picture so that the slippery slope is avoided?

ADM. GEHMAN: Thank you for that.

DR. VAUGHAN: Let me add also that there are other models of organizations that deal with risky systems, and social scientists have been studying these. They have been, you know, analyzing aircraft carrier flight decks and nuclear operations and coal-mining disasters. There are all kinds of case studies out there and people who are working in policy to try to see what works and what doesn't work. Are there lessons from air traffic control that can be applied to the space shuttle program? What carries over? Is there any evidence that NASA has been looking at other models to see what might work with their own system?

I know that in air traffic control they use an organizational learning model. What we find out from this comparison between Columbia and Challenger is that NASA as an organization did not learn from its previous mistakes and it did not properly address all of the factors that the presidential commission identified. So they need to reach out and get more information and look at other models, as well.

Thinking about how you might restructure the post-launch decision-making process so that what appears to have happened with Columbia doesn't happen again, and how that can be made efficient, may be worth doing -- maybe it needs to look more like the pre-launch decision process. But is there any evidence that NASA has really played with alternative models? And my point about organization structure is this: as organizations grow and change, you have to change the structures, but don't do it without thinking about what the consequences might be on the ground.

DR. LOGSDON: Just a short follow-up to that. Diane, your book came out in 1996, I think, right, and was fairly widely reviewed. We at the board discovered in some of our briefings from outside folks that the submarine safety program uses your work as part of the training program for people that worry about keeping submarines safe. Have you had any interactions with NASA since the book came out?


DR. LOGSDON: Have you ever been invited to talk to a NASA training program or engage in any of the things that you just discussed might be brought to bear?

DR. VAUGHAN: No, though, in fact, as you said, the book did get quite a lot of publicity. I heard from many organizations that were concerned with reducing risk and reducing error and mistake. The U.S. Forest Service called, and I spoke to hotshots and smoke-jumpers. I went to a conference that physicians held, looking at errors in hospitals. I was called by people working in nuclear regulatory operations, and by regular businesses, where it wasn't risky in the sense that human lives were at stake. Everybody called. My high school boyfriend called. But NASA never called.


ADM. GEHMAN: Anybody want to comment on that?

GEN. BARRY: What was his name?

ADM. GEHMAN: Let me finish my thought here. Professor Vaughan, again we're back to this organizational issue, where I'm trying to determine the principles that I can apply from your analytical work. In the case of NASA, if they didn't follow their own rules, would that alarm you? What I mean is, if there were waivers or in-flight anomalies or systems that didn't work the way they were supposed to work, and the fact that they didn't work somehow started migrating its way down lower in the message category to where it wasn't sending messages anymore -- so that NASA was technically violating its own rules, because it's supposed to deal with these things -- would that be a significant alarm for you?

DR. VAUGHAN: Well, I think that one of the things to think about here is that NASA is a system that operates by rules; and maybe one of the ways to fix the problem is to create rules to solve the problems. So what are the rules when engineers need images, for example? Can't they find a way where they have their own authority, without seeking other authority, to get the necessary images? So I think I read that someplace -- that the harmony between the way the organization operates and thinks and the key aspects of the culture itself is something that you might want to build on.

DR. WIDNALL: Actually I'm starting to frame in my own mind that the problem is that there is, in fact, one underlying rule and it's a powerful rule and it's not stated and it's not stated as simply as this question of following your own procedural rules. But let me sort of get into that. I've certainly found your framework very helpful because I've mused over this issue of how an organization that states that safety is its No. 1 mission can apparently transition from a situation where it's necessary to prove that it's safe to fly, to one in which apparently you have to prove that it's not safe to fly. I think what's happening is, in fact, that engineers are following the rules but this underlying rule is that you have to have the numbers.


DR. WIDNALL: That's not the rule you stated, which was that you should follow the procedures and resolve all anomalies.

DR. VAUGHAN: This is a norm.

DR. WIDNALL: Those are these kind of rules. I'm talking about the really basic rule that says you have to have the numbers. So that basically means that every flight becomes data and that concern about an anomaly is not data. So a flight with an anomaly becomes data that says it's safe to fly. So the accumulation of that data, of those successful flights, puts the thumb on the scale that says it's safe to fly; and people who have concerns about situations in one of these uncertain situations that you talk about, they don't have the data.

So I think it may be getting at, in some sense, changing the rule to one that it is not okay to continue to operate with anomalies, that the underlying rule of just having data is not sufficient to run an organization that deals with risky technologies. Because otherwise you're just going to end up with a pile of data that says it's okay to fly, and you're not likely to get much data on the other side.

ADM. GEHMAN: Is that a question?

DR. WIDNALL: That's kind of a comment.

DR. VAUGHAN: I completely agree with you. One of the reasons I emphasized in an earlier slide that you need to understand your culture is that it works in ways that we don't really realize. So how many people there understand the effect of intuition and hunch, which are absolutely integral to good engineering, and how that kind of emphasis on numbers suppresses that kind of information in critical situations?

People are disempowered from speaking up by the very norms of the organization. Take language, for example. The term I've read in the paper, "That's in family." That's a real friendly way of talking about something that's not really supposed to be happening in the first place. In nuclear submarines, they don't talk about it as "in family"; they talk about it as a degradation of specification requirements, which has a negative feeling to it. These kinds of language, which we think of as habits of mind, reflect attitudes that are invisible, but the language really shows them.

So the question is, you know, how can you get back in touch with the importance of engineering intuition and hunch in formal decision-making. Usually it works in the informal decisions. You know, I think that's why the NASA administrators believe that they've got a safety culture and that people are free to express whatever they think; but when it comes to a formal decision, they fall back into the formal rules and that concern doesn't get expressed.

Even if you take something as simple as an engineering presentation, the fact that it's reduced to charts, which are systematic, gets all the emotion out of it. It begins to look even more routine. The engineer in Challenger who saw the burned grease, the black grease, was seriously alarmed. I asked him, you know, later, "Did they see this? What did they see? Did they get a photograph?" He said yes. I said, "How did it look in the photograph?" He said it did not look serious in the photograph. So emotion is keyed to some kind of a logic based in engineering experience, and it should be valued and a way found to express it.

GEN. BARRY: Diane, I'm going to ask you a short question, and then I'm going to ask a longer question, if I may. First, the short question, focusing on organizational failure. The Rogers Commission, did they fall short on institutional recommendations in the aftermath of Challenger, or were they good ones and they just weren't followed through by NASA?

DR. VAUGHAN: The Rogers Commission was very good at identifying what they called contributing causes and that I would call system causes. That is, they identified safety cuts, cuts in safety personnel. They identified the failure of NASA to respond to recommendations of the Aerospace Safety and Advisory Panel. They identified the history of the program and the fact that it was a design that was built on budget compromises in the beginning. They identified production pressures. They identified all those kinds of outside sources that had impacted the decision-making and that were a part of NASA's history.

In the recommendations, they didn't come forward with anything that said give them more money, change the culture. They weren't sociologists. They weren't social scientists and weren't trained to think about how that might have actually worked. The way it looked like it worked was that there were pressures there, and a key manager, namely Lawrence Mulloy, who was the project manager for the solid rocket booster project at that time, was the operator who made the error. Once that happened and the key person was identified and people changed and new people came in, the system problems remained.

They fixed the technology. They fixed the decision-making structure in ways I described earlier. But -- in keeping with my point earlier about top leaders being responsible -- the organization did not respond in terms of getting more money beyond what it took at that point to fix the technical problem. They got an initial boost, but they've been under budgetary constraints all along. The recommendations in the volume of the presidential commission were related strictly to internal NASA operations. They were not directed towards policy-making decisions that might have affected the program.

GEN. BARRY: Okay. Let me build on that a little bit and just carry it on and see if this resonates with you. I'm going to list off a bunch of items here and see if this rings true with what you know from Challenger that might be able to be translated over to Columbia.

First of all, you stated that with Level 4 identifying problems and trying to communicate them up the institution, the organization kind of stymied that. So I would characterize that as needing to prove that there is a problem in the early stages of the FRR or before flight. I think post Challenger, you know, there has been a fix on that and, remember, the Flight Readiness Review is supposed to cover not only launch but also en route and recovery. So it's the whole flight. It seems like they've solved that problem on the launch side. But post launch, there's, some would argue, an attitude that you have to prove there is a problem. So we kind of fixed it on the launch side; but after it's launched, we kind of relegate back to maybe the way it was prior to Challenger: Prove to me there is a problem.

Now, if we try to look pre and post launch, pre-launch is very formal, as Admiral Gehman outlined earlier. You've even alluded to it in the book. Post-launch, it could be argued, is less formal -- more decentralization, more delegation certainly, okay, than what we see at the FRR prior to launch. Multiple centers are involved prior to launch. I mean, they all meet and they all sit at the same place, they're all eyeball to eyeball. Center directors are represented, program managers. Post-launch, again decentralized, it's mostly a JSC operation. Of course, KSC gets involved with the landing at Kennedy.

There's a tyranny of analysis pre-launch maybe and that is because you've got -- well, you have a long-term focus because you've had time. But post-launch, there's a tyranny of analysis but it's in real time because you don't have as many hours and you've got to make decisions quicker and all that other stuff.

The real question -- if this resonates with you at all -- could it be argued that during Columbia, NASA had a "Prove there is no problem" prior to launch and post-launch it was "Prove to me there is a problem" and we have this formal and informal kind of focus. It seems to me after Challenger we fixed the prior to launch, certainly with having people appear in person and no VTCs or no over-the-phone. Everybody had to be there in person. And we have maybe a problem that we need to fix post-launch with the MMT and the decentralization elements and maybe the delegation.

I certainly don't want to relegate it to a headquarters level, but there are some things that need maybe to be fixed there. So I would ask really your opinion that is there some kind of a delineation in your mind, from what you know to date, pre and post launch, that we might be able to provide solid recommendations on to improve NASA?

DR. VAUGHAN: I'm wondering if the post-launch flexibility is such that you can, in fact, have similar things going on in two different parts of the process in which people are not in touch. So I understand that video requests really originated from two different points, working engineers in two different locations, and that they didn't really know that the other had originated a request.

It certainly seems that the mentality of proving your point when you've got a time line like you do and it's an unprecedented circumstance, as it was with Columbia, is wrong, of course, in retrospect. The question you're asking is how can we convert that into a process that prevents this from happening again.

You know, a famous sociologist once told me when I was beginning the analysis of the Challenger launch, "It's all these numbers. It's all these numbers, and there are these debates about issues. Why don't you do it like they do it in the Air Force? You just should have a red button for stop and a green button for go." And there's a lot to be said for simplifying a complex system, whether it's decentralized or centralized, so that key people can respond quickly and shortcut the hierarchy. I don't know if that begins to answer your question. But there maybe need to be some more rules created in the sense that --

GEN. BARRY: And this is really stretching it but --

DR. VAUGHAN: Maybe it needs to be more formal than it is, and maybe it needs to be more like the pre-launch procedure in terms of the rigor and the numbers of people from different parts who are looking at problems that crop up while a mission is in process -- some sort of a formalized procedure where there's a constant ongoing analysis, instead of worried engineers in two different locations kind of independently running around, trying to get the problem recognized and get attention paid to it.

MR. WALLACE: NASA's taken quite a pounding here today but I'm wondering what we can --

DR. VAUGHAN: I thought this morning they were coming off pretty good.

MR. WALLACE: I would just like to talk about what we can sort of learn about what they do well -- in other words, areas where we don't seem to have this normalization of deviance or success-based optimism. Like BSTRA balls and the flow liner cracks and some of those fairly recent examples where there were serious problems detected with the equipment, in some cases detected because of extreme diligence by individual inspectors and really very aggressively and thoroughly fixed.

It seems to me that part of the problem of normalization of deviance is sometimes the level of visibility that an issue gets. How do you sort of bridge the gap between those things that get enough visibility or sense of urgency and those that somehow seem to slip below that threshold?

DR. VAUGHAN: Someone said after the book was first published -- and then again now I've been getting a lot of E-mails. Someone said at the time the book was published, "I bet if you took any component part of the shuttle and traced it back, you would find this same thing going on." Perhaps doing a backward tracing on other parts of the shuttle could show you two things. First, what are the circumstances in which they're able to locate an anomaly early and fix it so they stop it in its tracks and avoid an incremental descent into poor judgment? Are there other circumstances in which the same thing is happening? Can you find circumstances where you do have the normalization of deviance going on?

It's interesting in the history of the solid rocket booster project that there was a point at which they stood down for maybe two months to fix a problem. How is that problem identified? What are its characteristics? I would bet that the more uncertain, the more complex the part and the more amorphous the indications, the more likely it is to project into a normalization-of-deviance problem, given the existing culture where flying with flaws is okay in the first place.

MR. WALLACE: Well, sort of following on. Earlier you said -- and good advice for this board -- that we should try to see problems as they saw them at the time and not engage in the hindsight fallacy or whatever that's called. I mean, I'm not sure you said this; but my assumption is that that's almost the only way you can learn to do better prospectively. I mean, do you have any other thoughts on that? In other words, to see the problem as they saw them at the time, to me, is almost a step toward the discipline of seeing the next one coming.

DR. VAUGHAN: Right. It's an experimental technology still; and every time they launch a vehicle, they've made changes. So they're never launching the same one, even though it bears the same name. This is a situation in which, like most engineering concerns where you're working with complex technologies, you're learning by mistake. So that's why post-flight analysis is so important. You learn by the things that go wrong. Every once in a while you're going to have a bad mistake.

ADM. GEHMAN: Did I understand the point that you made both in your book and in your presentation here is that the answer to perhaps Mr. Wallace's question lies in the theory of strong signals? In other words, if NASA gets a strong signal, they act on it. No problem. They very aggressively shut the program down and go fix it. The problem is in the weak, routine, or mixed signals. Those are the ones that seem to bite us. Of course, there are a lot of them; and they don't quite resonate with the organization. Is that a good analogy?

DR. VAUGHAN: It is. The idea of a trend analysis is that it could pick out stronger signals from lesser ones before it becomes, you know, an enormous problem; but the recognition of the pattern is important, bringing forth the pattern so that the people who are making decisions are constantly in touch with the history of decisions that they've gone through before.

I have to say with that, though, it's important to remember that they have to have quantitative evidence to fly. Maybe the more qualitative evidence could be brought in in other ways further up the chain. In Flight Readiness Review, for example, they present everything on charts. The purpose of Flight Readiness Review is to clear the hardware and get it ready to go -- to clear up the problems as it works its way through the Flight Readiness Review process. What happens, as I mentioned, is that the engineering argument tends to get tighter and tighter because they're constantly doing the work to investigate and respond to questions and, in a sense, defend what they've said or find out if there are flaws.

At the time of Challenger, I read thousands of engineering documents for all the Flight Readiness Reviews that they had had, and I didn't see anyplace in the Flight Readiness Review process that would allow for the presentation of simply intuitions, hunches, and concerns -- where qualitative evidence might be presented, like a clear image or even a vague image of a piece of debris the size of a Coke cooler, for example, rather than charts for an engineering analysis. There ought to be room in the process for alarm.

ADM. GEHMAN: In your experience, particularly with what I'm calling these weak signals -- this muttering around the room that the O-rings can't take freezing temperatures, but we're not really sure whether they can or cannot -- I have in my mind a model that says it's unfair or not reasonable to set as a standard that the organization act on literally hundreds of these misgivings that tens of thousands of people may have, and that it's an unfair standard to require the people who have these doubts to prove that their doubt could cause the loss of the vehicle or the crew. But I have in my mind that a more reasonable standard is that management should realize that the accumulation of signals from the process is cutting into their safety margins. You can accumulate these things not in a measurable way but in a subjective way, particularly in a regime in which you have very thin safety margins to begin with, and you should be able to reasonably determine that you're narrowing your safety margins in a way that should concern management. Is that a reasonable characterization of the standard or the bar that we set here?

DR. VAUGHAN: I think that shows up in the problem of lack of data in both of these circumstances. There were early warning signs, and in neither case had those early warning signs been pursued, with someone saying, "Well, the imagery is bad. We know this is happening. We can't see exactly where it's hitting. Why don't we get this now?"

I mean, the power of the E-mail exchange was that they really hadn't thought the possibility of failure through. There was no plan for what needed to happen if there was, in fact, a serious tile hit and damage to the wing -- what would they do at re-entry, and what would it mean to attempt a wheels-up landing at the landing site. And there was that failure to pursue the trajectory of a problem that's repeating. If you think about cost as a factor in making issues a priority at NASA -- which obviously it is anyplace; you can't fix everything -- think of the cost if you simply don't have the data you need, which is, I think, the most stunning thing about the comparison of the two cases. At the time when conditions were highly uncertain, in neither case did they have the data; and having that background data is important.

ADM. GEHMAN: In your review of the Challenger decision, did you personally come to the conclusion that the launch decision would have come out differently if the Morton Thiokol engineers' split decision -- because some of the Morton Thiokol engineers said it was safe to launch, but they were split on that -- and if the managers at Marshall had reported that there was a split decision, that the FRR would have come out differently? Did you have any evidence of that?

DR. VAUGHAN: The manager at Marshall did not know that there was a disagreement at Thiokol. That was one of the problems with them being in three locations. No one ever thought to poll the delegation. So no one on the teleconference knew really where anyone else stood. They knew what Thiokol's final recommendation was, and they assumed that Thiokol had gone back and re-analyzed their data, seen the flaws in it, and been convinced it was safe to fly. So the fact that not everyone was heard from was critically important.

By the same token, Thiokol engineers didn't understand that they had support in the other places, that one of the NASA managers who was at the Cape was really sitting there making a list of people to call because he believed that the launch was going to be stopped. So that was truly a problem.

Now I've lost sight of your question.

ADM. GEHMAN: The question was: In your research about Marshall, did you come to the personal conclusion from talking to people that the fact that the cold temperature analysis at Morton Thiokol was a split decision, that that would have made any difference at Marshall? I mean, did anybody say, "If I had known that, I would have changed my mind"?

DR. VAUGHAN: Yes. However, the goal is for unanimity, and here's again where numbers count: in the instance where engineering opinion is divided, they make what's known as a management risk decision -- the managers take over. The managers at Thiokol, who knew that their engineers were split, made a management decision. In retrospect, that was the most horrendous example of failing to listen to your technical people, who said, "You know, I can't prove it, but I know it's away from goodness in our data base."

ADM. GEHMAN: This principle that I'm following up on here is important because we do have to be careful of hindsight; and it may be that, even armed with what is admittedly a minority opinion of a bad outcome, it could be that these are judgment calls that are made in good faith with people doing the best they can and they make a mistake. I mean, they call it wrong. So the question is whether or not we can indict the system, based on these incidents.

DR. VAUGHAN: I think you have to analyze -- you have to do a social fault tree analysis and figure out what actually happened and what went on, how is information relayed. I'm sure that's work that's ongoing with you.

ADM. GEHMAN: That brings me to my next question -- and pardon me for monopolizing the time here. Another good writer on this subject, who I think is Nancy Leveson, in one of her models she suggested that we need to diagram these decision-making systems because, just as you say, it's not a person, it's a culture, it's an organization that's really driving these things. Are you aware that anybody's ever diagramed the FRR or the waiver, in-flight anomaly disposition system? Has that ever been diagramed, to your knowledge?

DR. VAUGHAN: Not that I know of. But what would be more interesting would be to look at the more informal decision-making processes because the rules are so strong for how the information is addressed in Flight Readiness Review that that would probably turn out the same every time. What you would want to look at are the more informal processes and try to map them and understand where the information stopped and why it stopped.

MR. WALLACE: I'd like your thoughts on the concept of whether an organization -- this one -- can sort of become process-bound. You cannot fault the thoroughness of the processes. But, I mean, is there a point at which they can almost subvert other thinking processes -- where people become so confident in the thoroughness of the processes and the fact that they're tested that they reach a comfort level where the processes become the be-all and end-all?

DR. VAUGHAN: Well, that's one of my main concerns about NASA: the fact that it is a very rule-guided organization, and the fact that they do believe that when they have followed all the rules, they have done their best and can have confidence. That's why the rules tend to carry such heavy weight. Not only do they aid them with the process, but then they have a cultural effect which builds confidence. If you're not in touch with the everyday engineering data itself, you can lose sight of the fact that it is still an experimental system. So it's the dark side of the organization: the same kinds of procedures that you implement to make it work better can also have an unanticipated consequence, and that's why keeping in touch with all the ambiguities in the engineering decision-making would be important.

"Any other doubts and concerns?" You know, by the time you get to the top of the Flight Readiness Review process, nobody's going to say that. One of the proposals from the presidential commission was that an engineer accompany his project manager at each level of the Flight Readiness Review tier, the feeling being that because engineering concerns did not get carried up to the top prior to Challenger and in the eve-of-launch teleconference, it would be a good idea. Rather than the engineers at Level 4 turning over all their information to their project manager, who then carries it forward, let's integrate engineers into the process. But can you imagine some engineer in the top Level 1 Flight Readiness Review with 150 people, after all that's gone on, standing up and saying, "I don't feel good about this one"?

ADM. GEHMAN: Well, I agree with you. I agree with you. But I would compound that with an organizational scheme in which, even though that engineer works in the engineering department and technically doesn't work in the program office, his position and his salary are funded by the program office, and he wouldn't exist if the program office didn't pay him. In other words, we've wickered this thing to where the money flows down through the projects and they send money over to the engineering office to hire people. So now put yourself in the position of this guy who's going to contradict the office that's paying his salary, and you don't have a very comfortable formula.

DR. VAUGHAN: I understand that. I think there's a parallel situation with safety people.

ADM. GEHMAN: Well, yes and no. There is a safety organization in the programs and in the projects and their positions depend upon the largesse of the project managers, but there's also an independent safety organization.

DR. VAUGHAN: I meant in terms of rank -- their independent authority and power based on where they come in the GS ranking system.

ADM. GEHMAN: Absolutely. That's a question I'm going to ask you after General Barry and Dr. Logsdon have a chance.

DR. LOGSDON: I have a comment that's as much directed at the board as it is at Professor Vaughan. It's just that this discussion made me think of this line of reasoning. We've been talking about the rigor of the pre-flight process for readiness review, compared to a different structure for what goes on during a mission. There's almost a symbolic element here. The management of the launch is a Kennedy Space Center responsibility; and the moment that the shuttle clears the launch tower, control over the mission shifts to Johnson. Sean O'Keefe is trying to say that NASA is a single organization, but he's got a long way to go to achieve that goal. These are very proud organizations and, of those, Johnson is the proudest of the proud because it's one of the only two places in the world that knows how to manage a space flight. There are now -- what is it, since '61 -- some 42 years of experience of managing humans in space.

So we're beginning to talk about maybe we can examine the process of mission management and see whether it measures up to some standard of high-performance organizations, and I think that's what we have to do. But there's a lot of received wisdom -- and maybe it's ossified wisdom by this point in the process. So as we go towards that, I think we have to make sure that we don't have unintended consequences. As I say, that's just a comment, not a question.

ADM. GEHMAN: Would you like to comment on his comment?

DR. VAUGHAN: Well, he directed that to the board, as well.

ADM. GEHMAN: In the interest of time, I'll go on to General Barry.

GEN. BARRY: I'd just like to add one more thing to your parallel kind of discussion between Challenger and Columbia. Could you just see if there's anything you know of that you could add to this kind of construct? You know, there were a lot of organizational changes here in the last couple of years. We moved Palmdale to Kennedy. We moved the Huntington Beach engineering support mostly to JSC but some to KSC. And, of course, we've got the International Space Station support going on. So there are some organizational elements that are unique to Columbia this time; but there were some Challenger organizational elements, too. You know, at the time of Challenger the JSC leadership was held by Jesse Moore, who was also running the space flight program as associate administrator. Also, we had an interim administrator at the time of Challenger. Are there any parallels that you're seeing between the organizational aspects of Columbia and Challenger?

DR. VAUGHAN: At the administrative level?

GEN. BARRY: Well, just organizational elements that we might be able to draw from.

DR. VAUGHAN: One, but it's cultural. It seems like there is a gap in perceptions of risk between working engineers and top administrators. At the time of Challenger, engineers were very concerned with every launch, even though they had gone through all the rigors of the procedure; but at the same time, the people at the top thought it was an operational system. The parallel I see is, you know, working engineers really familiar with what's going on and having concerns, but decisions made that really do echo the period of Challenger -- where it's okay to take citizens along for a ride -- which suggests that top-level administrators have rather lost touch with the fact that it is an experimental system, a message that they clearly understood post Challenger.

John mentioned symbolic meanings, and they can be really important. It's hard to judge exactly what the effect is of a top administrator believing that it's again safe enough to fly people who are not trained as astronauts. Subtle things like "faster, cheaper, better" can have an effect on a culture, even at the same time that you're doing everything possible to encourage safety.

Certain actions have symbolic meaning. The fact that you have a safety representative sitting in on a Mission Management Team, or wherever they're assigned, can have symbolic meaning. Signs posted that say safety, safety, safety can convince you that you have a safety culture; and yet when you look at the way the organization works, you may not have as strong a safety culture as you wished. The safety person who is assigned to Mission Management Team decisions, if that is the case, is in a position of not having hands-on information -- reviewing their decisions but not, in a sense, independent of them, because they have the leadership responsibility. So what kind of weight, you would want to know, is that person really bringing to that situation? Do they have the influence to be listened to? Do they have the data to really do anything more than oversight at that point? How do you really put them in a position where they can recognize a warning sign and talk with people who are higher ranked than they are, in a definitive way that is convincing, in a crisis situation?

ADM. GEHMAN: That leads to my question. That is, would you be content -- let me just outline this in rough form -- with a process to satisfy that issue? That is, senior management, the management who's got the ultimate responsibility in these decisions, would kind of be forced to listen to these engineering doubts because of an organization in which you had checks and balances among essentially coequal branches of some kind. In other words, the engineers would be organizationally and culturally equal to the project managers, and likewise the safety and mission assurance people. I agree with you; I understand exactly what you're saying. It's not good enough to just sit at the table. You have to come to the table with some clout, and usually that clout's in the form of analysis or data or research, or else I won't sign your chit for your money, or something like that. You've got to come with something. My model suggests that if you did that, you would be creating some degree of managerial chaos; but, on the other hand, you would be making sure that engineering reservations and concerns were well researched and got surfaced independently at the right level. So you've got a trade-off: a little bit of managerial chaos, the danger of the organization not speaking with one voice, and all those kinds of things; but, on the other hand, you would satisfy the requirement that signals get heard.

DR. VAUGHAN: Surfaced.

ADM. GEHMAN: Does that sound reasonable?

DR. VAUGHAN: It does sound reasonable. Someone said if every engineer raised every concern, you would probably never launch a mission; and that's probably true.

ADM. GEHMAN: Probably true.

DR. VAUGHAN: It seems in post-launch conditions, where the clock is ticking -- in line with General Barry's suggestion about how we could restructure the post-launch decision process -- that it would be especially important, then, to create that kind of an open process.

ADM. GEHMAN: Okay. Well, thank you very much, Dr. Vaughan. You've been very patient with us. We hope we haven't tried your patience too much as we try to understand the very sound principles that you have exposed us to, both in your book and in your briefing here today.

The board is sensitive to the law of unintended consequences, and we want to be very careful that we understand more about these managerial principles before we go writing something down on a piece of paper that we might regret. But your study has had an influence on this board, and we're indebted to you for coming and helping us here today.

DR. VAUGHAN: Thank you. Thanks for having me.

(Hearing concluded at 4:38 p.m.)

