
As a result of ascent anomalies experienced on STS-93, I asked Dr. Henry McDonald (Center Director, NASA Ames Research Center), on September 7, 1999, to lead an independent technical team to review the Space Shuttle systems and maintenance practices. The team, composed of NASA, contractor, and DOD personnel, looked at NASA practices, Space Shuttle anomalies, and civilian and military aeronautical experience. My goal for this study was to bring to Space Shuttle maintenance and operations processes a perspective from the best practices of the external aviation community and, where applicable and appropriate, to apply these practices to the Space Shuttle.
This report fully endorsed the continuation of Space Shuttle flights after disposition of the team's immediate recommendations. Additionally, the Space Shuttle Independent Assessment Team (SIAT) was continually impressed with the skill, dedication, commitment and concern for astronaut safety of the entire Space Shuttle workforce. The SIAT documented many positive elements during the course of their interviews with the Space Shuttle NASA/contractor workforce. Particularly noteworthy were the observations dealing with the skill and dedication of the workforce.
Independent assessments, like the SIAT, have been used repeatedly throughout the history of the Space Shuttle program. NASA's goal for these independent assessments has been to identify opportunities to improve safety. The SIAT report will provide additional input to the full range of activities already underway associated with Space Shuttle upgrades, including maintainability and processes for shuttle safety and quality control. This report brings to Space Shuttle maintenance and operations processes a perspective from the best practices of the external aviation community.
The SIAT focused their activities on eleven technical areas: Avionics, Human Factors, Hydraulics, Hypergols and Auxiliary Power Unit, Problem Reporting and Tracking Process, Propulsion, Risk Assessment and Management, Safety and Mission Assurance, Software, Structures, and Wiring. They documented 81 recommendations in four categories:
• Four recommendations identified as Immediate (Solutions required prior to return to flight). In this category were the following recommendations:
The above recommendations were reviewed and dispositioned prior to the Flight Readiness Review for STS-103 (the first Shuttle flight following the stand-down of the orbiter fleet for wire inspections).
• Thirty-seven recommendations identified as Short-Term (Solutions required prior to making more than four more flights)
• Thirty recommendations identified as Intermediate (Solutions required prior to January 1, 2001)
• Ten recommendations identified as Long-Term (Solutions required prior to January 1, 2005)
The Office of Space Flight applauds the work and dedication of the SIAT on what is part of a continuing process to improve the safety of the Space Shuttle system. The Space Shuttle Program Office is outlining a plan to address all recommendations. It is expected that actions to be taken in response to this report will cover near-term and long-term strategies that will lead to the development and infusion of new technologies and practices.
Working
hard to solve the wrong problem?
Bill says
that unless you really understand a problem, your solution may end up
creating additional problems, so if that's the case, it is often better
to just leave the situation alone.
Bill is right.
Each study is looking at a small piece of a much larger picture and
failing to see what the real problem is. By consuming precious
research funds and keeping us at risk over long time periods while
muddling around for piecemeal solutions, they are themselves part of the
problem.
We don't need another study, more wasted time and more solutions that
don't work. We just need to use a little common sense. We need to
rethink and apply what we already know:
*NFF = No Fault Found (aka intermittency)
1.) A cursory analysis of the problem from any view would
conclude that latent problems obviously exist that present testing
methods are unable to detect or diagnose. Proof of this is the
huge NFF rates that some researchers simply choose to ignore.
2.) Failure and repair data show that these NFF rates generally
increase as a system ages.
3.) We also know that functional testing will verify that all of
the components, the resistors, capacitors, solid-state devices, etc. are
within specifications and are doing their job correctly, so we can
summarily eliminate them as the source of the trouble.
4.) That only leaves us with the connectivity elements, the
wires, connectors, solder-joints, etc., as the likely components that are
breaking down. These elements are intermittent by design to some
degree, so they usually conduct properly when new and somewhat less as
they age. As they wear out, they are first seen as intermittent
spurious noise that eventually progresses to randomly occurring NFF
failures and then into hard opens or hard shorts depending on whether
the conducting/contacting element or the insulation is wearing. This
is something every technician already knows.
5.) We know that we can slow this natural process down through
visual inspections that look for design exceptions and abuse, such as insulation
chafing with no protective sleeving. These inspection processes and
procedures have been in place for decades. We don't always follow
them, but we know how to do it.
Do we really need a study for this?
The aforementioned wiring study is focusing all of its efforts on
insulation failures while excluding failures in connectivity, due no
doubt to all the media hype concerning insulation fires and exploding
gas tanks. Shorting wires, due to insulation wear and aging, are
deemed by these researchers to be a high priority safety issue while
intermittent connectivity and NFF problems are seen simply as a matter
of maintenance economics. In their view, if a wire (insulation)
chafes and shorts out, losing a system or two, it's a safety issue, but
if a circuit opens up (even momentarily) and shuts down a system, it's
only a maintenance problem. In the real world, the opposite causes
can have the same effects, so from a safety or a maintenance view,
what's the difference?
The big thing that all these researchers overlook is that, in the process
of aging, a period of random intermittency due to opening, shorting or
both is generated as the failure progresses to a hard or testable
fault. This is part of the reason for the NFF problem. The
pilot sees the problem in an environmentally stressed situation while
ground-based testing is rather static.
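The gap between what the pilot sees and what the ground test sees can be illustrated with a toy model. This is only a sketch under stated assumptions: the contact model, the function names, the vibration margins and the vibration profiles below are all hypothetical values chosen for illustration, not measurements from any real system.

```python
# Toy model (hypothetical): an aging contact stays closed only while the
# vibration level is within its remaining margin. All numbers are
# illustrative assumptions, not real test data.

def circuit_is_closed(vibration_g, margin_g):
    """Return True if the contact conducts at this vibration level."""
    return vibration_g <= margin_g


def count_open_events(vibration_profile, margin_g):
    """Count the samples at which the contact momentarily opens."""
    return sum(
        1 for g in vibration_profile if not circuit_is_closed(g, margin_g)
    )


# Static ground test: the unit sits on a bench with no vibration.
ground_profile = [0.0] * 10

# In-service conditions: the same wiring under varying vibration.
flight_profile = [0.5, 1.0, 3.5, 0.8, 4.2, 1.1, 0.3, 3.9, 0.6, 2.0]

NEW_MARGIN_G = 5.0   # assumed margin of a new contact
WORN_MARGIN_G = 3.0  # assumed margin after aging and wear

print("worn contact, ground test:", count_open_events(ground_profile, WORN_MARGIN_G))
print("worn contact, in service: ", count_open_events(flight_profile, WORN_MARGIN_G))
print("new contact, in service:  ", count_open_events(flight_profile, NEW_MARGIN_G))
```

In this model the worn contact passes the static bench test with no fault found, yet opens several times under the in-service profile, while a new contact survives the same profile without opening.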
Since these researchers can't measure it, they can't quantify it. By
their definition then, intermittency doesn't exist and as a result,
their jobs just got a lot easier and more secure in the process.
As Bill pointed out, if you focus on the wrong problem you run the risk
of coming up with the wrong solution. If the problem is in the
wiring and deemed a safety problem and as a Government agency charged
with enforcing safety you are obligated to correct the problem, what are
you going to do? Are you going to totally rewire or replace all of
the legacy aircraft at the first sign of aging, about every 5 or 6
years? Or do you just train pilots to react to failures and
warnings in a "Don't worry about it, unless you smell smoke"
sort of a way?
Maybe they had better reconsider the testing option where you identify
wires as they fail and fix the shorting wires, the opening wires and the
intermittent/NFF wires all at the same time.
A previous Air Force Headquarters dictum in the early 90's got it right
when they concluded that their main problem was having too many
That statement is not true anymore!
CND/NFF (Cannot Duplicate/No Fault Found) is undetected random intermittency.
With an understanding of what it is, and with a couple of key technological
advancements, equipment is
now available that can test for the failure mechanism directly and at
such high levels of sensitivity that it can detect the signs of aging,
even before they cause system failures. With a minimal investment
in time and money, high levels of reliability and trust can be restored
to these *legacy systems.
What these researchers need to study, if we need a study at all, is
simply: What systems need intermittency testing first?
Why is this NFF testing issue so hard to understand? How
do we sell a turn-around strategy based on better testing and better
inspections to the underwriters, insurance companies and component
manufacturers in the process of fleeing this business?
*legacy systems = old and aging
Paradigm Lock:
A friend sent me this letter on the reasons for the difficulty in introducing any new ideas or technology.
Ted ???? wrote:
When Eli Goldratt first laid out the levels of resistance to change in the thinking process, there were five.
Level One
Denial that there is a problem.
Level Two
Denial that, even if there is a problem, a solution is possible.
Level Three
Denial that a proposed solution could solve the problem.
Level Four
A belief that even though the problem is known and a possible solution has been found and that this solution could solve the problem, if it were implemented it would itself cause severe additional problems.
Level Five
Denial that there is a way to implement the solution, even though the problem is agreed upon and a solution has been found for which no unwanted dysfunctional outcomes remain.
This was the state of things up to the early 90s. It was gradually recognized that some company executives, even though they were brought through all five levels of change, still failed to take action. While there were cultural dimensions, the behavior (or its lack) was not uncommon. Level six was added to the classification system to explain this remaining resistance to change. It is:
Level Six
Sometimes a key individual is so afraid of change itself or finds his/her investment in the current situation and its rewards so critical to his/her well-being that the needed change cannot be accepted because of the personal risk involved. This is called the non-decision. "We have decided not to come to any decision on that for the time being" (i.e. until the issue is no longer an issue).
Ted examined a number of companies as part of his doctoral research and developed a generic cloud that describes this conflict and the inability of executives to break out of it. He termed it "paradigm lock".
Ted, whoever you are, thanks for the insight and the terminology.
Dr. Eli Goldratt's other work and articles can be seen at APICS magazine.
http://www.usynaptics.com/ Universal Synaptics