To be happy, we must not be too concerned with others. - Albert Camus

STATEMENT OF MR. JOSEPH H. ROTHENBERG

ASSOCIATE ADMINISTRATOR FOR SPACE FLIGHT


 

As the result of ascent anomalies experienced on STS-93, I asked Dr. Henry McDonald (Center Director, NASA Ames Research Center), on September 7, 1999, to lead an independent technical team to review the Space Shuttle systems and maintenance practices. The team, composed of NASA, contractor, and DOD personnel, looked at NASA practices, Space Shuttle anomalies, and civilian and military aeronautical experience. My goal for this study was to bring to Space Shuttle maintenance and operations processes a perspective from the best practices of the external aviation community and, where appropriate, to apply these practices to the Space Shuttle.

This report fully endorsed the continuation of Space Shuttle flights after disposition of the team's immediate recommendations. Additionally, the Space Shuttle Independent Assessment Team (SIAT) was continually impressed with the skill, dedication, commitment, and concern for astronaut safety of the entire Space Shuttle workforce.

The SIAT documented many positive elements during the course of their interviews with the Space Shuttle NASA/contractor workforce. Particularly noteworthy were the observations dealing with the skill and dedication of the workforce.

Independent assessments, like the SIAT, have been used repeatedly throughout the history of the Space Shuttle program. NASA's goal for these independent assessments has been to identify opportunities to improve safety. The SIAT report will provide additional input to the full range of activities already underway associated with Space Shuttle upgrades, including maintainability, processes for shuttle safety and quality control. This report brings to Space Shuttle maintenance and operations processes a perspective from the best practices of the external aviation community.

The SIAT focused their activities on eleven technical areas: Avionics, Human Factors, Hydraulics, Hypergols and Auxiliary Power Unit, Problem Reporting and Tracking Process, Propulsion, Risk Assessment and Management, Safety and Mission Assurance, Software, Structures, and Wiring. They documented 81 recommendations in four categories:

• Four recommendations identified as Immediate (Solutions required prior to return to flight). In this category were the following recommendations:
  1. "The reliability of the wire visual inspection process should be quantified (success rate in locating wiring defects may be below 70% under ideal conditions)."
  2. "Wiring on the Orbiter Columbia [currently at Palmdale, CA, for its periodic down period for inspections and modifications] should be inspected for wiring damage in difficult-to-inspect regions. If any of the wires checked are determined to be especially vulnerable, they should be re-routed, protected, or replaced."
  3. "The 76 CRIT 1 areas should be reviewed to determine the risk of failure and ability to separate systems when considering wiring, connectors, electrical panels, and other electrical nexus points. Each area that violates system redundancy should require a program waiver that outlines risk and an approach for eliminating the condition. The analysis should assume arc propagation can occur and compromise the integrity of all affected circuits. Another concern is that over 20% of this wiring can not be inspected due to limited access; these violation areas should as a minimum, be inspected during heavy maintenance and ideally be corrected."
  4. "The SSP should review all waivers or deferred maintenance to verify that no compromise to safety or mission assurance has occurred."

The above recommendations were reviewed and dispositioned prior to the Flight Readiness Review for STS-103 (the first Shuttle flight following the stand-down of the orbiter fleet for wire inspections).

• Thirty-seven recommendations identified as Short-Term (Solutions required prior to making more than four more flights)

• Thirty recommendations identified as Intermediate (Solutions required prior to January 1, 2001)

• Ten recommendations identified as Long-Term (Solutions required prior to January 1, 2005)

The Office of Space Flight applauds the work and dedication of the SIAT on what is part of a continuing process to improve the safety of the Space Shuttle system. The Space Shuttle Program Office is outlining a plan to address all recommendations. It is expected that actions to be taken in response to this report will cover near-term and long-term strategies that will lead to the development and infusion of new technologies and practices.



EDITORIAL SECTION:

Finally, after a rash of accidents, computer glitches, launch delays, and record-setting unreliability in the space, airline, and defense industries, along with a couple of embarrassing reports on these problems, such as that of the Space Shuttle Independent Assessment Team (SIAT) (http://www.hq.nasa.gov/osf/shuttle_assess.html), there is now a "feeding frenzy" to determine what to do about it, or at least to appear to be doing something about it.

Action is long overdue; however, there seems to be little consensus among these various groups as to what the problem really is. One group believes there is something inherently wrong with the wiring insulation, while another believes that technicians and diagnostics are to blame. An earlier study concluded that the problem was weak management, and another thinks it's premature to study the issues until you know all the costs (believe that!).


An editorial by Bill Schweber, Executive Editor of Electronic Design News magazine, has some interesting comments:

Working hard to solve the wrong problem?
Bill says that unless you really understand a problem, your solution may end up creating additional problems; if you don't understand it, it is often better to just leave the situation alone.

Bill is right.

Each study is looking at a small piece of a much larger picture and failing to see what the real problem is. By consuming precious research funds and keeping us at risk over long periods while muddling around for piecemeal solutions, these studies are themselves part of the problem.

We don't need another study, more wasted time, and more solutions that don't work; we just need to use a little common sense. We need to rethink and apply what we already know:

*NFF = No Fault Found (aka intermittency)


1.) A cursory analysis of the problem from any angle would conclude that latent problems obviously exist that present testing methods are unable to detect or diagnose. The proof is the huge NFF rates that some researchers simply choose to ignore.

2.) Failure and repair data show that these NFF rates generally increase as a system ages.

3.) We also know that functional testing will verify that all of the components (resistors, capacitors, solid-state devices, etc.) are within specifications and doing their job correctly, so we can summarily eliminate them as the source of the trouble.

4.) That leaves only the connectivity elements (wires, connectors, solder joints, etc.) as the likely components that are breaking down. These elements are intermittent by design to some degree, so they usually conduct properly when new and somewhat less well as they age. As they wear out, they first show up as intermittent spurious noise, which eventually progresses to randomly occurring NFF failures and then to hard opens or hard shorts, depending on whether the conducting/contacting element or the insulation is wearing. This is something every technician already knows.

5.) We know that we can slow this natural process through visual inspections, looking for design exceptions and for abuse such as insulation chafing with no protective sleeving. These inspection processes and procedures have been in place for decades. We don't always follow them, but we know how to do it.

Do we really need a study for this?

The aforementioned wiring study is focusing all of its efforts on insulation failures while excluding failures in connectivity, due no doubt to all the media hype about insulation fires and exploding gas tanks. Shorting wires, due to insulation wear and aging, are deemed by these researchers to be a high-priority safety issue, while intermittent connectivity and NFF problems are seen simply as a matter of maintenance economics. In their view, if a wire's insulation chafes and shorts out, losing a system or two, it's a safety issue, but if a circuit opens up (even momentarily) and shuts down a system, it's only a maintenance problem. In the real world, these opposite causes can have the same effects, so from a safety or a maintenance view, what's the difference? If it's any kind of wiring problem, shouldn't we just fix it?

The big thing that all these researchers overlook is that, in the process of aging, a period of random intermittency due to opening, shorting, or both is generated as the failure progresses to a hard or testable fault. This is part of the reason for the NFF problem: the pilot sees the problem in an environmentally stressed situation, while ground-based testing is rather static. The other part of the NFF problem is that the test equipment presently in use is, by design and purpose, unable to see intermittent faults.

Since these researchers can't measure it, they can't quantify it. By their definition, then, intermittency doesn't exist, and as a result their jobs just got a lot easier and more secure.
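The sampling argument above can be made concrete with a small simulation. This is a minimal Python sketch under assumed, purely illustrative numbers: a one-second conduction trace at 1 microsecond resolution containing three brief opens of 2 to 10 microseconds, read either by a hypothetical tester that checks the circuit every 100 ms or by a hypothetical monitor that watches every tick and latches each conducting-to-open transition. The functions `make_trace`, `sampled_tester`, and `latching_monitor`, and all timing values, are assumptions for illustration; they do not model the IFD-2000 or any real instrument.

```python
# Hypothetical model, not any real instrument: one list index = 1 microsecond,
# True = circuit conducting, False = circuit momentarily open.

def make_trace(length_us: int, dropouts: list[tuple[int, int]]) -> list[bool]:
    """Build a conduction trace with brief opens given as (start_us, duration_us) pairs."""
    trace = [True] * length_us
    for start, duration in dropouts:
        for t in range(start, min(start + duration, length_us)):
            trace[t] = False
    return trace


def sampled_tester(trace: list[bool], interval_us: int) -> int:
    """Count readings that find the circuit open when it is only checked every interval_us."""
    return sum(1 for t in range(0, len(trace), interval_us) if not trace[t])


def latching_monitor(trace: list[bool]) -> int:
    """Count distinct open events by watching every tick and latching each
    conducting-to-open transition, however short the open is."""
    events = 0
    previous = True
    for conducting in trace:
        if previous and not conducting:
            events += 1
        previous = conducting
    return events


if __name__ == "__main__":
    # Assumed scenario: one second of vibration with opens of 5, 2 and 10 microseconds.
    trace = make_trace(1_000_000, [(12_345, 5), (400_001, 2), (777_777, 10)])
    print(sampled_tester(trace, 100_000))  # prints 0: every 100 ms reading finds the circuit closed
    print(latching_monitor(trace))         # prints 3: each brief open is caught
```

With these assumed numbers the periodic tester reports a healthy circuit while the latching monitor records three events, which is the NFF pattern in miniature: the fault is real, but a slow, static reading almost never lands on it.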

As Bill pointed out, if you focus on the wrong problem, you run the risk of coming up with the wrong solution. If the problem is in the wiring and is deemed a safety problem, and you are a Government agency charged with enforcing safety and obligated to correct it, what are you going to do? Are you going to totally rewire or replace all of the legacy aircraft at the first sign of aging, about every 5 or 6 years? Or do you just train pilots to react to failures and warnings in a "Don't worry about it unless you smell smoke" sort of way?

Maybe they had better reconsider the testing option, in which you identify wires as they fail and fix the shorting wires, the opening wires, and the intermittent/NFF wires all at the same time.

A previous Air Force Headquarters dictum in the early 1990s got it right when they concluded that their main problem was having too many Can Not Duplicate (CND/NFF/NPF) problems, so they put out bids asking for someone in the testing business to build them a CND tester. It turned out to be a little embarrassing for them when they were told that no one knew exactly what a CND tester was: "If you can not duplicate it then you can't test for it, it's just the way testing works".

That statement is not true anymore!

CND/NFF is undetected random intermittency. Now that we understand what it is, and thanks to a couple of key technological advances, equipment is available that can test for the failure mechanism directly, with enough sensitivity to detect the signs of aging even before they cause system failures. With a minimal investment in time and money, high levels of reliability and trust can be restored to these *legacy systems.

What these researchers need to study, if we need a study at all, is simply: Which systems need intermittency testing first? Why is this NFF testing issue so hard to understand? How do we sell a turnaround strategy based on better testing and better inspections to the underwriters, insurance companies, and component manufacturers who are in the process of fleeing this business?


*legacy systems = old and aging



Paradigm Lock:

A friend sent me this letter on why it is so difficult to introduce any new idea or technology.

Ted ???? wrote:

When Eli Goldratt first laid out the levels of resistance to change in the thinking process, there were five.

Level One
Denial that there is a problem.

Level Two
Denial that, even if there is a problem, there is a possible solution.

Level Three
Denial that a proposed solution could solve the problem.

Level Four
A belief that, even though the problem is known, a possible solution has been found, and that solution could solve the problem, implementing it would itself cause severe additional problems.

Level Five
Denial that, even though the problem is agreed upon and a solution has been found for which no unwanted dysfunctional outcomes remain, there is a way to implement it.

This was the state of things up to the early 90s. It was gradually recognized that some company executives, even though they were brought through all five levels of change, still failed to take action. While there were cultural dimensions, the behavior (or its lack) was not uncommon. Level six was added to the classification system to explain this remaining resistance to change. It is:

Level Six
Sometimes a key individual is so afraid of change itself or finds his/her investment in the current situation and its rewards so critical to his/her well-being that the needed change cannot be accepted because of the personal risk involved. This is called the non-decision. "We have decided not to come to any decision on that for the time being" (i.e. until the issue is no longer an issue).

Ted examined a number of companies as part of his doctoral research and developed a generic cloud that describes this conflict and the inability of executives to break out of it. He termed it "paradigm lock".

Ted, whoever you are, thanks for the insight and the terminology.

Dr. Eli Goldratt's other work and articles can be found in APICS magazine.



For more information about the IFD-2000 NFF Analyzer, visit the Universal Synaptics web site.



Purpose of this newsletter:

After decades of improvements in the reliability of electronic components, the failure mix has shifted considerably from individual replaceable components (hard failures) towards the connectivity elements (age-related intermittent failures). While the failure mix has changed, testing has largely remained the same. Many engineers, not understanding or considering the low-level mechanics/physics of measuring, are still trying to fix intermittent failures with test equipment designed for hard failures only.

As a result of this mismatch, coupled with the proliferation of functionality and of wires and circuits, reliability has suffered. Even with the increased environmental stress applied by HALT/HASS and other testing, NFF/aging-related failures still pass undetected, because the measuring equipment actually used cannot see these randomly occurring intermittent failures.

Nowhere has this oversight in testing caused more serious safety and monetary consequences than in the aerospace industry with its environmental extremes and multi-level maintenance constraints.  We are now seeing No Fault Found rates as high as 60%, and wiring-related disasters seem to be occurring on a regular basis.  The White House and other agencies are now conducting investigations into the test engineers' domain, asking a lot of questions but getting few good answers.  

Universal Synaptics, after years of researching the root causes of NFF and aging failures, discovered why legacy and present testing systems were not detecting the failure mechanism responsible. We hope that sharing this knowledge of the phenomenon, along with others' ideas and insights, will help to bring about a successful reversal of this problem.

  Avionics Magazine warns of BOGUS components liability issue. http://www.aviationtoday.com/reports/avionics/06sysdesign.htm  
Maintenance practices along with avionics pieces/parts come under contributory scrutiny. Insurance companies and component manufacturers reassess their risks.  Many are bailing out of the avionics business.  A "must read" for anyone involved with avionics.

Aviation Maintenance Magazine discusses liability PROTECTION.
A new breakthrough tool adds a new dimension to avionics maintenance and reliability. Unlike traditional testing, which looks only at the processed signals to determine functionality, the IFD-2000 tests directly for the signs of aging: the previously untestable failure mechanism responsible for reliability, safety, and No Fault Found (NFF) issues.

AviationNow Forum:  Aging Wiring; Are we doing enough?
Opinion on certain government-sponsored studies of the effects of aging on wiring and avionics. Is the problem really with the insulation, or with the definition of the problem?

Sensors magazine explores connector lubrication to prevent fretting.
A lubrication manufacturer claims connector fretting can be reduced by a factor of 1,000. How many hours of aircraft-induced vibration are enough to wear out a connector? It might be sooner than you think. Read the article and then send us your thoughts.

(Note: Test results for this article used ohmic build-up criteria to determine how well lubrication prevents fretting, while intermittencies in the micro-break range, which usually precede fretting, were not considered and remain unknown at present. The article's authors are curious whether intermittencies could be occurring long before ohmic build-up becomes critical and are considering additional testing with equipment that monitors that particular failure mechanism. We will try to keep you posted on the results.)


Universal Synaptics: http://www.usynaptics.com/

 

Legend: a lie that has attained the dignity of age. - H. L. Mencken

I want to live forever and so far I'm doing ok!

Will Rogers once said, "All of us are ignorant; just in different areas."