Introduction
Historically,
when an NDT engineer has been asked, "How well does your testing work?"
the reply began, "Well, I can find…" That scenario is rapidly changing
in the aerospace and nuclear industries through implementation of customer
driven requirements for test reliability demonstrations. As a result,
emphasis has shifted from concern about the size of discontinuity that
can be found to the size of discontinuity that can be missed. While
the difference between the two questions sounds subtle, a radically different
approach is required to answer each of them.
Test Reliability
A relatively simple capability study may be performed on a few realistic
specimens to determine what size discontinuity can be found. However,
a more sophisticated experiment using a larger quantity of specimens,
tested over a range of realistic production conditions, must be conducted
to estimate test reliability. This estimate is generally presented in
terms of probability of detection (POD) versus discontinuity size, thereby
quantifying test effectiveness. A confidence level is applied to the
estimate of POD to provide additional insight into the potential variability
of the test. In simple terms, a POD estimate of 90 percent would indicate
that nine out of ten discontinuities of a particular characteristic
would be found during testing. A confidence level estimate of 95 percent
would indicate that, if the entire experiment were repeated over and
over, 95 out of 100 repeat experiments would perform at least as well
as the quoted estimate.
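The arithmetic behind a statement such as "90 percent POD at 95 percent confidence" can be illustrated with a short calculation. The sketch below is not taken from any of the standards discussed here. It assumes the simplest possible case, a single discontinuity size class with independent hit or miss outcomes, and computes a one-sided exact binomial (Clopper-Pearson) lower bound on POD. The function names are hypothetical.

```python
from math import comb


def binom_tail_ge(trials, hits, p):
    """P(X >= hits) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(hits, trials + 1)
    )


def pod_lower_bound(hits, trials, confidence=0.95):
    """One-sided Clopper-Pearson lower bound on POD for one size class.

    Assumption: every trial is an independent opportunity to detect a
    discontinuity of the same nominal size.
    """
    if hits == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    # The tail probability grows with p, so bisect for the p where it equals alpha.
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_tail_ge(trials, hits, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


# 29 detections in 29 opportunities supports a 90 percent POD claim at
# 95 percent confidence; 28 of 28 falls just short.
bound_29 = pod_lower_bound(29, 29)
bound_28 = pod_lower_bound(28, 28)
```

With no misses the bound reduces to 0.05 raised to the power 1/n, which is why at least 29 opportunities are needed before a 90 percent POD can be claimed at 95 percent confidence under these assumptions.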
In the aerospace industry, this concept
of test reliability gained attention with the introduction of the United
States Air Force Airframe Structural Integrity Program and the Engine
Structural Integrity Program (ENSIP), as well as NASA's standard,
MSFC-STD-1249 (1985). All of these comprehensive life management
programs insist upon a direct relationship between guaranteed life of
fracture critical hardware from assumed rogue discontinuities of certain
sizes and the contractor's ability to reliably detect and reject those
assumed discontinuities. Many arguments have taken place between contractor
and customer regarding the validity of these required demonstrations,
with most of the arguments focusing on the use of fatigue cracks to
assess manufacturing tests. Obviously, in most cases, virgin parts contain
no fatigue cracks. However, it has been the position of the customers
that some cracklike discontinuities do occur in new parts and they prefer
to work with a worst case scenario. As the US Air Force and NASA look to
their respective overhaul requirements, they both want to be assured
that all critical hardware will be testable to the required crack sizes
at depot. The best way to assure that, barring inservice damage, is
to require the same critical tests to be performed by the contractor
at production.
A team was formed by the Air Force
to create a document that would provide guidance for all aspects of
reliability demonstrations. The document was recently published as
MIL-HDBK-1823 (1999). The document is in wide circulation, primarily
because no similar set of detailed instructions exists. MIL-HDBK-1823
(1999) includes detailed guidance pertaining to:
- specimen designs, quantities and
special care
- design of experiments and test
matrices
- potential test variables
- discontinuity size distributions
- test reports
- data analysis and presentation
- treatment of false calls.
Professional societies have initiated
efforts to create POD related standards but no private industry documents
are close to publication. At this point it makes sense for ASNT to take
a leadership role. An ad hoc committee has been assembled to produce
a position paper for the Society so that this leadership role may be
embraced with a full understanding.
Demonstration Design
Design of a reliability demonstration generally flows from a customer
requirement to quantify the effectiveness of special tests that are
performed on critical, life limiting hardware. For the aerospace community,
those tests may be ultrasonic testing, eddy current testing, fluorescent
penetrant testing, magnetic particle testing, radiographic testing and
other testing methods. Obviously, these various techniques are used
for different test scenarios but the worst case discontinuity of concern
is normally a tight, cracklike discontinuity in a stress concentrated
location. This is the condition that can quickly lead to premature inservice
failure. Therefore, most aerospace demonstration specimens are manufactured
to contain cracks, while only a few sets contain discontinuities such
as voids and inclusions. In other industries, weld discontinuities are
a much greater concern, so specimens can be tailored to address those
conditions.
The various tests of critical hardware
are grouped according to testing technique, surface versus subsurface
discontinuity, discontinuity size range, local testing geometry and
material. Specimens, in groups of 30 to 60 crack opportunities, are
then generated to simulate each scenario. This is the most expensive
aspect of the experiment, so strong emphasis is placed upon grouping
tests, minimizing numbers of specimens and selecting an optimal range
of discontinuity sizes.
The ideal discontinuity distribution
will have a few specimens on both horizontal sections of the anticipated
POD curve, with the bulk of the discontinuity sizes selected to be in
the transition region. Obviously, this can be a difficult guessing game,
so prior history from previous similar demonstrations is valuable.
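One way to picture such a distribution is sketched below. This is only an illustration: the function name, the 20 percent share of specimens placed on the two plateaus, the factor of two used to extend the size range and the logarithmic spacing are all assumptions, not values from any standard. The inputs a_low and a_high are the guessed limits of the transition region, in whatever length unit the program uses.

```python
import math


def plan_sizes(a_low, a_high, n_total=40, plateau_fraction=0.2, spread=2.0):
    """Return sorted target sizes: a few outside [a_low, a_high], most inside.

    a_low, a_high: guessed limits of the POD transition region (assumed).
    plateau_fraction: share of specimens split between the two plateaus (assumed).
    spread: how far beyond the transition region the plateau sizes extend (assumed).
    """
    n_each = max(1, round(n_total * plateau_fraction / 2))
    n_mid = n_total - 2 * n_each

    def log_spaced(start, stop, count):
        if count == 1:
            return [math.sqrt(start * stop)]
        step = (math.log(stop) - math.log(start)) / (count - 1)
        return [math.exp(math.log(start) + i * step) for i in range(count)]

    # Drop the shared end points so the plateau sizes lie strictly outside.
    below = log_spaced(a_low / spread, a_low, n_each + 1)[:-1]
    above = log_spaced(a_high, a_high * spread, n_each + 1)[1:]
    middle = log_spaced(a_low, a_high, n_mid)
    return sorted(below + middle + above)


# Example with a guessed transition region of 0.25 to 1.0 (units assumed to be mm).
target_sizes = plan_sizes(0.25, 1.0)
```

In practice the limits would come from earlier, similar demonstrations, and the achieved crack sizes will scatter around the targets.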
Using statistical techniques, test
matrices are developed that exercise all potential variables of each
test. These may include redundant test systems, probes, dwell times,
concentrations of liquids and many other factors, including test personnel.
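As a rough illustration of what such a matrix looks like, the sketch below enumerates a full factorial design and randomizes the run order. The factor names and levels are hypothetical, and a real program would probably use a fractional design to limit the number of runs.

```python
import itertools
import random


def build_test_matrix(factors, seed=0):
    """Full factorial matrix: one run per combination of factor levels."""
    names = list(factors)
    runs = [
        dict(zip(names, levels))
        for levels in itertools.product(*factors.values())
    ]
    # Shuffle the run order so that drift over time is not confounded
    # with any single factor. The fixed seed keeps the plan reproducible.
    random.Random(seed).shuffle(runs)
    return runs


# Hypothetical factors for a penetrant test demonstration.
factors = {
    "system": ["A", "B"],
    "probe": ["P1", "P2"],
    "dwell_time_min": [10, 20],
    "inspector": ["I1", "I2", "I3"],
}
matrix = build_test_matrix(factors)  # 2 * 2 * 2 * 3 = 24 runs
```

Each entry in matrix is one test condition under which the full specimen set would be examined.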
Honest interaction of testing engineers
with inspectors, including development of experimental procedures and
data recording devices, is critical in obtaining realistic results.
An estimate of NDT capability could be obtained by an engineer in the
laboratory, but an accurate estimate of NDT reliability can be obtained
only by evaluating the inspectors in a comfortable, familiar environment.
Test Anxiety
Although the stigma of testing may make the testing staff acutely alert,
a nervous tester is more likely to make a mistake. If the testing engineer
does not normally work with the inspectors at that facility, it is advisable
for the engineer to arrive early enough to spend some time getting to
know the inspectors and discussing the program objectives and procedures
before starting the tests. In addition, the test procedure and data
recording device must closely simulate normal daily activity to assure
validity of results.
It is also important to provide feedback
to the testing staff when the data have been reduced and POD curves
generated. In most cases, anonymity of inspectors is assured, making
the reliability demonstration a system test, rather than a test of individuals.
However, an exception should be made in the case of techniques that
are extremely tester dependent. Although this has been a topic of much
debate, it makes no sense to ignore the most critical variable of the
test.
MIL-HDBK-1823 (1999) describes
two optional data analysis and presentation techniques: hit/miss
and a-hat versus a analysis. Hit/miss analysis relies
simply upon pass/fail information to provide an estimate of POD. It
is generally applied to tests whose outputs are nonquantitative and
it generally requires more data than the a-hat versus a process
to produce a statistically valid answer. An a-hat versus a
analysis makes use of the additional information that can be provided
in a quantitative output. The symbol a represents the true discontinuity
size, while the symbol a-hat represents the indicated discontinuity size. The indicated
discontinuity size may simply be the signal amplitude, rather than the result of special
processing that attempts to provide a discontinuity dimension. In general,
a-hat versus a analysis provides a more optimistic estimate
of NDT reliability. Data analysis software may be obtained from Air
Force personnel at Wright-Patterson Air Force Base, Ohio.
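The idea behind a-hat versus a analysis can be sketched in a few lines. The example below is a simplified illustration, not the handbook procedure or the Air Force software. It assumes a linear relation between ln(a-hat) and ln(a) with normally distributed scatter, ignores censored signals and confidence bounds, and uses invented data and a hypothetical decision threshold.

```python
import math
from statistics import NormalDist


def fit_ahat_vs_a(sizes, responses):
    """Least squares fit of ln(a_hat) = b0 + b1 * ln(a); returns (b0, b1, sigma)."""
    x = [math.log(a) for a in sizes]
    y = [math.log(r) for r in responses]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return b0, b1, sigma


def pod(a, b0, b1, sigma, decision_threshold):
    """Probability that the signal from a discontinuity of size a exceeds the threshold."""
    z = (b0 + b1 * math.log(a) - math.log(decision_threshold)) / sigma
    return NormalDist().cdf(z)


# Invented illustration data: true sizes and signal amplitudes (arbitrary units).
sizes = [0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 1.5, 2.0]
responses = [0.9, 1.6, 1.8, 2.9, 3.3, 5.4, 7.1, 10.5]
b0, b1, sigma = fit_ahat_vs_a(sizes, responses)

threshold = 2.0  # hypothetical reject level, same units as the responses
pod_curve = [(a, pod(a, b0, b1, sigma, threshold)) for a in sizes]
```

Because every specimen contributes a measured amplitude rather than a single hit or miss, the fit draws on more information per specimen, which is consistent with the smaller data requirement noted above.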
Confidence levels of 50 percent (mean
curve) and 95 percent (two sigma lower bound) are most often quoted
when POD information is presented. NASA requires the use of the 95 percent
confidence level at all times when NDT reliability for fracture critical
components is discussed. However, Air Force standards such as the ENSIP
standard, MIL-STD-1783 (1997), allow the use of the 50 percent
confidence level curve for tests that are automated or semiautomated.
Obviously, these categories leave much room for debate, but the fact
is that any well controlled test produces a 95 percent confidence level
curve that is relatively close to the mean curve. Air Force program
managers often ask to review lower bound curves for automated systems
just to have another test for system stability.
Conclusion
In summary, the primary intent for an NDT reliability demonstration
is to produce a superior quantification of the effectiveness of a test
system, including basic system functions, environmental influences and
human factors. Compared with the more conventional practice of demonstrating
basic capability of an NDT method on a few artificial discontinuities,
a reliability demonstration gives far more useful information to those of us
who rely upon sophisticated
NDT methods to provide a final level of quality assurance for critical
hardware. Many preliminary demonstrations have produced results that
have led to modified test practices. Through those modifications and
appropriate controls, the NDT industry now has the opportunity to step
up to superior practices.
References
National Aeronautics and Space Administration, Marshall Space Flight
Center, MSFC-STD-1249, NDE Guidelines and Requirements
for Fracture Control Programs, 1985.
US Air Force Aeronautical Systems
Center, Military Standard, MIL-STD-1783, Engine Structural Integrity
Program, 1999.
US Air Force Aeronautical Systems
Center, Military Handbook, MIL-HDBK-1823, Nondestructive Evaluation
System Reliability Assessment, 1999.