Reliable Classification vs. Idiosyncratic Opinion: A Reply to Gardner
Terence W. Campbell*
Allow me to begin this reply to Gardner by specifying what I will address,
and what I will not. Until Gardner reads my articles outlining the rumor
model for assessing false allegations of sexual abuse (Campbell, 1992a, b), his
hypercritical but ill-informed comments are undeserving of any response.
To belabor the obvious, supporting my position regarding the serious
shortcomings of Gardner's "Indicators" does not require me to defend
my own model.
In his response to my original article, Gardner deserves credit for
recognizing his own facility for pedantic excess as he debated obscure issues of
grammatical protocol; and I have no desire to imitate him in that regard.
I seriously doubt that such pettiness really interests readers. Suffice to
say, my use of sic merely conforms to the stylistic requirements of any
APA-style journal including this particular publication. Perhaps I should
reassure Gardner that I used this same term in a previous
article that appeared in this very journal (Campbell, 1992c, p. 120).
Given this information, Gardner might feel less singled out for criticism.
Disregarding Gardner's tone of self-righteous animus, and his related
penchant for ad hominem arguments, enables me to redirect attention to
the fundamental question raised by my original article: Can Gardner's
"Indicators of Pedophilia" be used in a sufficiently reliable manner
to support expert testimony in a forensic setting? My answer to this
question is obviously an emphatic "No," and surprisingly enough,
Gardner seems to agree with me more often than not. I would furthermore
insist that considerations of intellectual honesty and responsible scholarship
dictate that Gardner and I confine ourselves to issues that are directly
relevant to this question. To do otherwise merely distracts readers from
the issue at hand.
Above all else, Gardner's indicators flounder as a result of the many
shortcomings related to clinical judgment. These shortcomings were clearly
outlined in my original article, but Gardner preferred
to ignore them. Despite his preference to the contrary, however, Gardner
must contend with the many problems undermining the reliability and validity of
clinical judgment in defending his own indicators. Unfortunately,
Gardner's response to my original article neglects to address these problems,
and moreover, he manages to compound them.
For example, Gardner scolds me for disregarding his emphasis on "the quality
and quantity of the criteria satisfied." Nevertheless,
addressing considerations of the "quality and quantity" of Gardner's
indicators again resorts to clinical judgment. Gardner offers no
well-defined decision-making rules specifying exactly how an evaluator should
weigh the "quality and quantity" of his indicators. Without
well-defined decision-making rules for guiding the endeavor Gardner recommends,
evaluators must rely on their intuitive impressions. In other words,
Gardner's recommendations regarding the "quality and quantity" of his
indicators create more problems than they solve. In fact, an item by item
evaluation of Gardner's 24 indicators demonstrates how the vast majority of them
suffer the unreliable effects of clinical judgment.
1) History of Family Influences Conducive to the Development of
My major criticism of this particular indicator emphasized: "Without
well-specified decision rules for defining family violence, familial alcoholism,
psychopathy, and serious psychiatric disturbance, the ill-defined ambiguity of
these terms guarantees their inconsistent application in practice."
In response to this criticism, Gardner replied: "I suggest that the most
blatant manifestations of family dysfunction be utilized, e.g., 'history of
violence, alcoholism, drug abuse, psychopathy, serious psychiatric disturbance,
Unfortunately, Gardner's recommendations regarding "blatant
manifestations" merely beg the question. He provides no well-defined
decision-making rules for discriminating between mild, moderate, and
"blatant" manifestations of family of origin pathology.
Consequently, Gardner has again premised his argument upon the notorious
unreliability of clinical judgment; and as a result, this particular indicator
cannot withstand well-informed cross-examination in a forensic setting.
2) Longstanding History of Emotional Deprivation.
In response to my criticisms of this indicator, Gardner replied: "Again,
I am in agreement that this criterion might be difficult to apply in certain
cases. I am in agreement, also, that it might be
misinterpreted." Therefore, I submit that Gardner himself has closed
the case for this indicator. Like the previous indicator, this one also
could not withstand well-informed cross-examination in a forensic setting.
3) Intellectual Impairment
In response to my original criticisms of this indicator, Gardner explained:
"I recognize that some scientific studies provide support for this
criterion and others do not. I have openly admitted that this is one of
the weaker criteria, which is certainly deserving of further study (as are all
of them)." Beyond my unqualified agreement with Gardner's own
assessment of the limitations related to this indicator, these very limitations
raise another difficult problem associated with practically all of his
Co-variation Matrix Applied to Gardner's Indicators.
Figure 1 represents a 2 x 2 co-variation matrix (Arkes, 1989). This
matrix illustrates the most serious problems sabotaging the reliability and
validity of Gardner's indicators. Cells A and D in Figure 1 represent
classificatory "hits," and cells B and C represent classificatory
"misses" or errors. In particular, cell B corresponds to
false-negative errors wherein one of Gardner's criteria indicates that
pedophilia is absent, but in fact it is present. Conversely, cell C
corresponds to false-positive errors wherein one of Gardner's criteria indicates
that pedophilia is present, but in fact it is absent.
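The bookkeeping behind such a matrix can be sketched in a few lines of Python. This is only an illustration: the cell counts are hypothetical, not data from Gardner or from any study, and the sketch assumes that cell A holds cases where both the indicator and pedophilia are present and that cell D holds cases where both are absent (the text identifies A and D only as hits).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CovariationMatrix:
    """Counts for the four cells of a 2 x 2 co-variation matrix."""

    a: int  # hit: indicator present, condition present (assumed layout)
    b: int  # false negative: indicator absent, condition present
    c: int  # false positive: indicator present, condition absent
    d: int  # hit: indicator absent, condition absent (assumed layout)

    def hit_rate(self) -> float:
        """Share of all cases falling into cells A and D."""
        return (self.a + self.d) / (self.a + self.b + self.c + self.d)

    def false_negative_rate(self) -> float:
        """Share of true cases that the indicator misses (cell B)."""
        return self.b / (self.a + self.b)

    def false_positive_rate(self) -> float:
        """Share of non-cases that the indicator flags (cell C)."""
        return self.c / (self.c + self.d)


# Hypothetical counts for 1,000 evaluated men; these are not real data.
example = CovariationMatrix(a=40, b=10, c=190, d=760)
print(f"hits: {example.hit_rate():.2f}")
print(f"false negatives: {example.false_negative_rate():.2f}")
print(f"false positives: {example.false_positive_rate():.2f}")
```

Even with four of five cases classified correctly in this invented sample, the 190 cases in cell C far outnumber the 40 cases in cell A, which is why the error cells cannot be left unexamined.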
Like so many other mental health professionals who ignore
the classificatory problems associated with base rates, Gardner seems to focus
excessively on the frequency with which cases fall into cells A and D, and he
overlooks the frequency with which cases fall into cells B and C.1
Without data available to determine the frequency with which Gardner's various
indicators lead to false-positive and false-negative errors, their use in a
forensic setting could result in an unacceptable number of classificatory errors.
In particular, the relative infrequency with which pedophilia occurs
throughout the male population alarmingly increases the probability of Gardner's
indicators resulting in false-positive classifications. Consequently, one
can argue that Gardner's indicators have not yet developed beyond an
experimental stage (which he seems to suggest himself); and as a result, his
indicators cannot satisfy the Frye test (Frye v. U.S.,
1923). The Frye test demands that expert testimony be premised on
evidence and principles that enjoy general acceptance in the relevant scientific
or professional community. In other words, the unavailability of
reliability and validity data to support Gardner's indicators precludes their
use in a forensic setting. Just as courts have excluded evidence related
to Summit's "Child Sexual Abuse Accommodation Syndrome" because of the
frequency with which it results in false-positive classifications (Ewing, 1992;
Myers et al., 1989), Gardner's indicators deserve the same fate for the same
The problems of false-positive errors are especially applicable to Gardner's
index of "Intellectual Impairment." Gardner does not specify
what he means by intellectual impairment: does this criterion correspond
to a below average IQ of 99 or less, or does it correspond to a formal DSM-III-R
(American Psychiatric Association, 1987) diagnosis of mental retardation with an
IQ of 70 or below? Though defining "Intellectual Impairment" in terms
of DSM-III-R criteria would reduce the frequency of false-positive errors, using
this criterion still creates the risk of an unacceptable frequency of mistaken
4) Childhood History of Sexual Abuse
It would be more appropriate to disregard Gardner's argumentative rhetoric
responding to my criticism of this indicator, and instead apply the co-variation
matrix table to it. Once again, Gardner reports no data allowing a court
to determine the frequency with which this criterion leads to false-positive and
false-negative classifications. In his response to my original article,
Gardner also neglects to address the problems related to defining a childhood
history of sexual abuse. For example, has a child who witnessed the
indecent exposure of an adult been sexually abused? The unavailability of
decision-making rules for borderline situations such as these make this
indicator inherently unreliable.
5) Longstanding History of Very Strong Sexual Urges
Despite Gardner's protests to the contrary, this particular indicator still
amounts to a "definitional nightmare." Gardner suggests that
specifying the age at which masturbation began allows one to define this
indicator more reliably. Pleased as Gardner seems with his suggestion, it
merely leads to more problems involving how to define masturbation. For
example, does the self-stimulatory rocking of most infants qualify as
masturbation? This indicator also overlooks the data reporting significant
variations in the frequency of sexual outlet as a result of social class
(Berelson & Steiner, 1964; Kinsey et al., 1948). Thus, what is
normative sexual behavior for one social class is not for another; and this
consideration further underscores the status of this criterion as a
"definitional nightmare." Though Gardner regards this indicator
as particularly significant, there is an alarming likelihood that it could
result in an unacceptable frequency of false-positive classifications.
In his response to my criticisms of this indicator, Gardner neglected to
address the single, most important problem: "how do we reliably
define impulsiveness?" Without a reliable definition of
impulsiveness, Gardner can only rely on the massive shortcomings of clinical
judgment to identify what qualifies as impulsivity. I would argue that
resorting to clinical judgment to assess impulsiveness leads to an inordinate
number of false-positive and false-negative classifications when attempting to
7) Feelings of Inadequacy and Compensatory Narcissism
In response to my criticism of this indicator, Gardner replies: "I
recognize the difficulties in objectifying feelings of inadequacy. The
compensatory narcissism that derives from it is easier to assess."
Gardner's confidence in reliably assessing compensatory narcissism amounts to
another example of his gratuitous overconfidence. I would remind Gardner
that the diagnostic class of "Personality Disorders," which includes
"Narcissistic Personality Disorder," fails to satisfy the recommended
inter-rater reliability standards for DSM-III (American Psychiatric Association,
1980, p. 470).2
Though DSM-III provides decision-making criteria for diagnosing Personality
Disorders in general, and Narcissistic Personality Disorder in particular,
this diagnostic class does not qualify as reliable. Given the ill-defined
criteria that Gardner uses to assess "compensatory narcissism," it is
unlikely that the inter-rater reliability for this indicator would fare any
better than the DSM-III diagnosis of Personality Disorders. Therefore,
this is another instance of Gardner's faith in his indicators (in this
instance, the ease with which compensatory narcissism can be assessed)
remaining unsupported by relevant data. Once again, then, we have another
indicator that cannot survive objective scrutiny.
8) Coercive-Dominating Behavior
In reacting to my original article, Gardner insists: "There is very
strong evidence in the scientific literature for this type of
pedophile." Whether his assessment of the literature related to this
indicator is accurate is not the issue. Instead, the issue involves the
now too-familiar question of how reliably can any evaluator assess
"Coercive-Dominating Behavior" using Gardner's criteria? As
pointed out in my original article, "This index involves multiple
categories of behavior (anti-social, aggressiveness, overt and covert
domination) which are so poorly defined that they defy reliable
classification." Gardner has failed to respond to this criticism; and
as a result, I submit that this particular indicator warrants repudiation as
9) Passivity and Impaired Self-Assertion
Gardner acknowledges the problems related to reliably assessing the traits
associated with this indicator. Therefore, he seems to agree that this
indicator also warrants repudiation by virtue of its inherent unreliability.
10) History of Substance Abuse
In responding to my criticisms of this indicator, Gardner begrudgingly
acknowledges, "Of course there are borderline situations." I
could not agree more, and Gardner's failure to develop decision-making rules for
these "borderline situations" can only reduce the reliability of this
indicator to an unacceptable level. Applying considerations of base rate
data to this indicator also leads to the conclusion that it would result in an
unacceptable frequency of false-positive classifications. Because the
incidence of substance abuse far exceeds the incidence of pedophilia, this
indicator will inevitably misclassify a large number of non-pedophiles as pedophiles.
11) Poor Judgment
In responding to my criticisms of this indicator, Gardner admits: "I
recognize that this is one of the more difficult criteria to objectively
assess." Nevertheless, he proceeds to admonish me for challenging the
validity of this criterion. Gardner needs to carefully review the
explanations of reliability and validity in my original article. Perhaps,
then, he will understand that while reliable criteria may be valid, unreliable
criteria by definition are always invalid (Anastasi, 1982).
Therefore, given the inherent unreliability of this indicator, it can never be
established as valid.
12) Impaired Sexual Interest in Age-Appropriate Women
In response to my criticisms of this indicator, Gardner protests my alleged
disregard of the references he cites to support it, and he lamely argues that,
"Every criterion will have its borderline subjects." Again,
however, Gardner overlooks the fundamental problem undermining almost all of his
criteria: how do two or more evaluators reliably use this indicator and
the others? Until Gardner can satisfactorily deal with this question via
the development of well-defined decision-making rules, his indicators will
continue to qualify only as an experimental procedure. To belabor the
obvious, courts do not typically admit expert testimony premised upon
13) Presence of Other Sexual Deviations
As I pointed out in my original article, this indicator is more conducive to
reliable definition; but that most certainly does not guarantee its
validity. Establishing the validity of this particular indicator, and all
of Gardner's other criteria for that matter, necessitates the use of the
co-variation matrix presented in Figure 1. Without
the availability of data to indicate the frequency with which this criterion
would classify a sample of subjects into cells A, B, C, and D, its validity is
yet to be established.
My comments for Indicator #13 are equally applicable to this criterion.
15) Immaturity and/or Regression
In responding to my criticisms of this indicator, Gardner contends: "The
fact that it may be hard to objectively (or reliably) define immaturity in some
individuals, the fact that it may be difficult to provide objective criteria for
regression, does not preclude the validity of this criterion."
Unfortunately, Gardner's argument disregards more than 70 years of accumulated
data related to the relationship between the reliability and validity of
assessment procedures, and as a result, his argument is ill-informed. Quite
simply, the validity of any assessment procedure can never exceed its
reliability (Anastasi, 1982; Cronbach, 1970). Therefore, the
unavailability of objective (or reliable) criteria with which to define
immaturity and/or regression most certainly does preclude the validity of this
16) Large Collection of Child Pornographic Materials
This is the indicator on which Gardner and I most likely share the greatest
agreement. Nevertheless, I would suggest that he could strengthen this
indicator by defining it in terms of the "Possession of any child
pornographic material." Additionally, "child pornographic
material" could be defined as any material that would subject an individual
to federal prosecution if he or she were to send it through the U.S. mail.
By virtue of how they have been re-defined, these criteria now qualify as
reliable. Consequently, Gardner would not have to engage in name-calling
(e.g., "zealot") directed at evaluators who use this indicator in ways
other than he intends. Such misuse of this indicator is precluded by the
redefined criteria related to it.
17) Career Choice That Brings Him in Contact with Children
My comments for Indicator #13 are also applicable to this particular
criterion. Additionally, this is another indicator that would result in an
unacceptable frequency of false-positive classifications despite Gardner's
affinity for it. Because the frequency of males who choose careers that
bring them into contact with children far exceeds the incidence of pedophilia,
this indicator misclassifies many well-adjusted males as pedophiles.
18) Recent Rejection by a Female Peer or Dysfunctional Heterosexual Relationship
My comments for Indicator #13 related to the covariation matrix are also
applicable to this indicator.
Additionally, I would also emphasize that this particular criterion would
result in an unacceptable frequency of cases falling into cell C, or being
classified as false-positives. The rationale supporting this assertion is
clearly outlined in my original article.
19) Unconvincing Denial
Because of the massive reliability problems related to this indicator, my
comments for Indicator #13 related to the covariation matrix are again
applicable to this particular criterion.
20) Use of Rationalizations and Cognitive Distortions That Justify
I suspect that Gardner failed to carefully read my comments in response to
this indicator; and therefore, I will repeat them. I emphasized,
"This index does not qualify as an 'indicator' of pedophilia; instead, it
conclusively confirms pedophilia when a suspect satisfies it."
Consequently, carefully reading my comments clearly reveals that I neither
trivialize nor offhandedly reject this criterion. I only emphasized that
it is much more than a mere indicator. As a result, it seems Gardner would
rather argue than accept the credit I gave him. Nevertheless, I would
still insist that Gardner deserves credit for specifying an important
characteristic of pedophiles via this particular criterion.
21) Resistance to Taking a Lie Detector Test
Gardner unfortunately overlooks the most serious problem created by this
indicator. He speaks of a population of "... pedophiles who refuse to
take the test for the reason they fear it will disclose their pedophilia,"
and I would agree with him that this population most certainly does exist.
Gardner should also consider the population of suspects falsely accused of child
sexual abuse who are disinclined to undergo polygraph examination because of
that device's unreliability. Then, when we consider the two populations
together, we must ask what decision-making rules does Gardner provide for
reliably discriminating between these two populations? To belabor the
obvious, Gardner offers no more than clinical judgment for discriminating
between these two populations; and as previously emphasized, clinical judgment
does not suffice for such discriminations.
22) Lack of Cooperation in the Evaluative Examination
Gardner has offered no well-defined decision-making rules for determining
exactly what qualifies as "lack of cooperation in the evaluative
examination." Consequently, this indicator merely invites unreliable
speculation and conjecture.
23) Duplicity Unrelated to the Sex-Abuse Denial and Psychopathic
Gardner has failed to respond adequately to the major shortcoming of this
indicator: how does an evaluator reliably identify "duplicity"
and "psychopathic tendencies?" Without well-defined decision-making
rules for these criteria, they also invite the unreliable outcomes of
clinical judgment. I am pleased to know that Gardner rejects the
"psychopathic deviant (sic)" scale of the MMPI for reliably
discriminating between pedophiles and non-pedophiles. I would remind him
that he "seemed" to conclude otherwise as a result of citing the work
of Haugaard and Reppucci (1988) involving the Psychopathic Deviate scale of the
24) Excessively Moralistic Attitudes
Despite Gardner's suggestion to the contrary, I have no confidence whatsoever
in the validity of this indicator. Instead, I would insist that assessing
this criterion via the covariation matrix of Figure 1
would most likely result in an unacceptable frequency of false-positive
classifications (cell C in Figure 1). Gardner attempts to dismiss my
position by claiming, "... he quickly raises his old argument of the
difficulties in objectifying this criterion, the problems of inter-rater
reliability and the dangers of one's own values interfering with assessing
Allow me to commend Gardner for accurately summarizing my position regarding
this indicator, and as a result, perhaps he will deal with these issues more
substantively in his subsequent response. I would also remind him that
however "old" my arguments related to this index are, their supposed
age does not invalidate them. Instead, the familiarity of these criticisms
merely corresponds to Gardner's facility for relying excessively on clinical
judgment again and again.
Horner and Guyer (1991) have carefully examined the classification problems
endemic to child sexual abuse litigation. In their cogent analysis of the
classification errors committed by self-styled "validators" of sexual
abuse, they emphasized:
Experts who cannot or will not convincingly specify the population with
which the targeted individual is being compared and who cannot provide clearly
reasoned and documented prevalence rates with which to calculate the
likelihood of classification errors, are highly likely to make errors of
classification. In our opinion, such experts should be precluded from
testifying in sexual abuse cases on grounds that their testimony is prejudicial
and not at all probative of the issue before the court (p. 401).
Though Gardner clearly specifies the population with which he compares
accused pedophiles, he most certainly cannot document the prevalence of
classification errors attributable to his indicators. Consequently, expert
testimony premised upon Gardner's indicators is too likely to be prejudicial.
Throughout his response to my original article, the sum and substance of
Gardner's reactions involve his shrill protests to the effect that
"But this is not how I intended to use the indicators."
Discrepancies between Gardner's thinking, and how others apply his indicators,
result in him censuring any evaluator who deviates from what he intended.
It would be more appropriate, however, for Gardner to acknowledge that his
criteria rely on such vague and ill-defined terms that they invite distortion
and misuse. Therefore, self-styled "validators" who twist
Gardner's indicators to serve their own biased agenda can do so because of the
indicators' inherent unreliability. Ultimately, then, the exceedingly
vague and ill-defined terms undermining Gardner's indicators encourage their
Reliable classification necessitates clearly defined criteria to reduce the ambiguity
that otherwise leads to conjecture and speculation. Unfortunately,
however, Gardner's response to my original article offers only his idiosyncratic
opinion as a guide for the use of his indicators. To belabor the obvious,
idiosyncratic opinion is never a satisfactory substitute for reliable
Gardner's willingness to establish his own idiosyncratic opinion as the
standard for reliably using his indicators essentially requires that other
evaluators attempt to read his mind. This expectation demands clairvoyance
which is as presumptuous as it is impossible to satisfy. Thus, I conclude
this reply as I concluded my original article: "Gardner's previously
acknowledged reputation as a courageous figure deserves continued respect, but
his 'Indicators of Pedophilia' do not."
Anastasi, A. (1982). Psychological Testing (5th ed.).
New York: The Macmillan Company.
American Psychiatric Association
(1987). Diagnostic and Statistical Manual of Mental Disorders (3rd
edition, revised). Washington, DC: Author.
American Psychiatric Association
(1980). Diagnostic and Statistical Manual of Mental Disorders (3rd
edition). Washington, DC: Author.
Arkes, H. R. (1989). Principles in judgment/decision making research
pertinent to legal proceedings. Behavioral Sciences &
the Law, 7, 429-456.
Berelson, B., & Steiner, G. A. (1964). Human Behavior: An Inventory of
Scientific Findings.
New York: Harcourt, Brace &
Campbell, T. W. (1992a). False allegations of sexual abuse and their apparent
credibility. American Journal of Forensic Psychology, 10(4), 21-35.
Campbell, T. W. (1992b). Allegations of sexual abuse II: Case example of a
criminal defense. American Journal of Forensic Psychology, 10(4), 37-48.
Campbell, T. W. (1992c). False allegations of sexual abuse and the
persuasiveness of play therapy. Issues in Child Abuse Accusations, 4(3),
Cronbach, L. J. (1970). Essentials of Psychological Testing.
New York: Harper & Row.
Ewing, C. P. (1992, July). Judicial notebook: Child sexual abuse
"validation" on trial and retrial. APA Monitor,
Frye v. U.S., 293 Fed. 1013, 1014 (D.C. Cir. 1923).
Haugaard, J. J., & Reppucci, N. D. (1988). The Sexual Abuse of
San Francisco: Jossey-Bass.
Horner, T. M., & Guyer, M. J. (1991). Prediction, prevention, and
clinical expertise in child custody cases in which allegations of child sexual
abuse have been made. II. Prevalence rates of child sexual abuse and the
precision of "tests" constructed to diagnose it. Family Law
Quarterly, 25, 381-409.
Kinsey, A. C., Pomeroy, W. B., Martin, C. E., & Gebhard, P. (1948). Sexual
Behavior in the Human Male.
Myers, J. E., Bays, J., Becker, J., Berliner, L., Corwin, D. L, &
Saywitz, K. J. (1989). Expert testimony in child sexual abuse litigation. Nebraska Law Review,
1 The concept of base
rate clarifies the enormous problems undermining any attempt to
assess or predict events that occur very infrequently. For
example, if 5% of the adult male population are pedophiles,
accurately identifying this population subset is exceedingly
difficult. An evaluator who capitalizes on this base rate
information can classify all males as non-pedophiles, and claim
an accuracy rate of 95%. Consequently, the merits of any
set of indicators depend on whether their use results in greater
classificatory accuracy than merely resorting to the relevant
base rate.
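The arithmetic in this note can be checked with a short Python sketch. The 5% base rate is the note's own example; the sensitivity and specificity values are hypothetical placeholders chosen only for illustration, and the function names are invented here rather than drawn from any library.

```python
def baseline_accuracy(base_rate: float) -> float:
    """Accuracy obtained by classifying every case as negative."""
    return 1.0 - base_rate


def indicator_accuracy(base_rate: float, sensitivity: float,
                       specificity: float) -> float:
    """Overall accuracy of an indicator with the given error profile."""
    return sensitivity * base_rate + specificity * (1.0 - base_rate)


def positive_predictive_value(base_rate: float, sensitivity: float,
                              specificity: float) -> float:
    """Probability that a positive classification is correct (Bayes' rule)."""
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


BASE_RATE = 0.05     # the 5% figure used in the note's example
SENSITIVITY = 0.80   # hypothetical: share of true cases the indicator flags
SPECIFICITY = 0.80   # hypothetical: share of non-cases the indicator clears

print(f"base-rate strategy: {baseline_accuracy(BASE_RATE):.2f}")
print(f"indicator accuracy: "
      f"{indicator_accuracy(BASE_RATE, SENSITIVITY, SPECIFICITY):.2f}")
print(f"positive predictive value: "
      f"{positive_predictive_value(BASE_RATE, SENSITIVITY, SPECIFICITY):.2f}")
```

Under these assumed values the indicator is correct 80% of the time, which is worse than the 95% obtained by classifying everyone as a non-pedophile, and fewer than one in five of its positive classifications would be correct.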
2 DSM-III specifies a
kappa coefficient of .70 or greater as corresponding to an
acceptable level of inter-rater reliability for its diagnostic
categories (p. 468). Phase one and phase two of the
DSM-III field trials reported kappa coefficients of .56 and .65
respectively for Personality Disorders as a diagnostic
class. Consequently, one can legitimately argue that
Narcissistic Personality Disorder (one instance of
Personality Disorder) is an inherently unreliable
diagnosis. I should also clarify that it is necessary to
cite DSM-III in this regard because DSM-III-R does not report
any kappa coefficients corresponding to the inter-rater
reliabilities of its diagnostic categories.
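For readers unfamiliar with the kappa coefficient, the following Python sketch computes Cohen's kappa for two raters, that is, observed agreement corrected for the agreement expected by chance. The ratings are invented for illustration and do not come from the DSM-III field trials; only the .70 threshold is taken from the note above.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cases on which the two raters give the same label.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of ten cases (1 = trait present, 0 = trait absent).
evaluator_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
evaluator_2 = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

kappa = cohens_kappa(evaluator_1, evaluator_2)
print(f"kappa = {kappa:.2f}; meets .70 standard: {kappa >= 0.70}")
```

In this invented example the evaluators agree on 8 of 10 cases, yet kappa is only about .52 once chance agreement is removed, short of the .70 standard.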
* Terence W.
Campbell is a clinical and forensic psychologist at 36040
Dequindre, Sterling Heights, MI 48310.