Developing a reliable and systematic rating system to
study the psychotherapy process first began with Strupp’s
(1957) method of categorizing therapist utterances. Since
that time, dozens of rating systems have been constructed,
with varying degrees of success, reliability, acceptance,
and use by other researchers. This article outlines the
basic constructs of some of these widely known rating
systems and compares them to the Cognitive Elaboration
Rating System (CERS). The CERS was recently developed as
an attempt to reliably assess the occurrence and number of
positive cognitive elaborations verbalized by clients in
therapy sessions and the therapist interventions which led
to these elaborations. The development and construction
of the CERS is discussed, as well as future applicability
for researchers.

Development of the
Cognitive Elaboration Rating System (CERS)
John M. Grohol

The process of psychotherapy has been the focus of a wide
range of research for the past few decades, as researchers
and clinicians alike look for those variables which might
make the greatest impact on the outcome of therapy.
Psychotherapy has often been described as an “art” (e.g.,
Lindner, 1982; Storr, 1980) which is composed of many
intricate parts. These parts include personality
variables of both the client and the therapist, the
psychotherapeutic relationship, the therapeutic alliance
and rapport established between the client and therapist,
and the specific interventions the therapist brings to
therapy. Research on these factors has shown that there
may be important relationship factors that influence
positive therapy outcomes, that the client’s contribution
to the therapeutic alliance is also important, and that
various types of therapies work best for specific
disorders (Windholz & Silberschatz, 1988). This latter
point, however, is often disputed, because of research
which shows that improvement in therapy is more strongly
correlated with therapeutic relationship factors than with
specific therapist techniques (Garfield & Bergin, 1986).

One method of examining the process of therapy is to
divide therapy sessions into discrete units of verbal
communication and examine the therapist-client
interactions that correlate most highly with positive
therapy outcomes. By devising a system of coding
therapist utterances, Strupp (1957) was one of the first
researchers to employ this method of examining the
therapeutic process. Strupp’s first system, however, was
limited; it measured only therapist communications. Since
that time, dozens of classification systems have been
developed for research, with varying degrees of success
(for reviews see, for example, Greenberg & Pinsof, 1986;
Elliott, Hill, Stiles, Friedlander, Mahrer, & Margison,
1987; Russell & Stiles, 1979).

It is beyond the scope of the present article to
summarize or evaluate all psychotherapy process rating
systems that are currently found in the literature.
Rather, the present author will examine the construction
and validation of some of the more stable and
comprehensive systems in contemporary process research.
The development of the Cognitive Elaboration Rating System
(CERS) will then be described, and pertinent aspects of
this system will be compared to the other psychotherapy
rating systems discussed. The future direction and
applicability of the CERS in psychotherapy process
research will also be examined.

Vanderbilt Psychotherapy Process Scale (VPPS)

The Vanderbilt Psychotherapy Process Scale (VPPS) was
originally devised in 1974 by Strupp, Hartley, and
Blackwood (Suh, O’Malley, Strupp, & Johnson, 1989). The
scale has undergone two revisions since then, by
Gomes-Schwartz in 1978 and in 1983 by O’Malley, Suh, and
Strupp. The VPPS seeks to be neutral in its theoretical
orientation as a measurement of positive and negative
predictive outcome variables in therapist-client
interactions. It consists of 80 five-point Likert-type
items, rated from 1 (“not at all”) to 5 (“a great deal”),
which are divided into therapist and client
sections. Each section also contains two parts, one
dealing with characteristics of each person’s behavior and
the other part dealing with characteristics of each
person’s demeanor during the session (Suh et al., 1989).

Eight subscales are derived from these 80 items, the
first five of which are patient-oriented while the
remaining three are therapist-oriented (Table 1; Suh et
al., 1989). Each scale contains between 6 and 13 items.
The scales were derived from the items on the basis of a
principal components factor analysis (O’Malley, Suh, &
Strupp, 1983). Patient participation describes the extent
to which the client is actively involved in the
therapeutic relationship. Patient hostility is used to
tap the more negative aspects of the client’s behaviors
and beliefs. Patient psychic distress seeks to measure
the client’s emotional state, especially feelings of
discouragement. Patient exploration describes the extent
to which the client is engaged in examination of his or
her feelings and experiences. Patient dependency is used
to measure the client’s dependency and reliance on the
therapist. Therapist exploration measures the therapist’s
attempts to examine the client’s behaviors, emotions, and
underlying motivations. Therapist warmth and friendliness
seeks to measure the therapist’s display of emotional
involvement, while Negative therapist attitude describes
therapist attitudes and behaviors that may frighten,
threaten, or intimidate the client.

Table 1.
Vanderbilt Psychotherapy Process Scale Subscales

Patient                               Therapist
-----------------------------------   --------------------------------------
Patient participation – 8 items       Therapist exploration – 13 items
Patient hostility – 6 items           Therapist warmth – 9 items
Patient psychic distress – 9 items    Negative therapist attitude – 6 items
Patient exploration – 7 items
Patient dependency – 6 items

The VPPS has been used in a number of different ways to
rate therapist-client interactions within therapy. Early
on, investigators defined the entire therapy hour as the
unit length and rated only the third session of
twenty-five clients (Gomes-Schwartz & Schwartz, 1978).
Gomes-Schwartz (1978) refined the sampling method by using
10-minute segments from thirty-five different client
sessions, randomly chosen from pre-defined representative
sessions (session 3, the sessions one-half and
three-quarters of the way through treatment, and the
next-to-last session). The current
version of the VPPS was developed using a systematic
sampling method (5 minutes from the beginning, middle, and
end of the hour) and 15-minute unit lengths taken from
thirty-eight clients (O’Malley, Suh, & Strupp, 1983).
Only the first three sessions of therapy were used for
each client. Suh et al. (1989) make their case for this
sampling method:

… Early sessions appear to be critically important for
the subsequent course of therapy. Furthermore, even if
ratings from later sessions demonstrate stronger
associations with outcome, they may not elucidate the
actual processes responsible for the development of
qualities manifested in the later sessions. (p. 136)

Multiple raters and media ranging from written
transcripts to audio and videotapes of therapy sessions
can be used in conjunction with the VPPS. Current
investigators working with the VPPS (Suh et al., 1989)
describe one rating procedure, the consensus team method,
in which raters working in pairs first independently rate
videotaped therapy sessions, then compare ratings with
each other, and finally reach a consensus on items in
disagreement through discussions and videotape review.
Suh et al. (1989) suggest that raters using the VPPS
should be at least graduate students with minimal clinical
experience; no other rater selection criteria are given.
Raters are first trained to criterion (r = .85 to .90) on
12 to 19 training segments and then begin working in
assigned pairs. Although various forms of media can be
used with the VPPS, researchers have found that ratings
based upon audio and videotapes are the most accurate and
caution that transcripts should not be used (Suh et al.,
1989). Interrater reliabilities have ranged from .60 to
.94, averaging .86 across three studies (cited in Suh et
al., 1989).
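
As a minimal illustration of how such an agreement
statistic might be computed, the following Python sketch
calculates a Pearson correlation and a simple percent
agreement between two raters' scores on the same set of
5-point items. The rater data shown are hypothetical and
are not drawn from the studies cited above.

    # Illustrative only: hypothetical scores from two raters on the
    # same 5-point items; not data from the studies cited above.
    from math import sqrt

    rater_a = [4, 3, 5, 2, 4, 4, 3, 5, 1, 4]
    rater_b = [4, 3, 4, 2, 5, 4, 3, 5, 2, 4]

    def pearson_r(x, y):
        """Pearson correlation between two equal-length rating vectors."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / sqrt(var_x * var_y)

    def percent_agreement(x, y):
        """Proportion of items on which the raters gave identical scores."""
        return sum(a == b for a, b in zip(x, y)) / len(x)

    print(f"Pearson r = {pearson_r(rater_a, rater_b):.2f}")
    print(f"Exact agreement = {percent_agreement(rater_a, rater_b):.0%}")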

Suh et al. (1989) review the use of the VPPS in process
research and conclude that it has sufficient reliability
and validity as a psychotherapy research instrument (see
also Windholz & Silberschatz, 1988). Suh et al. (1989)
cite studies which have found that the client’s level of
interpersonal functioning prior to therapy is predictive
of the client’s participation in therapy, which is then
predictive of client outcome. The researchers also note
that changes in therapist attitudes early in therapy are
important to client outcomes (cited
in Suh et al., 1989). These findings come from the
Vanderbilt Psychotherapy Research Project I; the second
project is currently underway and is investigating the
efficacy of time-limited dynamic psychotherapy. Six
raters are being used, working in consensus teams as
described above.

Moras and Hill (1991) recently examined the rater
selection criteria found in current process rating systems
and categorized these systems based upon the amount of
inference required. While Hill’s own system (the CVRMCS,
which will be discussed below) and Stiles’ system (VRM,
also discussed below) were classified as “moderate
inference instruments,” Moras and Hill describe the VPPS
as a “high inference instrument,” that is, a system that
requires individuals to rate stimuli that are complex and
require a large amount of inference on behalf of the
raters. Because the VPPS is a high inference instrument,
because its categories measure intentions and internal
states focused on the client, and because it is somewhat
psychodynamically oriented (despite the researchers’
claims otherwise), the system was inadequate for measuring
the cognitions and elaborations in which the present
researchers were interested. With only three therapist
categories, it also fails to capture the wide range of
therapist interventions of particular interest to the
current researchers.

Verbal Response Mode System (VRM)

Perhaps the most detailed, comprehensive, and complex of
rating taxonomies is Stiles’ (1992) Verbal Response Mode
system (VRM). The VRM grew out of the influence of Jerry
Goodman, Stiles’ clinical supervisor at UCLA in 1969.
Goodman categorized six distinct response modes in
therapy: question, advisement, silence, interpretation,
reflection, and disclosure (Stiles, 1992). After further
exploration of response modes, Stiles and a colleague
proposed three underlying principles of classification:
source of experience, presumption about experience, and
frame of reference. By using these principles, Stiles
(1992) proposed that raters could categorize utterances
simply by answering three questions: Whose experience is
the topic? Does the utterance require the speaker to
presume knowledge of the other’s experience? Whose frame
of reference is used? Two additional response
modes, edification (providing information) and
confirmation, were eventually added to Goodman’s original
categories. The “silence” category was renamed
“acknowledgment,” in an effort to better identify such
responses (Stiles, 1992).

By utilizing Stiles’ principles of classification,
utterances automatically fall into one of the eight
categories. Disclosure describes thoughts, feelings,
perceptions, or intentions. Edification states objective
information. Advisement attempts to guide behavior with
suggestions, commands, permission and prohibition.
Confirmation compares the speaker’s experience with the
other’s through agreement, disagreement, or the sharing of
experiences or beliefs. Question describes a request for
information or guidance. Acknowledgment conveys receipt
of a communication (including salutations).
Interpretation explains or labels the other and can
describe judgments or evaluations of the other’s
experiences or behaviors. Reflection puts the other’s
experiences into words through repetitions, restatements,
and clarifications (Stiles, 1992).

VRM seeks to transcend traditional category systems by
focusing on who is speaking (rather than “therapist” and
“client” categories) and the “other” person participating
in the discussion (Table 2). In this way, VRM recognizes
that clients can also make verbalizations usually ascribed
exclusively to therapists, such as reflections and
interpretations. VRM is a generalized coding system,
applicable to coding any conversation in almost any setting.

Table 2.
Stiles (1992) Verbal Response Mode (VRM) System

                                         Frame of Reference
Source of      Presumption          --------------------------------------
Experience     About Experience     Other                Speaker
------------   ------------------   ------------------   ------------------
Other          Other                Reflection (R)       Interpretation (I)
Other          Speaker              Acknowledgment (K)   Question (Q)
Speaker        Other                Confirmation (C)     Advisement (A)
Speaker        Speaker              Edification (E)      Disclosure (D)
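
For readers who prefer a procedural view, the following
Python sketch encodes Table 2 as a lookup from the answers
to the three classification questions to the resulting VRM
mode. The dictionary and function names are illustrative
only and are not part of Stiles’ (1992) materials.

    # Hypothetical encoding of Table 2: each answer is "speaker" or "other".
    VRM_MODES = {
        # (source of experience, presumption about experience, frame of reference)
        ("other",   "other",   "other"):   "Reflection (R)",
        ("other",   "other",   "speaker"): "Interpretation (I)",
        ("other",   "speaker", "other"):   "Acknowledgment (K)",
        ("other",   "speaker", "speaker"): "Question (Q)",
        ("speaker", "other",   "other"):   "Confirmation (C)",
        ("speaker", "other",   "speaker"): "Advisement (A)",
        ("speaker", "speaker", "other"):   "Edification (E)",
        ("speaker", "speaker", "speaker"): "Disclosure (D)",
    }

    def classify(source: str, presumption: str, frame: str) -> str:
        """Answer the three VRM questions and look up the resulting mode."""
        return VRM_MODES[(source, presumption, frame)]

    # Example: topic is the other's experience, no presumed knowledge,
    # speaker's own frame of reference -> Question, per Table 2.
    print(classify("other", "speaker", "speaker"))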

Another way in which VRM was developed to be used as a
generalized rating taxonomy is in its definition of
response units. Stiles’ system breaks down conversations
into their most basic and fundamental units, called
utterances. Since each utterance can be a simple
sentence, an independent clause, a nonrestrictive
dependent clause, an element of a compound predicate, or a
term of acknowledgment, evaluation, or address, there can
be dozens of codes given to one talking turn alone
(Stiles, 1992). Each utterance is coded twice – once for
form or literal meaning and once for intent or pragmatic
meaning – making Stiles’ system very detailed. For
instance, “Would you roll up your sleeve?” would be coded
as a question in form, but as an advisement in intent
(Stiles, 1992). Interrater reliabilities for form range
from .50 to .98, averaging .81, and for intent from .30 to
.96, averaging .68. Reliabilities for intent were usually
significantly lower than those for form across two studies
(cited in Stiles, 1992; Elliott, Hill, Stiles,
Friedlander, Mahrer, & Margison, 1987).

A negative aspect of coding this much detail for research
purposes is that transcripts of conversations to be rated
must be unitized first. Stiles (1992) recommends that
individuals who divide a transcript into units not be the
same persons who then code each unit. Although audio or
videotapes can be used for rating, Stiles cautions that
these modalities require more skill because of the level
of complexity they involve. Raters using the VRM system
should have a “high verbal aptitude, interest in
interpersonal communication, patience with details, and
intensive training and practice. Competence in basic
grammar is essential” (Stiles, 1992, p. 21).

The VRM system has been successfully used in a wide
number of research studies. One study found that the mode
intents of therapists vary dramatically with regard to
their theoretical orientation (cited in Stiles, 1992); the
finding is also supported by Hill’s (1986) research. Other
research using the VRM taxonomy has dealt with topics
ranging from the differences in relationships and roles,
to medical interviews, relationship styles, state and
trait anxiety, awkward silences, etc. (cited in Stiles,
1992). Stiles (1992) claims that hundreds of raters have
coded thousands of utterances under the VRM system in
dozens of studies.

While the complexity of Stiles’ system mirrors the
difficulty of coding human speech, it was overly detailed
for the present study’s use. The categories appear to be
more useful than those found in the VPPS for detecting
elaborations of thoughts, emotions, and experiences, but
the VRM system is a painstakingly complex system that has
a lengthy learning curve and requires committed, long-term
raters. (Initially, it takes an average of 5 hours to
code a 1-hour therapy session; after 6 months, it still
takes over 2 hours to rate a 1-hour session [Stiles,
1992].) The required resources – such as time to
adequately train raters and code dozens of sessions, the
availability of long-term raters, the ability to provide
accurate unitized transcripts, etc. – were not available
to the present researchers.

Counselor Verbal Response Mode Category System (CVRMCS)

Hill (1978) attempted to develop a counselor response
category system that incorporated the components of
systems existing at the time. The result of this attempt
is the Counselor Verbal Response Mode Category System
(CVRMCS). Five stages of development were needed to
obtain the final categories used for ratings. Throughout
its development, the same two raters were used to help
identify important and reliable rating categories. During
the first stage, 25 categories taken from the existing
rating literature were used to rate two practice sessions.
Discussion resulted in revising some categories to reduce
overlap. During the second stage, 24 categories were used
to rate five practice sessions. During these first two
stages, interrater reliability remained low. Further
discussion and ratings with the third version of the
system resulted in interrater agreement of 80% and 90% on
two practice sessions.

Face validity was then tested by asking three experienced
counseling psychologists to match examples of the various
categories with the appropriate definitions. Only half of
the examples were matched, leading to a reexamination and
clarification of the existing 24 categories. This fourth
version, now with only seventeen categories, was given to
another panel of three experienced counseling
psychologists, two of whom were able to obtain 80%
agreement on matching definitions with the appropriate
examples. The fifth and final revision used for Hill’s
(1978) initial study was just a reworded and clarified
version of the fourth version. This revision contained
the following categories: minimal encourager, approval-
reassurance, structuring, information, direct guidance,
closed question, open question, restatement, reflection,
nonverbal referent, interpretation, confrontation,
self-disclosure, silence, friendly discussion, criticism,
and unclassifiable (Hill, 1978).

The CVRMCS, like Stiles’ VRM system, uses complete and
accurate transcripts of therapy sessions as its primary
rating material. Transcripts are divided into what Hill
(1978) terms “response units (essentially grammatical
sentences),” (p. 463) which include brief phrases such as
“mmhmm” and “yes.” (Three years later, Hill better
defined this unit as any independent clause [cited in
Friedlander, 1982].) Raters independently listened to
therapy tapes, which consisted of 12 intake sessions (as
opposed to therapy sessions, which also can be used), and
followed along in a unitized transcript, rating each
response unit according to one of the 17 categories. Each
response unit could be placed into one or more of the
categories. Disagreements between raters were resolved
using a procedure similar to the consensus team method
used in the VPPS, in which discussion was used to reach a
unanimous agreement on those items that were discrepant.

At the conclusion of Hill’s (1978) initial study, she
determined that there were 14 statistically significant,
mutually exclusive categories. Minimal encourager
describes an acknowledgment or simple agreement, such as
“mmhmm” or “yes.” Approval-reassurance provides emotional
support, approval, or reinforcement. Information describes
the supplying of facts, data, or resources. Direct
guidance consists of directions or
advice that the therapist gives to the client. Closed
question is a type of question that usually only requires
a one- or two-word answer, such as yes or no. Open
question is a type of question which requests a
clarification of feelings or an exploration of some
situation. Restatement describes a simple restating or
rephrasing of the client’s statement which often contains
similar but fewer words and is more concrete and clear
than the original statement. Reflection is a simple
restating or rephrasing of the client’s statement which
contains reference to stated or implied feelings.
Nonverbal referent points out body posture, voice tone or
level, facial expressions, etc. Interpretation may take
several forms, but always goes beyond what the client has
stated. For instance, it might establish connections
between seemingly unrelated events or statements; it
interprets defenses, feelings, resistance, or
transference; it might indicate themes, patterns, or
causal relationships in the client’s behavior.
Confrontation is defined by two parts: the first part may
be implied rather than stated and refers to some aspect of
the client’s message or behavior; the second part usually
begins with the word “but” and presents a discrepancy or
contradiction. Self-disclosure describes a statement in
which the therapist shares his or her own personal
experiences or feelings with the client and usually begins
with the word “I.” Silence is a pause of five seconds or
more. Other describes statements that are unrelated to
the client’s problems, such as small talk and salutations
(Hill, 1978).

Friedlander (1982) refined Hill’s (1978) rating system by
examining some of the most prominent problems with the
CVRMCS. Two major problem areas were identified: the
mixture of classical and pragmatic coding categories (as
defined by Russell & Stiles, 1979) within the 14
categories used, and the inconsistency of the definition
of the response unit (Friedlander, 1982). Friedlander
(1982), using interrater discrepancies and face and
content validity tests similar to Hill’s (1978), combined
a number of redundant categories, resulting in nine
mutually exclusive categories (CVRMCS-R). Those
categories are: encouragement/approval/reassurance,
reflection/restatement, self-disclosure, confrontation,
interpretation, providing information, information
seeking, direct guidance/advice, and unclassifiable. The
scoring unit was also redefined to include any dependent
or independent clause that, at the minimum, contained a
verb phrase. Compound predicates also constituted
individual units. Ratings again were conducted from
unitized transcripts and disagreements between independent
raters were handled in the same manner as Hill (1978).

Rater qualifications were not initially specified in
either of the above systems. Two undergraduate students
majoring in psychology and a counseling psychologist
(Hill) were used as raters in Hill’s (1978) study;
interrater reliability was reported at .80. Two raters
were used for Friedlander’s (1982) updated system;
interrater reliability was reported as .85. Elliott et
al. (1987) found interrater reliabilities for Hill’s system
to range from .48 to .94, averaging .64, and for
Friedlander’s system from .32 to .82, averaging .57. Hill
(1986) later did note selection criteria and training for
raters; raters were selected on the basis of “a high
grade-point average, motivation, and ability to do the
task” (p. 140). Training required raters to become
familiar with the rating categories and then practice with
the system until at least two out of the three raters
agreed on 75-80% of all categories (Hill, 1986). As with
the other rating systems described here, a rater training
manual is available.

Hill (1986) also suggests weekly meetings amongst raters
to correct for rater drift, provide an opportunity for
affiliation to reduce boredom and loneliness, and
reconcile disagreements. Judgments of two out of three of
the raters are usually accepted without discussion; when
all three raters disagree on a category, discussion
ensues. To ensure that no one rater dominates or
influences the discussion process, Hill (1986) suggests
alternating which person talks first and allowing equal
time and respect for each rater’s opinions and reasons for
his or her ratings.

Since 1987, only a handful of studies have utilized the
CVRMCS taxonomy in research. Cummings (1989) used the
system to discover that novice counselors used more
information-oriented responses when addressing a
help-seeking individual with intrapersonal problems (such
as procrastination, loneliness, etc.) and used more
reflection-oriented responses when an individual presented
with interpersonal problems (such as dealing with
conflicts or relationships with other people in that
person’s life). Other studies have dealt with changes in
graduate students after taking a course devoted to
developing counseling skills (Kivlighan, 1989) and an
overview of effective therapist techniques by Hill (1992).
There is a larger base of studies conducted before 1987
that use the CVRMCS to examine effectiveness of various
theoretical orientations in therapy (cited in Hill, 1986).
Unfortunately, most of these rated only the initial
intake session, with only a few examining response modes
across the course of treatment (Hill, 1986).

While not as complicated as Stiles’ VRM system, nor as
content- or dynamically-oriented as the VPPS, neither the
CVRMCS nor Friedlander’s refinement of the CVRMCS rating
system was adequate for the present study, for two
important reasons. First, the CVRMCS is a
counselor-oriented rating system and does not include
ratings of client responses (Hill developed a similar, yet
separate system for rating client responses [Hill, 1986]).
Second, like the VRM system, the CVRMCS highly recommends
that verbatim transcripts of therapy sessions be used. The
present researchers had limited time and resources
available and could not provide such unitized transcripts
for this study.

Cognitive Elaboration Rating System (CERS)

Categorizing talking turns in therapy sessions is a
difficult undertaking. Past studies have illustrated
problems in attempting to define and rate therapist and
client interactions. For instance, interrater reliabilities
in such studies have averaged only in the high .50s
(Elliott et al., 1987). Among the most reliably rated
categories, questions (.71) were followed by
advisement (.66), information (.64), and self-disclosure
(.61). These researchers concluded that there was no
“best” response-mode rating system, and that researchers
should use a rating system best suited to their own
particular needs (Elliott, et al., 1987).

The Cognitive Elaboration Rating System (CERS) was
developed as an extension of a literature review on the
development and current research of the elaboration
likelihood model (Petty & Cacioppo, 1986). The
elaboration likelihood model (ELM) states, in general,
that when clients expand or elaborate on positive
experiences, thoughts, or emotions on their own during a
therapy session, they are more likely to gain positive
outcomes from therapy. Successful outcomes in ELM can be
best measured by examining the therapeutic process.

The positive-outcome model of cognitive elaboration
theorizes a particular series of events in therapy.
First, the client’s negative or problem thoughts, ideas,
or cognitions are identified and examined in therapy. The
therapist then targets those faulty thoughts that are most
likely to be susceptible to change and begins a series of
interventions designed to promote new, healthier or more
adaptive cognitions. By presenting favorable, strong
arguments for these new cognitions, the therapist
encourages the client to begin elaborating on them,
through verbal and imagery strategies. Short-term change
is then likely to take place, at which time clients may
exhibit actual or superficial change, which must be
accurately assessed. Finally, by incorporating the new
constructive thoughts into their own cognitive schema, the
client maintains changes and begins to generalize the
changes to other ideas and behaviors (Barone & Hutchings,
1993).

The current researchers were interested in examining the
client’s cognitive elaborations as a result of the
therapist’s arguments for change. It would be informative
to discover, for example, whether certain types of
therapist responses would be more likely to elicit
positive client elaborations. By incorporating a number
of pre-treatment and post-treatment outcome measures, it
could then be determined whether greater cognitive
elaboration in therapy sessions contributes to more
positive therapy outcomes and, more specifically, the type
of therapist interventions that result in greater amounts
of positive client cognitive elaborations.

Numerous studies have been conducted that support the
above general hypothesis, but none so far have been
completed on a clinical population (Barone & Hutchings,
1993). Such studies are likely to be large undertakings
that require analyzing dozens of individual therapy
sessions of different clients and therapists in a clinical
setting. A rating system would need to be developed to
address the specific research issues unique to studying
cognitive elaborations within therapy sessions.

Initial Development of CERS

The Cognitive Elaboration Rating System (CERS) is the
direct product of those needs. The CERS was developed over
approximately one year. The researchers
first modeled a response mode rating system based upon the
findings of Elliott, et al. (1987) and the basic
theoretical underpinnings of ELM (Petty & Cacioppo, 1986).
This experimental rating system eventually contained
fifteen categories – nine for coding therapist
communications and six for rating client communications
(Table 3). Three of the six client categories were also
rated according to whether they were positively,
neutrally, or negatively goal-directed, according to each
client’s individual treatment plan. It was hypothesized
that by noting whether certain interventions in therapy
were followed by either positive or negative cognitive
elaborations, the impact of such interventions could be an
important factor for measuring treatment efficacy. If,
for example, a therapist who asked many open-ended
questions consistently received positive self-disclosures
in response from a wide range of clients, this would be an
important variable to note.

Table 3.
Original experimental categories for CERS

Therapist             Client
-------------------   --------------------------
Confrontation         Compliance (+, -, N)
Information           Resistance (+, -, N)
Advisement            Self-disclosure (+, -, N)
Interpretation        Reflection
Self-disclosure       Question
Reflection            Other (Inaudible)
Other (Inaudible)

The units of measurement in this initial study were
complete talking turns between therapist and client. Each
talking turn was rated according to the speaker’s
categories, while minimal encouragements (such as “Mmhm”)
were ignored. Talking turns could receive more than one
rating, which occurred most often during long stretches of
uninterrupted speech.

Ratings were drawn from audiotaped therapy sessions of
clients being seen in a local, suburban community mental
health center. Clients carried an Axis I or Axis II
mental disorder diagnosis from the Diagnostic and
Statistical Manual of Mental Disorders – 3rd Edition,
Revised (APA, 1987) and were generally of lower
socio-economic status. Therapists were doctoral-level
clinical psychology students operating in a general adult
outpatient psychotherapy program and were supervised by a
licensed clinical Ph.D. psychologist. Therapists
represented the entire continuum of therapeutic
orientations and interventions currently practiced.
Initially, ratings were completed for the entire 45-minute
audiotape. While the system was under development, raters
– two doctoral-level students in a clinical psychology
program and two clinical Ph.D. psychologists –
independently rated each session according to the initial
CERS categories. Treatment plans describing specific
treatment goals for each client were provided and reviewed
at the onset of each rating session. After approximately
20 to 25 talking turns had been completed, the audiotape
was stopped and ratings among the four individuals were
compared. Discussion resulting in a consensus among the
raters helped clarify discrepancies between rating
categories as the study progressed.

After approximately six months of development, this
initial cognitive elaboration rating system was
completely discarded for a number of reasons. First, it
was discovered to be inefficient for examining therapy
sessions for cognitive elaborations and the context in
which they occur. That is, more ratings were performed
than were necessary for the detection of cognitive
elaboration and the therapist’s prompting of client
elaboration. Second, adequate interrater reliability
among the researchers using this system was never
obtained, and varied widely depending upon the complexity
of the session being rated (r = .20 to .78, averaging
.53). Discussions regarding discrepancies among raters
occurred very frequently, even after the rating categories
had been defined and used for months by the same set of
raters. Third, the rating system required that raters
remember the client’s treatment goals for therapy
throughout the entire rating period. These goals were
developed by the therapist and client without regard to
research concerns. Consequently, ratings were often open
to interpretation as to whether a client’s responses were
positively, negatively, or neutrally goal-oriented.

Final Version of CERS

The response mode rating system which rated verbal
talking turns in therapy was replaced with a rating system
which coded the content of therapists’ and clients’
responses during 3-minute time intervals. Although a
time-interval mode was judged to be less sensitive than a
response mode rating system, it was believed that it would
be accurate and sensitive enough to determine whether
cognitive elaboration took place and during which types of
client-therapist interactions.

At the same time, new categories were developed to
simplify rating tasks and increase interrater agreement.
These categories were: old or existing experiences,
emotions, or beliefs/ideas; and new or developing
experiences, emotions, or beliefs/ideas (Table 4). A
rating worksheet was used; during each 3-minute
interval, raters were instructed simply to check off the
occurrence of one of the above categories. Face validity
for these categories was never obtained.

Table 4.
Final categories for CERS

Therapist             Client
-------------------   -------------------
Old/existing:         Old/existing:
  Experiences           Experiences
  Emotions              Emotions
  Beliefs/ideas         Beliefs/ideas
New/developing:       New/developing:
  Experiences           Experiences
  Emotions              Emotions
  Beliefs/ideas         Beliefs/ideas

At the end of each 15-minute interval (or five 3-minute
intervals, as the worksheet was designed [Appendix B]),
raters halted the audiotape and also rated the following
five categories on a 7-point Likert-type scale: client’s
agreement with therapist, client’s amount of elaboration,
polarity of client’s elaboration, therapist’s persuasive
attempts, and therapist’s prompting of elaboration.
Client’s agreement with therapist refers to the amount of
agreement or disagreement the client had with the
therapist during this segment. Client’s amount of
elaboration refers to the amount of elaborations the
client made compared to the overall percentage of time the
client spoke and compared with all other clients.
Polarity of client’s elaboration refers to whether the
client’s elaborative attempts were mostly negative,
neutral, or positive in the 15-minute interval.
Therapist’s persuasive attempts refers to how much the
therapist attempted to persuade the client as compared to
the overall percentage of time the therapist spoke and
compared with all other therapists. Therapist’s prompting
of elaboration refers to the amount of elaboration the
therapist prompted in the client as compared to the
overall percentage of time the therapist spoke and
compared to all other therapists.
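
To make the two-level structure of these ratings concrete,
the sketch below models a single 15-minute segment as five
3-minute interval check-offs plus the five 7-point
Likert-type ratings. The field names are hypothetical and
do not reproduce the actual worksheet (Appendix B).

    # Hypothetical sketch of a CERS rating record; field names are illustrative.
    from dataclasses import dataclass, field
    from typing import List

    SPEAKERS = ("therapist", "client")
    CONTENT_CATEGORIES = (
        "old_experiences", "old_emotions", "old_beliefs_ideas",
        "new_experiences", "new_emotions", "new_beliefs_ideas",
    )

    @dataclass
    class Interval:
        """Check-offs made during one 3-minute interval."""
        minute_start: int
        # e.g. {("client", "new_beliefs_ideas"), ("therapist", "old_emotions")}
        checked: set = field(default_factory=set)

    @dataclass
    class Segment:
        """One 15-minute segment: five intervals plus five 7-point ratings."""
        intervals: List[Interval]
        client_agreement: int        # 1-7
        client_elaboration: int      # 1-7
        elaboration_polarity: int    # 1 (negative) to 7 (positive)
        therapist_persuasion: int    # 1-7
        therapist_prompting: int     # 1-7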

The three remaining raters (one of the Ph.D.
psychologists had discontinued rating) independently
rated the same audiotapes used with the earlier rating
system, while one of the raters kept track of the time.
When three minutes had passed, the audiotape was stopped
and ratings for that 3-minute segment were compared.
Discussion ensued over disagreements among the ratings and
final ratings were based on a consensus among the raters.

During these discussions, it was discovered that a number
of rating categories were difficult to distinguish and
were sometimes misunderstood. It was at this point that a
rater’s training manual was begun, to record rules that
were developed for these difficult decisions (Appendix B).
Rules were developed from criteria discussed amongst the
raters and based upon pragmatic concerns. For instance,
it was decided that small talk that often occurred at the
beginning of many therapy sessions could be ignored, since
it involved client-therapist interchanges that were of
little therapeutic value. The manual also provides an
overview of the rating system, explains each of the rating
categories (including the five Likert-type scales), and
describes in detail each of the eight rules developed to
distinguish especially difficult rating situations.
Raters were encouraged to familiarize themselves with this
manual and review it prior to each rating session.

When it became evident that this system had much greater
interrater reliability (averaging 83% agreement), the
initial interrater reliability study was initiated. This
study followed essentially the same format as described
above, with a few minor changes. Only two raters were
used, one female doctoral-level student previously
unrelated to the study and one male doctoral-level student
who had been one of the raters throughout the development
of the CERS. An audiotape with a tone sounding every
three minutes was made to free the raters from the
additional task of time-keeping. It was determined
pragmatically that the best sessions to rate for each
client would be the third session, some mid-point session,
and the last session of therapy. This interrater
reliability study was conducted, and its details and
results will be presented in an independent paper.

Raters were selected independently of their theoretical
orientations or personality characteristics. Due to
practical limitations, raters selected for the initial and
final versions of the CERS were always doctoral-level
students and the researchers themselves. No effort was
made to prescreen the desirability of raters based upon a
theoretical understanding of ELM, cognitive elaboration,
or therapy experience; all of the current raters, however,
did have at least minimal therapy experience. Moras &
Hill (1991) suggest that raters be selected based upon
preset criteria established by individual researchers and
that such criteria be noted. The criteria used for the
present study included the attainment of a preset
interrater reliability statistic among raters (r = .85 or
better) and a motivation to complete the task.

The CERS is unique in its unit length. Whereas most
other systems define the unit as either an utterance or a
response unit, the CERS uses a time-interval unit of every
3 minutes within three complete therapy sessions. The
VPPS has been used in studies with unit lengths varying
from 10 or 15 minutes to the entire therapy hour.
Researchers currently recommend 15-minute unit lengths
systematically sampled throughout a therapy session (Suh
et al., 1989). Also like the VPPS, the CERS utilizes
Likert-type scales for some of its ratings, and uses
15-minute unit segments.

The development of the CERS somewhat parallels the
construction of the other systems presented here. CERS
was developed and refined over a period of time in which
categories were fully explored and defined, while ratings
were discussed at great length amongst the system’s
authors. This method was similar to Hill’s (1978)
development of the CVRMCS, which also underwent different
stages of development and refinement (culminating in
Friedlander’s 1982 revision of the original system).
Other systems began from more stable roots and have
changed only slightly since their introduction, including
the VPPS, which is only in its second full version, and
the VRM system, which remains largely unchanged since 1976
(Stiles, 1992).

Once the reliability and validity of the CERS have been
established, researchers can begin examining the purpose
for which the CERS was constructed: namely, to examine the
occurrence of cognitive elaborations in therapy and the
types of therapist interventions under which positive
and negative client elaborations occur. With a system
sensitive enough to distinguish such interventions in
therapy, yet pragmatic enough to train raters on it
quickly and reliably, a large research study can be
undertaken with a clinical population. It must be
emphasized that the final version of CERS has not yet been
determined to be sensitive enough to distinguish the
therapist interventions which elicit greater client
cognitive elaborations. The therapeutic process is a
dynamic process that includes client personality
variables, nonverbal behaviors, often dissimilar
intentions between the therapist and client, and
situational variables, all of which must be taken into
account when examining therapeutic outcomes. CERS is just
one small part of understanding the entire therapeutic
process, but may prove extremely useful in this endeavor.

References

American Psychiatric Association. (1987). Diagnostic and
statistical manual of mental disorders (3rd ed., rev.).
Washington, DC: Author.

Barone, D.F., & Hutchings, P.S. (1993). Cognitive elaboration:
Basic research and clinical application. Clinical Psychology
Review, 13, 187-201.

Cummings, A.L. (1989). Relationship of client problem type to
novice counselor response modes. Journal of Counseling
Psychology, 36, 331-335.

Elliott, R., Hill, C.E., Stiles, W.B., Friedlander, M.L.,
Mahrer, A.R., & Margison, F.R. (1987). Primary therapist
response modes: Comparison of six rating systems. Journal of
Consulting and Clinical Psychology, 55, 218-223.

Friedlander, M.L. (1982). Counseling discourse as a speech event:
Revision and extension of the Hill Counselor Verbal Response
Category System. Journal of Counseling Psychology, 29, 425-429.

Garfield, S.L., & Bergin, A.E. (Eds.). (1986). Handbook of
psychotherapy and behavior change. New York: Wiley & Sons.

Gomes-Schwartz, B. (1978). Effective ingredients in psychotherapy:
Prediction of outcome from process variables. Journal of
Consulting and Clinical Psychology, 46, 1023-1035.

Gomes-Schwartz, B. & Schwartz, J.M. (1978). Psychotherapy process
variables distinguishing the “inherently helpful” person from
the professional psychotherapist. Journal of Consulting and
Clinical Psychology, 46, 196-197.

Greenberg, L.S., & Pinsof, W.M. (Eds.). (1986). The
psychotherapeutic process: A research handbook. New York:
Guilford Press.

Hill, C.E. (1978). Development of a Counselor Verbal Response
Category System. Journal of Counseling Psychology, 25, 461-468.

Hill, C.E. (1986). An overview of the Hill Counselor and Client
Verbal Response Modes Category Systems. In L.S. Greenberg &
W.M. Pinsof (Eds.), The psychotherapeutic process: A research
handbook (pp. 131-160). New York: Guilford.

Hill, C.E. (1992). Research on therapist techniques in brief
individual therapy: Implications for practitioners. The
Counseling Psychologist, 20, 689-711.

Kivlighan, D.M. (1989). Changes in counselor intentions and
response modes and in client reactions and session evaluation
after training. Journal of Counseling Psychology, 36, 471-476.

Lindner, R. (1954). The fifty-minute hour. New York: Delta.

Moras, K., & Hill, C.E. (1991). Rater selection for psychotherapy
process research: An evaluation of the state of the art.
Psychotherapy Research, 1, 113-123.

O’Malley, S.S., Suh, C.S., & Strupp, H.H. (1983). The Vanderbilt
Psychotherapy Process Scale: A report on the scale development
and a process-outcome study. Journal of Consulting and Clinical
Psychology, 51, 581-586.

Petty, R.E., & Cacioppo, J.T. (1986). Communication and persuasion:
Central and peripheral routes to attitude change. New York:
Springer-Verlag.

Russell, R., & Stiles, W. (1979). Categories for classifying
language in psychotherapy. Psychological Bulletin, 86, 406-419.

Stiles, W.B. (1978). Manual for a taxonomy of verbal response modes.
Chapel Hill: Institute for Research in Social Science,
University of North Carolina at Chapel Hill.

Stiles, W.B. (1992). Describing talk: A taxonomy of verbal
response modes. London: Sage Publications.

Storr, A. (1990). The art of psychotherapy (2nd ed.). New York: Routledge.

Strupp, H.H. (1957). A multidimensional system for analyzing
psychotherapeutic techniques. Psychiatry, 20, 293-306.

Suh, C.S., O’Malley, S.S., Strupp, H.H., & Johnson, M.E. (1989).
The Vanderbilt Psychotherapy Process Scale (VPPS). Journal of
Cognitive Psychotherapy: An International Quarterly, 3, 123-154.

Windholz, M.J., & Silberschatz, G. (1988). The Vanderbilt
Psychotherapy Process Scale: A replication with adult
outpatients. Journal of Consulting and Clinical Psychology, 56, 56-60.