John M. Grohol



	Developing a reliable and systematic rating system to 
study the psychotherapy process first began with Strupp's 
(1957) method of categorizing therapist utterances.  Since 
that time, dozens of rating systems have been constructed, 
with varying degrees of success, reliability, acceptance 
and use by other researchers.  This article outlines the 
basic constructs of some of these widely known rating 
systems and compares them to the Cognitive Elaboration 
Rating System (CERS).  The CERS was recently developed in 
an attempt to reliably assess the occurrence and number of 
positive cognitive elaborations verbalized by clients in 
therapy sessions, as well as the therapist interventions 
that led to these elaborations.  The development and 
construction of the CERS are discussed, as well as its 
future applicability for researchers.

                     Development of the 
            Cognitive Elaboration Rating System (CERS)

	The process of psychotherapy has been the focus of a wide 
range of research for the past few decades, as researchers 
and clinicians alike look for the variables that might 
have the greatest impact on the outcome of therapy.  
Psychotherapy has often been described as an "art" (e.g., 
Lindner, 1982; Storr, 1980) composed of many intricate 
parts.  These parts include personality variables of both 
the client and the therapist, the psychotherapeutic 
relationship, the therapeutic alliance and rapport 
established between the client and therapist, and the 
specific interventions the therapist brings to therapy.  
Research on these factors suggests that relationship 
factors may strongly influence positive therapy outcomes, 
that the client's contribution to the therapeutic alliance 
is also important, and that various types of therapies 
work best for specific disorders (Windholz & Silberschatz, 
1988).  This last point, however, is often disputed 
because of research showing that improvement in therapy is 
more strongly correlated with therapeutic relationship 
factors than with specific therapist techniques (Garfield 
& Bergin, 1986).

	One method of examining the process of therapy is to 
divide therapy sessions into discrete units of verbal 
communication and examine the therapist-client 
interactions that correlate most highly with positive 
therapy outcomes.  By devising a system of coding 
therapist utterances, Strupp (1957) was one of the first 
researchers to apply this method of examining the 
therapeutic process.  Strupp's first system, however, was 
limited; it measured only therapist communications.  Since 
that time, dozens of classification systems have been 
developed for research, with varying degrees of success 
(for reviews, see, for example, Elliott, Hill, Stiles, 
Friedlander, Mahrer, & Margison, 1987; Greenberg & Pinsof, 
1988; Russell & Stiles, 1979). 

	It is beyond the scope of the present article to 
summarize or evaluate all psychotherapy process rating 
systems that are currently found in the literature.  
Rather, the present author will examine the construction 
and validation of some of the more stable and 
comprehensive systems in contemporary process research.  
The development of the Cognitive Elaboration Rating System 
(CERS) will then be described, and pertinent aspects of 
this system will be compared to the other psychotherapy 
rating systems discussed.  The future direction and 
applicability of the CERS in psychotherapy process 
research will also be examined.   	

Vanderbilt Psychotherapy Process Scale (VPPS)

	The Vanderbilt Psychotherapy Process Scale (VPPS) was 
originally devised in 1974 by Strupp, Hartley, and 
Blackwood (Suh, O'Malley, Strupp, & Johnson, 1989).  The 
scale has undergone two revisions since then, by 
Gomes-Schwartz in 1978 and in 1983 by O'Malley, Suh, and 
Strupp.  The VPPS is intended to be theoretically neutral, 
measuring variables in therapist-client interactions that 
predict positive and negative outcomes.  It consists of 80 
Likert-type items, each rated on a 5-point ordinal scale 
from 1 ("not at all") to 5 ("a great deal"), which are 
divided into therapist and client sections.  Each section 
also contains two parts, one dealing with characteristics 
of each person's behavior and the other dealing with 
characteristics of each person's demeanor during the 
session (Suh et al., 1989).

	Eight subscales are derived from these 80 items, the 
first five of which are patient-oriented and the 
remaining three therapist-oriented (Table 1; Suh et al., 
1989).  Each subscale contains between 6 and 13 items.  
The scales were derived from the items on the basis of a 
principal components factor analysis (O'Malley, Suh, & 
Strupp, 1983).  Patient participation describes the extent 
to which the client is actively involved in the 
therapeutic relationship.  Patient hostility is used to 
tap the more negative aspects of the client's behaviors 
and beliefs.  Patient psychic distress seeks to measure 
the client's emotional state, especially feelings of 
discouragement.  Patient exploration describes the extent 
to which the client is engaged in examination of his or 
her feelings and experiences.  Patient dependency is used 
to measure the client's dependency and reliance on the 
therapist.  Therapist exploration measures the therapist's 
attempts to examine the client's behaviors, emotions, and 
underlying motivations.  Therapist warmth and friendliness 
seeks to measure the therapist's display of emotional 
involvement, while negative therapist attitude describes 
therapist attitudes and behaviors that may frighten, 
threaten, or intimidate the client.

Table 1.
Vanderbilt Psychotherapy Process Scale Subscales
Patient                             Therapist
---------------------------------   -------------------------------------- 
Patient participation - 8 items     Therapist exploration - 13 items 
Patient hostility - 6 items         Therapist warmth - 9 items 
Patient psychic distress - 9 items  Negative therapist attitude - 6 items 
Patient exploration - 7 items 	
Patient dependency - 6 items 	

	The VPPS has been used in a number of different ways to 
rate therapist-client interactions within therapy.  Early 
on, investigators defined the entire therapy hour as the 
unit length and rated only the third session of 25 
clients (Gomes-Schwartz & Schwartz, 1978).  Gomes-Schwartz 
(1978) refined the sampling method by using 10-minute 
segments from the sessions of 35 different clients, 
randomly chosen from predefined representative sessions 
(session 3, the sessions one-half and three-quarters of 
the way through treatment, and the next-to-last session).  
The current version of the VPPS was developed using a 
systematic sampling method, in which 5 minutes each from 
the beginning, middle, and end of the hour were combined 
into 15-minute units, taken from 38 clients (O'Malley, 
Suh, & Strupp, 1983).  Only the first three sessions of 
therapy were used for each client.  Suh et al. (1989) make 
their case for this
sampling method:

... Early sessions appear to be critically important for 
the subsequent  course of therapy.  Furthermore, even if 
ratings from later sessions demonstrate stronger 
associations with outcome, they may not elucidate the 
actual processes responsible for the development of 
qualities manifested in the later sessions. (p. 136)
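	The systematic sampling rule used for the current VPPS 
(5 minutes each from the beginning, middle, and end of the 
hour, combined into one 15-minute unit) can be sketched in 
Python as follows.  This is a minimal illustration under 
stated assumptions: the function name is hypothetical, and 
centering the middle segment on the session midpoint is an 
assumption, since the source does not specify exactly 
where the middle segment begins.

```python
from typing import List, Tuple

def vpps_sample_windows(session_minutes: float,
                        segment_minutes: float = 5.0) -> List[Tuple[float, float]]:
    """Return (start, end) minute marks for the beginning, middle, and end segments.

    Hypothetical helper.  Assumption: the middle segment is centered on the
    session midpoint; the source says only "beginning, middle, and end of the hour."
    """
    session_minutes = float(session_minutes)
    if session_minutes < 3 * segment_minutes:
        raise ValueError("session too short for three non-overlapping segments")
    mid_start = session_minutes / 2 - segment_minutes / 2
    return [
        (0.0, segment_minutes),                                # beginning
        (mid_start, mid_start + segment_minutes),              # middle
        (session_minutes - segment_minutes, session_minutes),  # end
    ]

windows = vpps_sample_windows(60)
print(windows)                                     # [(0.0, 5.0), (27.5, 32.5), (55.0, 60.0)]
print(sum(end - start for start, end in windows))  # 15.0
```

The three segments together always total 15 minutes, 
matching the unit length described above, regardless of 
the exact session length.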

	Multiple raters and media ranging from written 
transcripts to audio and videotapes of therapy sessions 
can be used in conjunction with the VPPS.  Current 
investigators working with the VPPS (Suh et al., 1989) 
describe one rating procedure, the consensus team method, 
in which raters working in pairs first independently rate 
videotaped therapy sessions, then compare ratings with 
each other, and finally reach a consensus on items in 
disagreement through discussions and videotape review.  
Suh et al. (1989) suggest that raters using the VPPS 
should be at least graduate students with some clinical 
experience; no other rater selection criteria are given.  
Raters are first trained to criterion (r = .85 to .90) on 
12 to 19 training segments and then begin working in 
assigned pairs.  Although various forms of media can be 
used with the VPPS, researchers have found that ratings 
based upon audio and videotapes are the most accurate, and 
they caution that transcripts should not be used (Suh et 
al., 1989).  Interrater reliabilities have ranged from .60 
to .94, averaging .86 across three studies (cited in Suh 
et al., 1989).
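	The training-to-criterion step can be illustrated with a 
Pearson correlation.  The source does not say what a 
trainee's ratings are correlated with, so the sketch below 
assumes a correlation between a trainee's item ratings and 
a reference ("criterion") set of ratings for the same 
training segment, with .85 taken as the passing threshold.  
The function names and ratings are hypothetical.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length rating lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two ratings")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a rater gives constant ratings")
    return cov / math.sqrt(var_x * var_y)

def trained_to_criterion(trainee: Sequence[float], reference: Sequence[float],
                         criterion: float = 0.85) -> bool:
    # Assumed rule: criterion is reached once r meets the lower bound (.85).
    return pearson_r(trainee, reference) >= criterion

# Hypothetical 1-5 Likert ratings on five VPPS items from one training segment.
trainee = [2, 3, 4, 4, 5]
reference = [1, 3, 4, 5, 5]
print(round(pearson_r(trainee, reference), 2))   # 0.94
print(trained_to_criterion(trainee, reference))  # True
```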

	Suh et al. (1989) review the use of the VPPS in process 
research and conclude that it has sufficient reliability 
and validity as a psychotherapy research instrument (see 
also Windholz & Silberschatz, 1988).  Suh et al. (1989) 
cite studies which have found that the client's level of 
interpersonal functioning prior to therapy is predictive 
of the client's participation in therapy, which is then 
predictive of client outcome.  The researchers also report 
that changes in therapist attitudes early in therapy are 
important to client outcomes (cited in Suh et al., 1989).  
These findings come from the 
Vanderbilt Psychotherapy Research Project I; the second 
project is currently underway and is investigating the 
efficacy of time-limited dynamic psychotherapy.  Six 
raters are being used, working in consensus teams as 
described above.

	Moras and Hill (1991) recently examined the rater 
selection criteria found in current process rating systems 
and categorized these systems based upon the amount of 
inference required.  While Hill's own system (the CVRMCS, 
which will be discussed below) and Stiles' system (VRM, 
also discussed below) were classified as "moderate 
inference instruments," Moras and Hill describe the VPPS 
as a "high inference instrument," that is, a system that 
requires individuals to rate complex stimuli and demands a 
large amount of inference on the part of the raters.  
Because of this, and because the VPPS categories measure 
client-focused intentions and internal states (the system 
is also somewhat psychodynamically oriented, despite its 
developers' claims of theoretical neutrality), the VPPS is 
inadequate for measuring the cognitions and elaborations in 
which the present researchers were interested.  With only 
three therapist subscales, it also cannot adequately 
capture the wide range of therapist interventions of 
particular interest to the present researchers.

Verbal Response Mode System (VRM)

	Perhaps the most detailed, comprehensive, and complex of 
rating taxonomies is Stiles' (1992) Verbal Response Mode 
system (VRM).  Stiles developed VRM under the influence of 
Jerry Goodman, his clinical supervisor at UCLA in 1969.  
Goodman identified six distinct response modes in therapy: 
question, advisement, silence, interpretation, reflection, 
and disclosure (Stiles, 1992).  After further exploration 
of response modes, Stiles and a colleague proposed three 
underlying principles of classification: source of 
experience, presumption about experience, and frame of 
reference.  Using these principles, Stiles (1992) proposed 
that raters could categorize utterances simply by 
answering three questions: Whose experience is the topic?  
Does the utterance require the speaker to presume 
knowledge of the other's experience?  Whose frame of 
reference is used?  Two additional response modes, 
edification (providing information) and confirmation, were 
eventually added to Goodman's original categories, and the 
"silence" category was renamed "acknowledgment" in an 
effort to better identify such responses (Stiles, 1992).  

	When Stiles' principles of classification are applied, 
each utterance necessarily falls into one of eight 
categories.  Disclosure reveals thoughts, feelings, 
perceptions, or intentions.  Edification states objective 
information.  Advisement attempts to guide behavior with 
suggestions, commands, permission, and prohibition.  
Confirmation compares the speaker's experience with the 
other's through agreement, disagreement, and the sharing 
of experiences or beliefs.  Question describes a request 
for information or guidance.  Acknowledgment conveys 
receipt of a communication (including salutations).  
Interpretation explains or labels the other and can 
describe judgments or evaluations of the other's 
experiences or behaviors.  Reflection puts the other's 
experiences into words through repetitions, restatements, 
and clarifications (Stiles, 1992).

	VRM seeks to transcend traditional category systems by 
focusing on the speaker and the "other" person 
participating in the discussion, rather than on fixed 
"therapist" and "client" categories (Table 2).  In this 
way, VRM recognizes that clients can also make 
verbalizations usually ascribed 
exclusively to therapists, such as reflections and 
interpretations.  VRM is a generalized coding system, 
applicable to coding any conversation in almost any 

Table 2.  
Stiles' (1992) Verbal Response Mode (VRM) System

Source of      Presumption about   Frame of Reference
Experience     Experience          Other                Speaker
-------------  -----------------   ------------------   ------------------
Other          Other               Reflection (R)       Interpretation (I)
               Speaker             Acknowledgment (K)   Question (Q)
Speaker        Other               Confirmation (C)     Advisement (A)
               Speaker             Edification (E)      Disclosure (D)
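
	Because each of Stiles' three principles has only two 
possible answers (speaker or other), Table 2 amounts to a 
lookup from three answers to one of eight response modes.  
The Python sketch below encodes the table as given; the 
names Locus, VRM_MODES, and classify_vrm are hypothetical, 
and the sketch covers only the final classification step, 
not the human judgment needed to answer each question for 
a real utterance.

```python
from enum import Enum
from itertools import product

class Locus(Enum):
    """Hypothetical label for whose experience, presumption, or frame is involved."""
    SPEAKER = "speaker"
    OTHER = "other"

# Key order: (source of experience, presumption about experience, frame of reference),
# transcribed from Table 2.
VRM_MODES = {
    (Locus.OTHER, Locus.OTHER, Locus.OTHER): "Reflection (R)",
    (Locus.OTHER, Locus.OTHER, Locus.SPEAKER): "Interpretation (I)",
    (Locus.OTHER, Locus.SPEAKER, Locus.OTHER): "Acknowledgment (K)",
    (Locus.OTHER, Locus.SPEAKER, Locus.SPEAKER): "Question (Q)",
    (Locus.SPEAKER, Locus.OTHER, Locus.OTHER): "Confirmation (C)",
    (Locus.SPEAKER, Locus.OTHER, Locus.SPEAKER): "Advisement (A)",
    (Locus.SPEAKER, Locus.SPEAKER, Locus.OTHER): "Edification (E)",
    (Locus.SPEAKER, Locus.SPEAKER, Locus.SPEAKER): "Disclosure (D)",
}

def classify_vrm(source: Locus, presumption: Locus, frame: Locus) -> str:
    """Return the response mode implied by the answers to the three questions."""
    return VRM_MODES[(source, presumption, frame)]

# Every combination of the three binary principles maps to exactly one mode.
assert set(VRM_MODES) == set(product(Locus, repeat=3))

print(classify_vrm(Locus.SPEAKER, Locus.SPEAKER, Locus.SPEAKER))  # Disclosure (D)
print(classify_vrm(Locus.OTHER, Locus.OTHER, Locus.SPEAKER))      # Interpretation (I)
```

Coding an utterance for both form and intent, as VRM 
requires, would amount to applying the same lookup twice 
with two separate sets of answers.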

	VRM's design as a generalized rating taxonomy is also 
reflected in its definition of response units.  Stiles' 
system breaks conversations down into their most basic 
units, called utterances.  Because an utterance can be a 
simple sentence, an independent clause, a nonrestrictive 
dependent clause, an element of a compound predicate, or a 
term of acknowledgment, evaluation, or address, a single 
talking turn can receive dozens of codes (Stiles, 1992).  
Each utterance is coded twice, once for form (literal 
meaning) and once for intent (pragmatic meaning), which 
makes Stiles' system very detailed.  For instance, "Would 
you roll up your sleeve?" would be coded as a question in 
form but as an advisement in intent (Stiles, 1992).  
Interrater reliabilities for form range from .50 to .98, 
averaging .81, and for intent from .30 to .96, averaging 
.68.  Reliabilities for intent were usually significantly 
lower than those for form across two studies (cited in 
Stiles, 1992; Elliott, Hill, Stiles, Friedlander, Mahrer, 
& Margison, 1987).

	One drawback of coding at this level of detail for 
research purposes is that transcripts of the conversations 
to be rated must first be unitized (divided into 
utterances).  Stiles (1992) recommends that the individuals 
who divide a transcript into units not be the same people 
who then code each unit.  Although audio or videotapes can 
be used for rating, Stiles cautions that these modalities 
require more skill because of the level 
of complexity they involve.  Raters using the VRM system 
should have a "high verbal aptitude, interest in 
interpersonal communication, patience with details, and 
intensive training and practice. Competence in basic 
grammar is essential" (Stiles, 1992, p. 21).

	The VRM system has been used successfully in a large 
number of research studies.  One study found that 
therapists' response mode intents vary dramatically with 
their theoretical orientation (cited in Stiles, 1992), a 
finding also supported by Hill's (1986) research.  Other 
research using the VRM taxonomy has dealt with topics 
ranging from differences in relationships and roles to 
medical interviews, relationship styles, state and trait 
anxiety, and awkward silences (cited in Stiles, 1992).  
Stiles (1992) reports that hundreds of raters have coded 
thousands of utterances under the VRM system in dozens of 
studies.  

	While the complexity of Stiles' system mirrors the 
difficulty of coding human speech, it was overly detailed 
for the present study's use.  The categories appear to be 
more useful than those found in the VPPS for detecting 
elaborations of thoughts, emotions, and experiences, but 
the VRM system is a painstakingly complex system that has 
a lengthy learning curve and requires committed, long-term 
raters.  (Initially, it takes an average of 5 hours to 
code a 1-hour therapy session; after 6 months, it still 
takes over 2 hours to rate a 1-hour session [Stiles, 
1992].)  The required resources (time to adequately train 
raters and code dozens of sessions, the availability of 
long-term raters, the ability to provide accurate unitized 
transcripts, etc.) were not available to the present 
researchers.  

Counselor Verbal Response Mode Category System (CVRMCS)

	Hill (1978) attempted to develop a counselor response 
category system that incorporated the components of 
systems existing at the time.  The result of this attempt 
is the Counselor Verbal Response Mode Category System 
(CVRMCS).  Five stages of development were needed to 
obtain the final categories used for ratings.  Throughout 
its development, the same two raters were used to help 
identify important and reliable rating categories.  During 
the first stage, 25 categories taken from the existing 
rating literature were used to rate two practice sessions.  
Discussion resulted in revising some categories to reduce 
overlap.  During the second stage, 24 categories were used 
to rate five practice sessions.  During these first two 
stages, interrater reliability remained low.  Further 
discussion and rating with the third version of the system 
yielded 80% and 90% agreement on two practice sessions.  

	Face validity was then tested by asking three experienced 
counseling psychologists to match examples of the various 
categories with the appropriate definitions.  Only half of 
the examples were matched correctly, leading to a 
reexamination and clarification of the existing 24 
categories.  This fourth version, now with only 17 
categories, was given to another panel of three experienced 
counseling psychologists, two of whom were able to obtain 
80% agreement on matching definitions with the appropriate 
examples.  The fifth and final revision used for Hill's 
(1978) initial study was simply a reworded and clarified 
version of the fourth.  This revision contained the 
following categories: minimal encourager, 
approval-reassurance, structuring, information, direct 
guidance, closed question, open question, restatement, 
reflection, nonverbal referent, interpretation, 
confrontation, self-disclosure, silence, friendly 
discussion, criticism, and unclassifiable (Hill, 1978).

	The CVRMCS, like Stiles' VRM system, uses complete and 
accurate transcripts of therapy sessions as its primary 
rating material.  Transcripts are divided into what Hill 
(1978) terms "response units (essentially grammatical 
sentences)" (p. 463), which include brief phrases such as 
"mmhmm" and "yes."  (Three years later, Hill defined this 
unit more precisely as any independent clause [cited in 
Friedlander, 1982].)  In Hill's study, raters 
independently listened to tapes of 12 intake sessions 
(therapy sessions can also be used) while following along 
in a unitized transcript, rating each response unit with 
the 17 categories; a unit could be placed into one or more 
categories.  Disagreements between raters were resolved 
using a procedure similar to the consensus team method 
used with the VPPS, in which discussion was used to reach 
unanimous agreement on discrepant items.

	At the conclusion of her initial study, Hill (1978) 
determined that there were 14 statistically significant, 
mutually exclusive categories.  Minimal encourager 
describes an acknowledgment, simple agreement, or 
Approval-reassurance provides emotional support, approval, 
or reinforcement.  Information provides information, 
usually in the form of facts, data, or resources.  Direct 
guidance consists of directions or advice that the 
therapist gives to the client.  Closed question is a type 
of question that usually requires only a one- or two-word 
answer, such as yes or no.  Open question is a type of 
question that requests a clarification of feelings or an 
exploration of some situation.  Restatement is a simple 
restating or rephrasing of the client's statement that 
often contains similar but fewer words and is more 
concrete and clear than the original statement.  
Reflection is a simple restating or rephrasing of the 
client's statement that contains reference to stated or 
implied feelings.  Nonverbal referent points out body 
posture, voice tone or level, facial expressions, and 
similar cues.  Interpretation may take several forms, but 
it always goes beyond what the client has stated.  For 
instance, it might establish connections between seemingly 
unrelated events or statements; interpret defenses, 
feelings, resistance, or transference; or indicate themes, 
patterns, or causal relationships in the client's 
behavior.  Confrontation is defined by two parts: the 
first part, which may be implied rather than stated, 
refers to some aspect of the client's message or behavior; 
the second part usually begins with the word "but" and 
presents a discrepancy or contradiction.  Self-disclosure 
describes a statement in which the therapist shares his or 
her own personal experiences or feelings with the client; 
it usually begins with the word "I."  Silence is a pause 
of five seconds or more.  Other describes statements that 
are unrelated to the client's problems, such as small talk 
and salutations (Hill, 1978).

	Friedlander (1982) refined Hill's (1978) rating system by 
examining some of the most prominent problems with the 
CVRMCS.  Two major problem areas were identified: the 
mixture of classical and pragmatic coding categories (as 
defined by Russell & Stiles, 1979) within the 14 
categories used, and the inconsistency of the definition 
of the response unit (Friedlander, 1982).  Friedlander 
(1982), using interrater discrepancies and face and 
content validity tests similar to Hill's (1978), combined 
a number of redundant categories, resulting in nine 
mutually exclusive categories (CVRMCS-R).  Those 
categories are: encouragement/approval/reassurance, 
reflection/restatement, self-disclosure, confrontation, 
interpretation, providing information, information 
seeking, direct guidance/advice, and unclassifiable.  The 
scoring unit was also redefined as any dependent or 
independent clause that contained, at a minimum, a verb 
phrase.  Compound predicates also constituted individual 
units.  Ratings again were conducted from unitized 
transcripts, and disagreements between independent raters 
were handled in the same manner as in Hill (1978).

	Rater qualifications were not initially specified in 
either of the above systems.  Two undergraduate students 
majoring in psychology and a counseling psychologist 
(Hill) were used as raters in Hill's (1978) study; 
interrater reliability was reported at .80.  Two raters 
were used for Friedlander's (1982) updated system; 
interrater reliability was reported as .85.  Elliott et 
al. (1987) found interrater reliabilities for Hill's system 
ranging from .48 to .94, averaging .64, and for 
Friedlander's system from .32 to .82, averaging .57.  Hill 
(1986) later did specify selection criteria and training 
for raters; raters were selected on the basis of "a high 
grade-point average, motivation, and ability to do the 
task" (p. 140).  Training required raters to become 
familiar with the rating categories and then practice with 
the system until at least two of the three raters agreed 
on 75-80% of all categories (Hill, 1986).  As with 
the other rating systems described here, a rater training 
manual is available.  	
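
	Hill's (1986) training criterion can be sketched as a 
simple percent-agreement check.  The sketch assumes that 
"agreed on 75-80% of all categories" means the proportion 
of response units to which two raters assigned the same 
category, that each unit receives a single category, and 
that the criterion is met when any pair of the three 
raters reaches the lower threshold of 75%.  The function 
names and example codes are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence

def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Proportion of response units given the same category by two raters."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and cover the same units")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def meets_training_criterion(ratings: Dict[str, Sequence[str]],
                             threshold: float = 0.75) -> bool:
    """True if at least two raters (one pair) agree at or above the threshold."""
    return any(percent_agreement(ratings[r1], ratings[r2]) >= threshold
               for r1, r2 in combinations(ratings, 2))

# Hypothetical codes assigned by three trainees to the same four response units.
ratings = {
    "rater_a": ["open question", "restatement", "information", "reflection"],
    "rater_b": ["open question", "restatement", "information", "interpretation"],
    "rater_c": ["closed question", "reflection", "information", "interpretation"],
}
print(meets_training_criterion(ratings))                  # True (a and b agree on 3 of 4)
print(meets_training_criterion(ratings, threshold=0.80))  # False
```

Percent agreement does not correct for chance agreement; 
it is used here only because it matches the way agreement 
is reported in the text.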

	Hill (1986) also suggests weekly meetings among raters 
to correct for rater drift, provide an opportunity for 
affiliation to reduce boredom and loneliness, and 
reconcile disagreements.  Judgments of two out of three of 
the raters are usually accepted without discussion; when 
all three raters disagree on a category, discussion 
ensues.  To ensure that no one rater dominates or 
influences the discussion process, Hill (1986) suggests 
alternating which person talks first and allowing equal 
time and respect for each rater's opinions and reasons for 

	Since 1987, only a handful of studies have used the 
CVRMCS taxonomy in research.  Cummings (1989) used the 
system to discover that novice counselors gave more 
information-oriented responses when addressing a 
help-seeking individual with intrapersonal problems (such 
as procrastination or loneliness) and more 
reflection-oriented responses when an individual presented 
with interpersonal problems (such as dealing with 
conflicts or relationships with other people in that 
person's life).  Other studies have examined changes in 
graduate students after a course devoted to developing 
counseling skills (Kivlighan, 1989), and Hill (1992) has 
provided an overview of effective therapist techniques.  
There is a larger base of studies conducted before 1987 
that used the CVRMCS to examine the effectiveness of 
various theoretical orientations in therapy (cited in 
Hill, 1986).  Unfortunately, most of these rated only the 
initial intake session, and only a few examined response 
modes across the course of treatment (Hill, 1986). 

	While the CVRMCS is not as complicated as Stiles' VRM 
system, nor as content- or dynamically oriented as the 
VPPS, neither it nor Friedlander's refinement of it was 
adequate for the present study, for two important reasons.  
First, the CVRMCS is a counselor-oriented rating system 
and does not include ratings of client responses (Hill 
developed a similar, yet separate, system for rating 
client responses [Hill, 1986]).  Second, like the VRM 
system, the CVRMCS strongly recommends that verbatim 
transcripts of therapy sessions be used.  The present 
researchers had limited time and resources available and 
could not provide such unitized transcripts for this 
study.  

Cognitive Elaboration Rating System (CERS)

	Categorizing talking turns in therapy sessions is a 
difficult undertaking.  Past studies have illustrated 
problems in attempting to define and rate therapist and 
client interactions.  For instance, interrater reliability 
in such studies has averaged only in the high .50s 
(Elliott et al., 1987).  Among the most reliable rating 
categories, question (.71) ranked highest, followed by 
advisement (.66), information (.64), and self-disclosure 
(.61).  These researchers concluded that there was no 
"best" response-mode rating system, and that researchers 
should use the rating system best suited to their own 
particular needs (Elliott et al., 1987).

	The Cognitive Elaboration Rating System (CERS) was 
developed as an extension of a literature review on the 
development of, and current research on, the elaboration 
likelihood model (Petty & Cacioppo, 1986).  The 
elaboration likelihood model (ELM) states, in general, 
that when clients expand or elaborate on positive 
experiences, thoughts, or emotions on their own during a 
therapy session, they are more likely to gain positive 
outcomes from therapy.  Within this framework, successful 
outcomes can best be measured by examining the 
therapeutic process.  

	The positive-outcome model of cognitive elaboration 
theorizes a particular series of events in therapy.  
First, the client's negative or problem thoughts, ideas, 
or cognitions are identified and examined in therapy.  The 
therapist then targets the faulty thoughts that are most 
susceptible to change and begins a series of interventions 
designed to promote new, healthier, or more adaptive 
cognitions.  By presenting strong, favorable arguments for 
these new cognitions, the therapist encourages the client 
to begin elaborating on them through verbal and imagery 
strategies.  Short-term change is then likely to take 
place, at which point the client may exhibit actual or 
superficial change, which must be accurately assessed.  
Finally, by incorporating the new, constructive thoughts 
into his or her own cognitive schema, the client maintains 
these gains and begins to generalize the 
changes to other ideas and behaviors (Barone & Hutchings, 

	The current researchers were interested in examining the 
client's cognitive elaborations as a result of the 
therapist's arguments for change.  It would be informative 
to discover, for example, whether certain types of 
therapist responses would be more likely to elicit 
positive client elaborations.  By incorporating a number 
of pre-treatment and post-treatment outcome measures, one 
could then determine whether greater cognitive elaboration 
in therapy sessions contributes to more positive therapy 
outcomes and, more specifically, which types of therapist 
interventions result in greater amounts of positive client 
cognitive elaborations.

	Numerous studies have supported the above general 
hypothesis, but none has yet been conducted with a 
clinical population (Barone & Hutchings, 1993).  Such 
studies are likely to be large undertakings that require 
analyzing dozens of individual therapy sessions of 
different clients and therapists in a clinical setting.  A 
rating system would need to be developed to address the 
specific research issues unique to studying cognitive 
elaborations within therapy sessions.

Initial Development of CERS

	The Cognitive Elaboration Rating System (CERS) is the 
direct product of those needs.  CERS was developed over a 
period of approximately one year.  The researchers first 
modeled a response mode rating system on the findings of 
Elliott et al. (1987) and the basic theoretical 
underpinnings of ELM (Petty & Cacioppo, 1986).  This 
experimental rating system eventually contained fifteen 
categories, nine for coding therapist communications and 
six for rating client communications (Table 3).  Three of 
the six client categories were also rated according to 
whether they were positively, neutrally, or negatively 
goal-directed, relative to each client's individual 
treatment plan.  It was hypothesized that noting whether 
particular interventions were followed by positive or 
negative cognitive elaborations would reveal the impact of 
those interventions, an important factor in measuring 
treatment efficacy.  For example, if a therapist who asked 
many open-ended questions consistently received positive 
self-disclosures in response from a wide range of clients, 
this would be an important variable to note.  

Table 3.  
Original experimental categories for CERS

Therapist           Client
-----------------   -----------------------
Confrontation       Compliance (+, -, N) 
Information         Resistance (+, -, N) 
Advisement          Self-disclosure (+, -, N) 
Interpretation      Reflection 
Self-disclosure     Question 
Reflection          Other (Inaudible) 
Other (Inaudible)                                          
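
	The hypothesis above (that particular therapist 
interventions tend to be followed by positively or 
negatively goal-directed client responses) suggests 
tallying each therapist turn together with the client turn 
that immediately follows it.  The Python sketch below is a 
hypothetical illustration, not the researchers' procedure: 
it assumes one category per talking turn (the original 
system allowed more than one), assumes minimal 
encouragements have already been dropped, and uses an 
invented excerpt coded with category names from Table 3.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One rated talking turn: (speaker, category, valence).  Valence is "+", "-", or "N"
# for client categories rated for goal-direction, and None otherwise.
Turn = Tuple[str, str, Optional[str]]

def tally_client_responses(turns: List[Turn]) -> Counter:
    """Count (therapist category, client category, valence) for every therapist
    turn that is immediately followed by a client turn."""
    counts: Counter = Counter()
    for prev, curr in zip(turns, turns[1:]):
        if prev[0] == "therapist" and curr[0] == "client":
            counts[(prev[1], curr[1], curr[2])] += 1
    return counts

# Hypothetical coded excerpt (invented data for illustration only).
session = [
    ("therapist", "Reflection", None),
    ("client", "Self-disclosure", "+"),
    ("therapist", "Interpretation", None),
    ("client", "Resistance", "-"),
    ("therapist", "Reflection", None),
    ("client", "Self-disclosure", "+"),
]
for key, n in tally_client_responses(session).most_common():
    print(key, n)
```

Summed across sessions and clients, such counts would show 
which interventions most often precede positive 
elaborations.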

	The units of measurement in this initial study were 
complete talking turns between therapist and client.  Each 
talking turn was rated using the categories for its 
speaker, while minimal encouragements (such as "Mmhm") 
were ignored.  Talking turns could receive more than one 
rating, which occurred most often during long stretches of 

	Ratings were drawn from audiotaped therapy sessions of 
clients being seen at a local suburban community mental 
health center.  Clients carried an Axis I or Axis II 
mental disorder diagnosis from the Diagnostic and 
Statistical Manual of Mental Disorders, Third Edition, 
Revised (DSM-III-R; APA, 1987) and were generally of lower 
socioeconomic status.  Therapists were doctoral-level 
clinical psychology students working in a general adult 
outpatient psychotherapy program under the supervision of 
a licensed clinical psychologist (Ph.D.).  Therapists 
represented a broad range of therapeutic orientations and 
interventions.  Initially, ratings were completed for the 
entire 45-minute audiotape.  While the system was under 
development, four raters (two doctoral-level students in a 
clinical psychology program and two clinical Ph.D. 
psychologists) independently rated each session according 
to the initial CERS categories.  Treatment plans describing 
specific treatment goals for each client were provided and 
reviewed at the start of each rating session.  After 
approximately 20 to 25 talking turns had been rated, the 
audiotape was stopped and the four raters compared their 
ratings.  Discussion aimed at reaching consensus helped 
clarify discrepancies between rating categories as the 
study progressed.

	After approximately six months of development, this 
initial cognitive elaboration rating system was completely 
discarded for several reasons.  First, it proved 
inefficient for examining therapy sessions for cognitive 
elaborations and the context in which they occur; more 
ratings were performed than were needed to detect 
cognitive elaboration and the therapist's prompting of 
client elaboration.  Second, adequate interrater 
reliability was never obtained among the researchers 
using this system, and reliability varied widely 
depending upon the complexity of the session being rated 
(r = .20 to .78, averaging .53).  Discussions of 
discrepancies among raters occurred very frequently, even 
after the rating categories had been defined and used for 
months by the same set of raters.  Third, the rating 
system required raters to remember each client's 
treatment goals throughout the entire rating period.  
These goals had been developed by the therapist and client 
without regard to research concerns.  Consequently, 
whether a client's responses were positively, negatively, 
or neutrally goal-directed was often open to 
interpretation. 

Final Version of CERS

	The response mode rating system, which rated verbal 
talking turns in therapy, was replaced by a rating system 
that coded the content of therapists' and clients' 
responses during 3-minute time intervals.  Although a 
time-interval system was judged to be less sensitive than a 
response mode rating system, it was expected to be 
accurate and sensitive enough to determine whether 
cognitive elaboration took place and during which types of 
client-therapist interactions.  

	At the same time, new categories were developed to 
simplify rating tasks and increase interrater agreement.  
These categories were old or existing experiences, 
emotions, or beliefs/ideas, and new or developing 
experiences, emotions, or beliefs/ideas (Table 4).  Using a 
rating worksheet, raters were instructed simply to check 
off the occurrence of any of these categories during each 
3-minute interval.  The face validity of these categories 
was never established.

Table 4.  
Final categories for CERS

Therapist           Client
-----------------   -----------------
Old/existing:       Old/existing:
    Experiences         Experiences
    Emotions            Emotions
    Beliefs/ideas       Beliefs/ideas
New/developing:     New/developing:
    Experiences         Experiences
    Emotions            Emotions
    Beliefs/ideas       Beliefs/ideas
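
	As an illustration only, the worksheet logic described 
above can be sketched in Python.  The sketch assumes that a 
rater's marks for one session are stored as a set of checked 
cells per 3-minute interval, where each cell is one of the 
twelve speaker, status, and content combinations in Table 4, 
and that more than one cell may be checked in the same 
interval.  The names Worksheet, interval_index, and 
CATEGORIES are hypothetical and do not describe the actual 
worksheet in Appendix B.

```python
from dataclasses import dataclass, field
from itertools import product

INTERVAL_SECONDS = 3 * 60  # one CERS rating unit is a 3-minute interval

SPEAKERS = ("therapist", "client")
STATUSES = ("old/existing", "new/developing")
CONTENTS = ("experiences", "emotions", "beliefs/ideas")

# The twelve cells of Table 4: speaker x status x content type.
CATEGORIES = frozenset(product(SPEAKERS, STATUSES, CONTENTS))


def interval_index(seconds: float) -> int:
    """Return the 0-based 3-minute interval containing a tape position."""
    if seconds < 0:
        raise ValueError("tape position must be non-negative")
    return int(seconds // INTERVAL_SECONDS)


@dataclass
class Worksheet:
    """Hypothetical record of one rater's check marks for one session."""

    # interval index -> set of (speaker, status, content) cells checked
    checks: dict = field(default_factory=dict)

    def check(self, seconds: float, speaker: str, status: str, content: str) -> None:
        """Check off a category observed at the given tape position."""
        cell = (speaker, status, content)
        if cell not in CATEGORIES:
            raise ValueError(f"unknown category: {cell}")
        self.checks.setdefault(interval_index(seconds), set()).add(cell)

    def cells(self, interval: int) -> set:
        """Return the cells checked in one interval (empty if none)."""
        return self.checks.get(interval, set())


if __name__ == "__main__":
    ws = Worksheet()
    # A client voicing a new belief 3 minutes 20 seconds into the tape.
    ws.check(200, "client", "new/developing", "beliefs/ideas")
    print(sorted(ws.cells(1)))  # [('client', 'new/developing', 'beliefs/ideas')]
```

Storing only the checked cells keeps the record small, since 
most cells in a given interval are expected to stay blank.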

	At the end of each 15-minute interval (that is, every 
five 3-minute intervals, as laid out on the worksheet 
[Appendix B]), raters halted the audiotape and also rated 
the following five categories on a 7-point Likert-type 
scale: client's agreement with therapist, client's amount 
of elaboration, polarity of client's elaboration, 
therapist's persuasive attempts, and therapist's prompting 
of elaboration.  Client's agreement with therapist refers 
to how much the client agreed or disagreed with the 
therapist during the segment.  Client's amount of 
elaboration refers to how much the client elaborated, 
relative both to the proportion of time the client spoke 
and to all other clients.  Polarity of client's 
elaboration refers to whether the client's elaborative 
attempts during the 15-minute interval were mostly 
negative, neutral, or positive.  Therapist's persuasive 
attempts refers to how much the therapist attempted to 
persuade the client, relative both to the proportion of 
time the therapist spoke and to all other therapists.  
Therapist's prompting of elaboration refers to how much 
elaboration the therapist prompted in the client, again 
relative both to the proportion of time the therapist 
spoke and to all other therapists.
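
	A similarly hypothetical sketch shows how each 15-minute 
block relates to its five 3-minute intervals and how the 
five Likert-type ratings might be validated.  The scale 
names are shortened forms of the five categories above; 
the integer 1-to-7 coding, and the direction assumed for 
the polarity scale, are assumptions rather than documented 
features of the CERS.

```python
from dataclasses import dataclass

INTERVALS_PER_BLOCK = 5  # five 3-minute intervals make one 15-minute block

# Shortened names for the five Likert-type scales described above.
SCALES = (
    "client_agreement_with_therapist",
    "client_amount_of_elaboration",
    "polarity_of_client_elaboration",  # assumed: 1 = mostly negative, 7 = mostly positive
    "therapist_persuasive_attempts",
    "therapist_prompting_of_elaboration",
)


def block_index(interval: int) -> int:
    """Map a 0-based 3-minute interval index to its 15-minute block."""
    return interval // INTERVALS_PER_BLOCK


def intervals_in_block(block: int) -> range:
    """Return the 3-minute interval indices that make up one block."""
    start = block * INTERVALS_PER_BLOCK
    return range(start, start + INTERVALS_PER_BLOCK)


@dataclass(frozen=True)
class BlockRating:
    """The five ratings made when the tape is halted at the end of a block."""

    block: int
    scores: dict  # scale name -> integer rating from 1 to 7

    def __post_init__(self):
        missing = set(SCALES) - set(self.scores)
        if missing:
            raise ValueError(f"missing ratings: {sorted(missing)}")
        for name, value in self.scores.items():
            if name not in SCALES:
                raise ValueError(f"unknown scale: {name}")
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 7:
                raise ValueError(f"{name} must be an integer from 1 to 7")


if __name__ == "__main__":
    print(block_index(7), list(intervals_in_block(1)))  # 1 [5, 6, 7, 8, 9]
```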

	The three remaining raters (one of the Ph.D. 
psychologists had stopped rating) independently rated the 
same audiotapes used with the earlier rating system, while 
one of them kept track of the time.  When three minutes had 
passed, the audiotape was stopped and ratings for that 
3-minute segment were compared.  Disagreements were 
discussed, and final ratings were based on consensus among 
the raters.

	These discussions revealed that a number of rating 
categories were difficult to distinguish and were 
sometimes misunderstood.  At this point a rater training 
manual was begun to record the rules developed for these 
difficult decisions (Appendix B).  Rules were based on 
criteria discussed among the raters and on pragmatic 
concerns.  For instance, it was decided that the small 
talk occurring at the beginning of many therapy sessions 
could be ignored, since these client-therapist 
interchanges were of little therapeutic value.  The manual 
also provides an overview of the rating system, explains 
each of the rating categories (including the five 
Likert-type scales), and describes in detail each of the 
eight rules developed for especially difficult rating 
situations.  Raters were encouraged to familiarize 
themselves with the manual and to review it before each 
rating session.

	When it became evident that this system yielded much 
greater interrater reliability (averaging 83% agreement), 
the first formal interrater reliability study was begun.  
This study followed essentially the same format as 
described above, with a few minor changes.  Only two raters 
were used: a female doctoral-level student not previously 
involved in the study and a male doctoral-level student 
who had been one of the raters throughout the development 
of the CERS.  An audiotape with a tone sounding every 
three minutes was made to free the raters from the 
additional task of timekeeping.  For pragmatic reasons, it 
was decided that the best sessions to rate for each client 
would be the third session, a session near the midpoint of 
therapy, and the last session of therapy.  The details and 
results of this interrater reliability study will be 
presented in a separate paper.
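
	The report of 83% agreement does not specify how 
agreement was counted.  One common approach, assumed here 
purely for illustration, treats each category in each 
3-minute interval as a separate yes/no decision and counts 
the decisions on which two raters match.  The function 
percent_agreement and the placeholder categories in the 
example are hypothetical.

```python
def percent_agreement(rater_a, rater_b, n_intervals, categories):
    """Percentage of (interval, category) decisions on which two raters agree.

    rater_a and rater_b map a 0-based interval index to the set of categories
    that rater checked in that interval; intervals with no checks may be absent.
    A decision agrees when both raters checked the category or both left it blank.
    """
    if n_intervals <= 0 or not categories:
        raise ValueError("need at least one interval and one category")
    agree = 0
    for i in range(n_intervals):
        a = rater_a.get(i, set())
        b = rater_b.get(i, set())
        for category in categories:
            if (category in a) == (category in b):
                agree += 1
    total = n_intervals * len(categories)
    return 100.0 * agree / total


if __name__ == "__main__":
    # Toy example with four placeholder categories and two intervals.
    cats = ["A", "B", "C", "D"]
    first = {0: {"A"}, 1: {"B", "C"}}
    second = {0: {"A"}, 1: {"B"}}
    print(percent_agreement(first, second, 2, cats))  # 87.5
```

Because most cells are left blank by both raters, this 
cell-level count can run higher than agreement computed on 
checked categories alone.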

	Raters were selected without regard to their theoretical 
orientations or personality characteristics.  Because of 
practical limitations, raters for both the initial and 
final versions of the CERS were always doctoral-level 
students and the researchers themselves.  No effort was 
made to prescreen raters on the basis of their 
understanding of ELM or cognitive elaboration, or their 
therapy experience; all of the current raters, however, 
had at least minimal therapy experience.  Moras and Hill 
(1991) suggest that raters be selected according to preset 
criteria established by individual researchers and that 
such criteria be reported.  The criteria used in the 
present study were attainment of a preset interrater 
reliability statistic among raters (r = .85 or better) and 
motivation to complete the task.
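
	The r = .85 criterion can be checked with an ordinary 
Pearson product-moment correlation.  The sketch below 
assumes, for illustration, that the criterion is applied to 
two raters' paired numeric scores (for example, their 
Likert-type ratings for the same 15-minute segments); the 
paper does not state which scores entered the calculation, 
and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one rater's scores do not vary")
    return sxy / sqrt(sxx * syy)


def meets_criterion(x, y, threshold=0.85):
    """True if two raters' paired scores correlate at or above the threshold."""
    return pearson_r(x, y) >= threshold


if __name__ == "__main__":
    rater_1 = [1, 2, 3, 4, 5]
    rater_2 = [1, 2, 3, 5, 4]
    print(round(pearson_r(rater_1, rater_2), 2))  # 0.9
    print(meets_criterion(rater_1, rater_2))  # True
```

On Python 3.10 or later, statistics.correlation computes the 
same coefficient from the standard library.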

	The CERS is distinctive in its unit length.  Whereas most 
other systems define the unit as either an utterance or a 
response unit, the CERS uses a 3-minute time-interval unit 
across three complete therapy sessions.  The VPPS has been 
used in studies with unit lengths ranging from 10- to 
15-minute segments to the entire therapy hour, and 
researchers currently recommend 15-minute segments 
systematically sampled throughout a therapy session (Suh 
et al., 1989).  Like the VPPS, the CERS uses Likert-type 
scales for some of its ratings, and these are applied to 
15-minute segments.

	The development of the CERS somewhat parallels the 
construction of the other systems presented here.  The 
CERS was developed and refined over a period in which its 
categories were fully explored and defined and ratings 
were discussed at great length among the system's 
authors.  This method was similar to Hill's (1978) 
development of the CVRMCS, which also underwent different 
stages of development and refinement (culminating in 
Friedlander's 1982 revision of the original system).  
Other systems began from more stable roots and have 
changed only slightly since their introduction, including 
the VPPS, which is only in its second full version, and 
the VRM system, which remains largely unchanged since 1976 
(Stiles, 1992).  

	Once the reliability and validity of the CERS have been 
established, researchers can begin examining the purpose 
for which the CERS was constructed: namely, to examine the 
occurrence of cognitive elaborations in therapy and the 
types of therapist interventions under which positive and 
negative client elaborations occur.  With a system 
sensitive enough to distinguish such interventions in 
therapy, yet pragmatic enough that raters can be trained 
on it quickly and reliably, a large research study can be 
undertaken with a clinical population.  It must be 
emphasized that the final version of the CERS has not yet 
been shown to be sensitive enough to distinguish the 
therapist interventions that elicit greater client 
cognitive elaboration.  Psychotherapy is a dynamic 
process that includes client personality variables, 
nonverbal behaviors, often dissimilar intentions between 
the therapist and client, and situational variables, all 
of which must be taken into account when examining 
therapeutic outcomes.  The CERS addresses just one small 
part of the therapeutic process, but it may prove 
extremely useful in this regard.

References

American Psychiatric Association. (1987). Diagnostic and 
	statistical manual of mental disorders (3rd ed., rev.). 
	Washington, DC: Author.

Barone, D.F., & Hutchings, P.S. (1993). Cognitive elaboration: 
	Basic research and clinical application. Clinical Psychology 
	Review, 13, 187-201.

Cummings, A.L. (1989). Relationship of client problem type to 
	novice counselor response modes. Journal of Counseling 
	Psychology, 36, 331-335.

Elliott, R., Hill, C.E., Stiles, W.B., Friedlander, M.L., 
	Mahrer, A.R., & Margison, F.R. (1987). Primary therapist 
	response modes: Comparison of six rating systems. Journal of 
	Consulting and Clinical Psychology, 55, 218-223.

Friedlander, M.L. (1982). Counseling discourse as a speech event: 
	Revision and extension of the Hill Counselor Verbal Response 
	Category System. Journal of Counseling Psychology, 29, 425-429.

Garfield, S.L., & Bergin, A.E. (Eds.). (1986). Handbook of 
	psychotherapy and behavior change. New York: Wiley & Sons.

Gomes-Schwartz, B. (1978). Effective ingredients in psychotherapy: 
	Prediction of outcome from process variables. Journal of 
	Consulting and Clinical Psychology, 46, 1023-1035.

Gomes-Schwartz, B., & Schwartz, J.M. (1978). Psychotherapy process 
	variables distinguishing the "inherently helpful" person from 
	the professional psychotherapist. Journal of Consulting and 
	Clinical Psychology, 46, 196-197.

Greenberg, L.S., & Pinsof, W.M. (Eds.). (1986). The 
	psychotherapeutic process: A research handbook. New York: 
	Guilford Press.

Hill, C.E. (1978). Development of a Counselor Verbal Response 
	Category System. Journal of Counseling Psychology, 25, 461-468.

Hill, C.E. (1986). An overview of the Hill Counselor and Client 
	Verbal Response Modes Category Systems. In L.S. Greenberg & 
	W.M. Pinsof (Eds.), The psychotherapeutic process: A research 
	handbook (pp. 131-160). New York: Guilford.

Hill, C.E. (1992). Research on therapist techniques in brief 
	individual therapy: Implications for practitioners. The 
	Counseling Psychologist, 20, 689-711.

Kivlighan, D.M. (1989). Changes in counselor intentions and 
	response modes and in client reactions and session evaluation 
	after training. Journal of Counseling Psychology, 36, 471-476.

Lindner, R. (1954). The fifty-minute hour. New York: Delta.

Moras, K., & Hill, C.E. (1991). Rater selection for psychotherapy 
	process research: An evaluation of the state of the art. 
	Psychotherapy Research, 1, 113-123.

O'Malley, S.S., Suh, C.S., & Strupp, H.H. (1983). The Vanderbilt 
	Psychotherapy Process Scale: A report on the scale development 
	and a process-outcome study. Journal of Consulting and Clinical 
	Psychology, 51, 581-586.

Petty, R.E., & Cacioppo, J.T. (1986). Communication and persuasion: 
	Central and peripheral routes to attitude change. New York: 

Russell, R., & Stiles, W. (1979). Categories for classifying 
	language in psychotherapy. Psychological Bulletin, 86, 406-419.

Stiles, W.B. (1978). Manual for a taxonomy of verbal response modes. 
	Chapel Hill: Institute for Research in Social Science, 
	University of North Carolina at Chapel Hill.

Stiles, W.B. (1992). Describing talk: A taxonomy of verbal 
	response modes. London: Sage Publications.

Storr, A. (1990). The art of psychotherapy (2nd ed.). New York: Routledge.

Strupp, H.H. (1957). A multidimensional system for analyzing 
	psychotherapeutic techniques. Psychiatry, 20, 293-306.

Suh, C.S., O'Malley, S.S., Strupp, H.H., & Johnson, M.E. (1989). 
	The Vanderbilt Psychotherapy Process Scale (VPPS). Journal of 
	Cognitive Psychotherapy: An International Quarterly, 3, 123-154.

Windholz, M.J., & Silberschatz, G. (1988). The Vanderbilt 
	Psychotherapy Process Scale: A replication with adult 
	outpatients. Journal of Consulting and Clinical Psychology, 56, 56-60.

Last reviewed: By John M. Grohol, Psy.D. on 22 Jul 2016