Classical Texts in Psychology
[Editor's Note: Footnotes are in square brackets; references in round brackets]
TWO TYPES OF CONDITIONED REFLEX: A REPLY TO KONORSKI AND MILLER
B.F. Skinner (1937)
Before considering the specific objections raised by Konorski and Miller (4) against my formulation of a second type of conditioned reflex, I should like to give a more fundamental characterization of both types and of the discriminations based upon them.
Let conditioning be defined as a kind of change in reflex strength where the operation performed upon the organism to induce the change is the presentation of a reinforcing stimulus in a certain temporal relation to behavior. All changes in strength so induced come under the head of conditioning and are thus distinguished from changes having similar dimensions but induced in other ways (as in drive, emotion, and so on). Different types of conditioned reflexes arise because a reinforcing stimulus may be presented in different kinds of temporal relations. There are two fundamental cases: in one the reinforcing stimulus is correlated temporally with a response and in the other with a stimulus. For "correlated with" we might write "contingent upon". These are the types that I have numbered I and II respectively. Konorski and Miller refer to the second as Type I and to a complex case involving the first (see below) as Type II. To avoid confusion and to gain a mnemonic advantage I shall refer to conditioning which results from the contingency of a reinforcing stimulus upon a stimulus [p.273] as of Type S and to that resulting from contingency upon a response as of Type R.
If the stimulus is already correlated with a response or the response with a stimulus, a reinforcement cannot be made contingent upon the one term without being put into a similar relation with the other. That is to say, if a reinforcing stimulus is correlated temporally with the S in a reflex, it is also correlated with the R, or if with the R, then also with the S. It is not possible to avoid this difficulty (which seems to destroy the validity of the foregoing definition) by specifying a kind of temporal relation. If, for example, we should distinguish between the cases in which the reinforcing stimulus precedes S (and hence also precedes R) and those in which it follows R (and hence also follows S), the resulting classes would be close to those of Types R and S but they would not be identical with them, and the basis for the definition would not permit a deduction of the other characteristics of the types. The contingency of the reinforcing stimulus upon a separate term is necessary.
It may be noted, therefore, that in both paradigms of conditioning as previously given (2) the connection between the term to be correlated with the reinforcing stimulus and another term is irrelevant. No connection need exist at the start. In Type S we may use a stimulus ( So ) eliciting no observable response and in Type R a response ( Ro ) elicited by no observable stimulus (for example, the "spontaneous" flexion of a leg). Or, if a connection originally exists, it may disappear during conditioning. In Type S, if So elicits a definite response [say, where ( So - Ro ) is (shock - flexion)], Ro may disappear (Eroféeva); and in Type R, if Ro is apparently elicited by a definite stimulus [say, where ( So - Ro ) is the same], Ro will eventually appear without So, as Konorski and Miller have shown.
The paradigms may therefore be rewritten as follows:
where the arrows indicate the temporal correlation responsible for conditioning, and where the terms written in lower case either (a) cannot be identified, (b) may be omitted, or (c) may disappear. The correlation of the reinforcing stimulus with a separate term is here achieved and from [p.274] it the properties of two (and, incidentally, only two) types of conditioned reflex may be deduced. The differences between the types given in my paper (2), which need not be repeated here, are no longer useful in defining the types, but they serve as convenient hallmarks.
This solution depends upon the statement that there are responses uncorrelated with observable stimuli - a statement that must not be made lightly but cannot, so far as I can see, be avoided. It is a necessary recognition of the fact that in the unconditioned organism two kinds of behavior may be distinguished. There is, first, the kind of response that is made to specific stimulation, where the correlation between response and stimulus is a reflex in the traditional sense. I shall refer to such a reflex as a respondent and use the term also as an adjective in referring to the behavior as a whole. But there is also a kind of response which occurs spontaneously in the absence of any stimulation with which it may be specifically correlated. We need not have a complete absence of stimulation in order to demonstrate this. It does not mean that we cannot find a stimulus that will elicit such behavior but that none is operative at the time the behavior is observed. It is the nature of this kind of behavior that it should occur without an eliciting stimulus, although discriminative stimuli are practically inevitable after conditioning. It is not necessary to assume specific identifiable units prior to conditioning, but through conditioning they may be set up. I shall call such a unit an operant and the behavior in general, operant behavior. The distinction between operant and respondent behavior and the special properties of the former will be dealt with at length in a work now in preparation. All conditioned reflexes of Type R are by definition operants and all of Type S, respondents; but the operant-respondent distinction is the more general since it extends to unconditioned behavior as well.
A formulation of the fundamental types of discrimination may also be carried out in terms of the contingency of the reinforcing stimulus. Discrimination differs from conditioning because the existing correlation cannot be unequivocally established with any one set of properties of the stimulus or response. The effect of a given act of reinforcement is necessarily more extensive than the actual contingency implies, and the relation must be narrowed through extinction with respect to the properties not involved in the correlation. There are three basic types of discrimination.
1. Discrimination of the Stimulus in Type S.
a. S1 is contingent upon less than all the aspects or properties of So present upon any given occasion of reinforcement. For example, let S1 be contingent upon the pitch of a tone. Before this relation (and not merely a relation between the response and the tone itself) can be established, responses [p.275] to tones of other pitches that have been conditioned through induction must be extinguished.
b. S1 is contingent upon a group of stimuli but not upon subgroups or supergroups. For example, let SA and SB be reinforced together but not separately. Before the relation will be reflected in behavior, the responses to either stimulus alone that are strengthened through induction must be extinguished.
2. Discrimination of the Stimulus in Type R. S1 is contingent upon Ro in the presence of a stimulus SD. For example, let the pressing of a lever be reinforced only when a light is on. Before this relation can be established in the behavior, the responses in the absence of the light developed through induction from the reinforcement in the presence of the light must be extinguished (3).
3. Discrimination of the Response in Type R. S1 is contingent upon an Ro having a given value of one or more of its properties. For example, let S1 be contingent upon a response above a given level of intensity. Responses of lower intensity strengthened through induction must be extinguished.
(There is no fourth case of a discrimination of the response in Type S.) Both discriminations of the stimulus (but not that of the response) yield what I have called pseudo-reflexes, in which stimuli are related to responses in ways that seem to resemble reflexes but require separate formulations if confusion is to be avoided. In Type S (Case b, above), given the organism in the presence of SA, the presentation of SB will be followed by a response. The superficial relation ( SB - R ) is not a reflex, because the relevance of SA is overlooked. Similarly in Type R, the superficial relation between the light and pressing the lever is not a reflex and exhibits none of the properties of one when these are treated quantitatively.
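The contingencies behind the three discriminations enumerated above can be stated compactly as predicates. The following is only an illustrative sketch: the function names, the argument names, and the use of exact equality for pitch are assumptions made for the example and do not come from the experiments cited. Each function answers a single question, namely whether the reinforcing stimulus S1 is presented on a given occasion.

```python
# Hypothetical sketch: each function states one contingency of reinforcement.
# Names and argument conventions are illustrative assumptions only.

def type_s_single_property(pitch_hz, reinforced_pitch_hz):
    """Case 1a: S1 is contingent upon one property (pitch) of the tone."""
    return pitch_hz == reinforced_pitch_hz


def type_s_compound(stimuli_present):
    """Case 1b: S1 follows SA and SB together, not a subgroup or supergroup."""
    return set(stimuli_present) == {"SA", "SB"}


def type_r_discriminated(response_made, light_on):
    """Case 2: S1 is contingent upon Ro only in the presence of SD (the light)."""
    return response_made and light_on


def type_r_differentiated(response_intensity, required_intensity):
    """Case 3: S1 is contingent upon an Ro above a given level of intensity."""
    return response_intensity > required_intensity
```

In this sketch the two Type S predicates take only stimulus arguments, while the two Type R predicates require a response, which mirrors the basis of the distinction between the types. The predicates describe the experimenter's contingency alone; induction and the extinction that narrows it are not modelled.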
The distinction between an eliciting and a discriminative stimulus was not wholly respected in my earlier paper, for the reflex (lever - pressing) was pseudo. As a discriminated operant the reflex should have been written (s + lever - pressing). Since I did not derive the two types from the possible contingencies of the reinforcing stimulus, it was not important that Ro in Type R be independent of an eliciting stimulus. But the treatment of the lever as eliciting an unconditioned response has proved inconvenient and impracticable in other ways, and the introduction of the notion of the [p.276] operant clears up many difficulties besides those immediately in question. It eliminates the implausible assumption that all reflexes ultimately conditioned according to Type R may be spoken of as existing as identifiable units in unconditioned behavior and substitutes the simpler assumption that all operant responses are generated out of undifferentiated material. Certain difficulties in experiments upon operants are also avoided. Operant behavior cannot be treated with the technique devised for respondents (Sherrington and Pavlov), because in the absence of an eliciting stimulus many of the measures of reflex strength developed for respondents are meaningless. In an operant there is properly no latency (except with respect to discriminative stimuli), no after-discharge, and most important of all no ratio of the magnitudes of R and S. In spite of repeated efforts to treat it as such, the magnitude of the response in an operant is not a measure of its strength. Some other measure must be devised, and from the definition of an operant it is easy to arrive at the rate of occurrence of the response. This measure has been shown to be significant in a large number of characteristic changes in strength.
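As an illustration of rate of occurrence as a measure, the following minimal sketch computes an overall rate and the rate in successive intervals from a list of response times. The function names, the time units, and the choice to ignore responses outside the session are assumptions made for the example; no particular recording apparatus or published procedure is implied.

```python
# Hypothetical sketch: rate of responding from recorded response times.
# Times and lengths share one arbitrary unit (for example, minutes).

def response_rate(response_times, session_length):
    """Overall responses per unit time within a session."""
    if session_length <= 0:
        raise ValueError("session_length must be positive")
    counted = [t for t in response_times if 0 <= t < session_length]
    return len(counted) / session_length


def rates_by_interval(response_times, session_length, interval):
    """Rate within each successive whole interval, to follow a change in strength."""
    if session_length <= 0 or interval <= 0:
        raise ValueError("session_length and interval must be positive")
    n_bins = int(session_length // interval)
    counts = [0] * n_bins
    for t in response_times:
        index = int(t // interval)
        if t >= 0 and index < n_bins:
            counts[index] += 1
    return [count / interval for count in counts]
```

A declining sequence from `rates_by_interval` would correspond, under these assumptions, to a fall in strength such as occurs during extinction; the magnitude of any single response plays no part in the measure.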
There is thus an important difference between the Konorski and Miller sequence "shock - flexion -> food" and the sequence "s + lever - pressing -> food." The first contains a respondent, the second an operant. The immediate difference experimentally is that in the second case the experimenter cannot produce the response at will but must wait for it to come out. A more important difference concerns the basis for the distinction between two types. Since there is no eliciting stimulus in the second sequence, the food is correlated with the response but not with the lever as a stimulus. In the first sequence the food is correlated as fully with the shock as with the flexion. The Konorski and Miller case does not fit the present formula for Type R, and a divergent result need not weigh against it. The case does not, as a matter of fact, fit either type so long as the double correlation with both terms exists. Conditioning of Type S will occur (the shock-salivation reflex of Eroféeva), but there is no reason why conditioning of Type R should occur so long as there is a correlation between the reinforcement and an eliciting stimulus. Nothing is to be gained in such a case; the original sequence operates as efficiently as possible.
The case comes under Type R only when the correlation with So is broken up - that is, when a response occurs that is not elicited by So. The complex experiment described by Konorski and Miller may be formulated as follows: In the unconditioned organism there is operant behavior that consists of flexing the leg. It is weak and appears only occasionally. There is also the strong respondent (shock - flexion), which has more or less the same form of response. In Konorski and Miller's experiment we may assume that an elicitation of the respondent ( S - R ) brings out at the same time the operant ( s - R ), which sums with it. We have in reality two sequences: [p.277] ( A ) shock - flexion, and ( B ) s - flexion -> food. Here the respondent ( A ) need not increase in strength but may actually decrease during conditioning of Type S, while the operant in B increases in strength to a point at which it is capable of appearing without the aid of A. As Konorski and Miller note, "...the stimulus So [shock] plays only a subsidiary role in the formation of a conditioned reflex of the new type. It serves only to bring about the response Ro [by summation with the operant?], and once the connection SG - Ro is established [read: 'once the operant is reinforced'], it loses any further experimental significance (4)."
The existence of independent composite parts may be inferred from the facts that B eventually appears without A when it has become strong enough through conditioning, and that it may even be conditioned without the aid of A although less conveniently.
Konorski and Miller seem to imply that a scheme that appeals to the spontaneous occurrence of a response cannot be generally valid because many responses never appear spontaneously. But elaborate and peculiar forms of response may be generated from undifferentiated operant behavior through successive approximation to a final form. This is sometimes true of the example of pressing the lever. A rat may be found (very infrequently) not to press the lever spontaneously during a prolonged period of observation. The response in its final form may be obtained by basing the reinforcement upon the following steps in succession: approach to the site of the lever, lifting the nose into the air toward the lever, lifting fore-part of body into the air, touching lever with feet, and pressing lever downward. When one step has been conditioned, the reinforcement is withdrawn and made contingent upon the next. With a similar method any value of a single property of the response may be obtained. The rat may be conditioned to press the lever with a force equal to that exerted by, say, 100 grams (although spontaneous pressings seldom go above 20 grams) or to prolong the response to, say, 30 seconds (although the lever is seldom spontaneously held down for more than two seconds). I know of no stimulus comparable with the shock of Konorski and Miller that will elicit "pressing the lever" as an unconditioned response or elicit it with abnormal values of its properties. There is no So available for eliciting these responses in the way demanded by Konorski and Miller's formulation.
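The procedure of successive approximation described above can be sketched as a simple rule for moving the contingency from one step to the next. This is an assumption-laden illustration, not a record of the procedure actually used: the step labels paraphrase the text, and the `criterion` parameter (the number of reinforcements after which a step is treated as conditioned) is a hypothetical stand-in for the experimenter's judgment.

```python
# Hypothetical sketch of reinforcement by successive approximation.
# STEPS paraphrases the sequence in the text; `criterion` is an assumed
# stand-in for "when one step has been conditioned".

STEPS = [
    "approach lever site",
    "lift nose toward lever",
    "lift fore-part of body",
    "touch lever with feet",
    "press lever downward",
]


def shape(observed, steps=STEPS, criterion=2):
    """Return (log, current_step) after applying the moving contingency.

    log is a list of (behavior, reinforced) pairs. Reinforcement is
    contingent only upon the current step; once that step has been
    reinforced `criterion` times, reinforcement is withdrawn from it
    and made contingent upon the next step.
    """
    current = 0
    reinforcements = 0
    log = []
    for behavior in observed:
        reinforced = behavior == steps[current]
        log.append((behavior, reinforced))
        if reinforced:
            reinforcements += 1
            if reinforcements >= criterion and current < len(steps) - 1:
                current += 1
                reinforcements = 0
    return log, steps[current]
```

For example, with `criterion=2`, a third "approach lever site" goes unreinforced because the contingency has already moved to "lift nose toward lever". The same rule could be applied to a single property of the response (force or duration) by replacing the equality test with a threshold that is raised after each criterion is met.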
Where an eliciting stimulus is lacking, Konorski and Miller appeal to "putting through". A dog's paw is raised and placed against a lever, and this "response" is reinforced with food. Eventually the dog makes the response spontaneously. But a great deal may happen here that is not easily observed. If we assume that tension from passive flexion is to some extent negatively reinforcing, anything that the dog does that will reduce the tension will be reinforced as an operant. Such a spontaneous response as moving the foot in the direction of the passive flexion will be reinforced. [p.278] We thus have a series of sequences of this general form:
s + SD (touching and flexing of leg) - R (movement of leg in certain direction) -> S1 (relief of tension).
The effect of "putting through" is to provide step-by-step reinforcement for many component parts of the complete response, each part being formulated according to Type R. The substitution of food as a new reinforcement is easily accounted for.
This interpretation of "putting through" is important because Konorski and Miller base their formulation of the new type upon the fact that proprioceptive stimulation from the response may become a conditioned stimulus of Type S since it regularly precedes S1. One of the conditions for this second type is that "the movement which constitutes its effect is a conditioned food stimulus." That conditioning of this sort does take place during conditioning of Type R was noted in my paper, but its relevance in the process of Type R does not follow. Perhaps the strongest point against it is the fact that conditioning of Type R may take place with one reinforcement, where a prerequisite conditioning of Type S could hardly have time to occur. Any proprioceptive stimulation from Ro acts as an additional reinforcement in the formula for Type R. Where it is possible to attach conditioned reinforcing value to a response without eliciting it, the reinforcement is alone in its action, but the case still falls under Type R. In verbal behavior, for example, we may give a sound reinforcing value through conditioning of Type S. Any sound produced by a child which resembles it is automatically reinforced. The general formula for cases of this sort is s - Ro -> stimulation from Ro acting as a conditioned reinforcement.
I assume that this is not a question of priority. The behavior characteristic of Type R was studied as early as 1898 (Thorndike). The point at issue is the establishment of the most convenient formulation, and I may list the following reasons for preferring the definition of types given herein. (1) A minimal number of terms is specified. This is especially important in Type R, which omits the troublesome So of Konorski and Miller's formula. (2) Definition is solely in terms of the contingency of reinforcing stimuli - other properties of the types being deduced from the definition. (3) No other types are to be expected. What Konorski and Miller give as variants or predict as new types are discriminations. (4) The distinction between an eliciting and a discriminative stimulus is maintained. Konorski and Miller's variants of their Type II are pseudo-reflexes and cannot yield properties comparable with each other or with genuine reflexes.
Two separate points may be answered briefly. (1) It is essential in this kind of formulation that one reflex be considered at a time since our data have the dimensions of changes in reflex strength. The development of an [p.279] antagonistic response when a reinforcement in Type R is negative requires a separate paradigm, either of Type R or Type S. (2) That responses of smooth muscle or glandular tissues may or may not enter into Type R, I am not prepared to assert. I used salivation as a convenient hypothetical instance of simultaneous fused responses of both types, but a skeletal response would have done as well. The child that has been conditioned to cry "real tears" because tears have been followed by positive reinforcement (e.g., candy) apparently makes a glandular conditioned response of Type R, but the matter needs to be checked because an intermediate step may be involved. Such is the case in the Hudgins experiment (1), where the verbal response "contract" is an operant but the reflex ("contract" - contraction of pupil) is a conditioned respondent. The question at issue is whether we may produce contraction of the pupil according to s - contraction -> reinforcement, where (for caution's sake) the reinforcing stimulus will not itself elicit contraction. It is a question for experiment.
Not all pseudo-reflexes are discriminative, if we extend the term to include all superficial correlations of stimulus and response. For example, let a tetanizing shock to the tail of a dog be discontinued as soon as the dog lifts its left foreleg. The discontinuance of a negative reinforcement acts as a positive reinforcement; and when conditioning has taken place, a shock to the tail will be consistently followed by a movement of the foreleg. Superficially the relation resembles a reflex, but the greatest confusion would arise from treating it as such and expecting it to have the usual properties.
University of Minnesota