Classical Texts in Psychology
"GENERAL INTELLIGENCE," OBJECTIVELY DETERMINED AND MEASURED
C. SPEARMAN (1904)
First published in American Journal of Psychology 15, 201-293
1. Signs of Weakness in Experimental Psychology.
To-day, it is difficult to realize that only as recently as 1879 Wundt first obtained from the authorities of Leipsic University one little room for the then novel purpose of a "psychological laboratory."
In twenty-four years, not only has this modest beginning expanded into a suite of apartments admirably equipped with elaborate apparatus and thronged with students from the most distant quarters of the globe, but all over Germany and in almost every other civilized country have sprung up a host of similar institutions, each endeavoring to outbid the rest in perfection. The brief space of time has sufficed for Experimental Psychology to become a firmly established science, everywhere drawing to itself the most vigorous energies and keenest intellects.
But in spite of such a brilliant career, strangely enough this new branch of investigation still meets with resolute, widespread, and even increasing opposition. Nor are its enemies at all confined to belated conservatives or crotchety reactionaries; they are rather to be found among the most youthful schools of thought; their strength may be in some measure estimated from the very elaborate apology which one of the best known experimental psychologists has lately found himself called upon to utter on behalf of his profession. [p. 203]
And, indeed, when we without bias consider the whole actual fruit so far gathered from this science -- which at the outset seemed to promise an almost unlimited harvest -- we can scarcely avoid a feeling of great disappointment. Take for an example Education. This is the line of practical inquiry that more than all others has absorbed the energy and talent of the younger workers and that appears to offer a peculiarly favorable field for such methods. Yet at this moment, notwithstanding all the laborious experiments and profuse literature on the subject, few competent and unprejudiced judges will venture to assert that much unequivocal information of capital importance has hitherto thus come to light. Nor have the results been more tangible in Psychiatry or in any other department of applied psychology.
Those, then, who have the highest opinion concerning the potentialities of this new science, will feel most bound to critically examine it for any points of structural incompleteness.
2. The Cause of this Weakness.
Most of those hostile to Experimental Psychology are in the habit of reproaching its methods with insignificance, and even with triviality. They regard it as an infatuation to pass life in measuring the exact average time required to press a button or in ascertaining the precise distance apart where two simultaneous pinpricks cannot any more be distinguished from one another; they protest that such means can never shed any real light upon the human soul, unlock the eternal antinomy of Free Will, or reveal the inward nature of Time and Space.
Such blame, however, would appear ill founded -- at any rate, in principle. This same apparent triviality lies at the base of every successful science. The three laws of Newton on first inspection are by no means remarkably significant; yet by a large number of instructed persons they have been found implicitly to contain the supreme key to every event on the earth below and in the heavens above. When starting any new branch of mathematics, again, most people have had occasion to be astonished at the curious suddenness with which the seemingly shallow beginnings have shelved down into drowning deep water. The general fact is that our limited intellects can only hope to deal with the infinite complexity of Nature after analyzing it down into its bare unaesthetic elements.
On the other hand, it must frankly be admitted that such a procedure is, after all, only indirect; that it does not immediately handle the things which really interest us, but other things which are believed to accurately enough betoken the former; that the results arrived at concerning the simpler terms are therefore always worthless, except in proportion as their [p. 204] elements have been proved beyond dispute to be identical with those of the more complex terms. Now, even in physical sciences this proof is not such an infallible operation that we can afford to neglect the possibility of lurking errors which may vitiate all our conclusions; and in psychical research such dangers are enormously magnified. When we pass an electric current through water until it vaporizes away into bubbles of hydrogen and oxygen, we can with reasonable precautions be tolerably certain that we have still got in our jars almost the whole of the same material substance, only reduced to simpler forms. But when we assert that the decision of Regulus to vote against making peace with Carthage was no more than a conglomeration of visual, auditory, and tactual sensations in various stages of intensity and association, then there is an undeniable risk that some precious psychical elements may have slipped through our fingers.
On this vital matter, it must reluctantly be confessed that most of Wundt's disciples have failed to carry forward the work in all the positive spirit of their master. For while the simpler psychoses of the Laboratory have been investigated with great zeal and success, their identification with the more complex psychoses of Life has still continued to be almost exclusively ascertained by the older method of introspection. This pouring of new wine into old bottles has not been to the benefit of either, but rather has created a yawning gulf between the Science and the Reality. The results of all good experimental work will live, but as yet most of them are like hieroglyphics awaiting their deciphering Rosetta stone.
3. The "Identities" of Science.
Here, we naturally arrive at the important question as to what actually constitutes "identity" for scientific purposes.
As regards the material atoms of the physical sciences, this relation is of two orders. There is the Identity in the looser use of the word, which really means no more than uniformity of potential function, or the fact of having like reactions under like conditions; this alone constitutes the proper topic of the science. And then there is the true Identity involved in the metaphysical idea of persistence of substance, which in science is only a convenient working hypothesis to aid in establishing uniformities of the former order.
For psychology, also, the identification is of two orders. First, there is once more Uniformity of Function, and again this appears to be the proper topic of the positive science. But the second order is quite disparate from anything in physics, being that of inward resemblance as ascertained by introspection; such a "Conceptual Uniformity," though in metaphysics perhaps [p. 205] of primary importance, in psychology is but an indispensable substructure -- and one of lamentable fallibility. It cannot even be forthwith assumed necessarily to imply complete Functional Uniformity; and it is peculiarly insusceptible of scientific precision, propositions scarcely ever admitting of either decisive confirmation or refutation.
Now, it is one of the great merits of experimental psychology to have largely introduced the direct investigation of these Functional Uniformities, which have the infinite advantage of being eventually susceptible of conclusive proof, and on being securely established are in their turn capable of throwing back a valuable corrective light upon the Conceptual ones also. So far, however, this matter of research seems to have been almost entirely confined to such correspondences as are approximately complete (these, indeed, being the only ones attainable without a new development of methodics). But the vast majority of the functional relations are not thus complete; they are more or less thwarted by other factors; they outwardly present themselves only in the form of stronger or weaker tendencies. And precisely of this incomplete nature are most of the Functional Uniformities which connect the psychics of the Laboratory with those of real Life.
4. Scope of the Present Experiments.
The present article, therefore, advocates a "Correlational Psychology," for the purpose of positively determining all psychical tendencies, and in particular those which connect together the so-called "mental tests" with psychical activities of greater generality and interest. These will usually belong to that important class of tendencies produced by community of organism, whereby sufficiently similar acts are almost always performed by any one person in much the same manner; if, for example, he once proves good at discriminating two musical tones, he may be expected to manifest this talent on any subsequent occasion, and even in another portion of the scale.
For finding out the classes and limits of these individual functions, modern psychology seems to have mainly contented itself with borrowing statements from the discredited "faculties" of the older school, and then correcting and expanding such data by inward illumination. The following work is an attempt at the more fatiguing procedure of eliciting verifiable facts; the good intention and the difficulty of such an enterprise may, perhaps, be allowed to palliate the shortcomings in its execution. Our particular topic will be that cardinal function which we can provisionally term "General Intelligence;" first, there will be an inquiry into its exact relation to the Sensory Discrimination of which we hear so much in [p. 206] laboratory work; and then -- by the aid of information thus coming to light -- it is hoped to determine this Intelligence in a definite objective manner, and to discover means of precisely measuring it. Should this ambitious programme be achieved even in small degree, Experimental Psychology would thereby appear to be supplied with the missing link in its theoretical justification, and at the same time to have produced a practical fruit of almost illimitable promise.
HISTORICAL AND CRITICAL.
1. History of Previous Researches.
Though, as above stated, mental correlation has in general met with great neglect, yet a certain number of psychologists, including several of the best known, have from time to time turned their attention that way also. It therefore seems advisable, before entering into the present work, first briefly to survey the results of these previous researches; they will be found on the whole to indicate some very remarkable conclusions.
Only those correspondences will be taken into account in which both terms compared are of a psychical nature; many investigators, after determining the chief measurements of their subject's mind, proceed to make their record still more complete by also noting his most prominent bodily characteristics and external relations, such as his height and weight, the shape of his head, the color of his eyes and hair, the birthplace of his mother, etc. Such considerations, however interesting, do not quite fall within the scope of the present inquiry.
Galton. The first hint appears to have come from that suggestive writer, Francis Galton. As early as 1883, the latter stated that he had found men of marked ability to possess on the whole an unusually fine discrimination of minute differences in weight. The pregnancy of this idea is unmistakable. But Galton appears to have been diverted from the point by other interests, and to have contented himself with the above general impression, without clinching the matter in systematic investigation. In 1890, however, on Cattell publishing an article about "Mental Tests and Measurements," a remark was appended by Galton suggesting the desirability of comparing such laboratory values with "an independent estimate of the man's powers. . . . The sort I would suggest is something [p. 207] of this kind, -- 'mobile, eager, energetic; well shaped; successful at games requiring good eye and hand; sensitive; good at music and drawing.'" It will be seen that subsequent investigators have unanimously preferred a much less lively programme.
Oehrn. The earliest actual experiments in mental correlation seem to have been those of Oehrn, in 1889, which at the same time furnished the starting point for that special branch termed by him, and now popular as, "Individual Psychology." The latter must, however, be fundamentally distinguished from the "Correlational Psychology" here advocated. For the former deliberately bases itself upon introspectively determined faculties and upon mental tests; whereas the latter begins by empirically ascertaining both the faculties and the precise value of the tests. The former endeavors to discover those small deviations from general law which constitute "individuality;" while the latter, on the contrary, proposes methodically to eliminate individualities as an obstacle to further progress, being itself, no less than General Psychology, in search of laws and uniformities.
Oehrn tested ten subjects in Perception ("Wahrnehmungsvorgang"), Memory, Association, and Motor Functions. In accordance with his standpoint of a priori assumed faculties, he does not correlate the results with any independent estimate of his subjects' intellectual powers, but only the tests with one another. He eventually comes to the conclusion that Perception, Memory, and the Motor Functions are "proportional to one another," but that Association is rather inverse to all the others!
Boas. The comparison desired by Galton between these laboratory tests and on the other hand the psychics of practical life was, as far as I am aware, first undertaken seriously by Boas. In 1891, the latter examined no less than 1,500 school children as to their Sight, Hearing, and Memory; and then -- following the example of the semi-anthropometrical correlations of Porter and others -- he proceeded to compare their performances in the above respects with their "Intellectual Acuteness" (as estimated by their teachers). On the first two heads, unfortunately, [p. 208] the results have never been published. But as regards Memory, wherein his method of procedure in the main resembled that of Oehrn, the facts elicited were elaborated by Bolton, who comes to the following conclusions:
"The Memory Span increases with Age rather than with the growth of Intelligence."
"The Memory Span measures the power of concentrated and prolonged Attention."
"Intellectual Acuteness, while more often connected with concentrated Attention, does not require it, and it cannot be said that those pupils who are bright intellectually are more distinguished on account of their good memories."
It will be observed that these results are in sharp antagonism with the view of many modern psychologists, notably Wundt, who would make Attention the very essence of intellectual power.
Gilbert. In 1893, at New Haven, another series of experiments was carried out upon an almost equally extensive scale, and is still among the most important contributions to the subject. J. Gilbert applied several mental tests to about 1,200 children of both sexes, and then compared the results with their "general ability" (again as estimated by their respective schoolmasters).
On this occasion, the original assertion of Galton was to some extent practically corroborated. For Gilbert believes himself to find a real correspondence of Intelligence with Sensory Discrimination both of weights and of shades. He also, unlike Bolton, discovers a slight correspondence with Memory; in Gilbert's experiments the child, instead of learning by heart a row of figures, had to give his judgment as to when a musical tone had lasted just as long as a previously sounded standard one.
But the correspondence deemed most positive and conspicuous was that between Intelligence and "Reaction-time." This is particularly suggestive, on reflecting how especially this Reaction-time depends upon concentration of the Attention. The indication would therefore accord rather with Wundt's view than with that of Boas. Curiously enough, when the Reaction-time is made more obviously intellectual by further complications (Discrimination and Choice), then the above correspondence becomes reduced in amount.
Scripture. In the same little volume, appears an account of an interesting experiment by Scripture, as to the correspondence [p. 209] between shortness of Reaction-time and swiftness in lunging with foils. Unfortunately, his subjects are only seven in number. He feels himself, however, "fully justified" in coming to the conclusion that "the average fencer is not quicker in simple reaction than a trained scientist, and neither class shows an excessive rapidity."
The first part of the above sentence would well harmonize with the intellectuality found also by Gilbert to be connected with speed in pressing a button; but the latter part is difficult to reconcile therewith, at any rate without painfully lowering the credit of "trained scientists."
Dresslar. Also in 1893, a quite new kind of correlational factor is investigated, that of natural illusions. As is well known, if we pick up two things of the same weight but of different size, we are almost irresistibly inclined to estimate the larger one as being the lighter of the two; strangely enough, the illusion still persists even after we know that the weights are really equal. Some 173 boys and girls were tested in this respect by Dresslar, and at the same time were classed by their teachers into "bright," "good," and "dull." Dresslar found that the phenomenon was perfectly constant throughout the ages tested, 7 to 14 years, and that instead of the fallacy chiefly affecting the stupid children, as might have been expected, it on the contrary showed itself the more powerful in proportion as the child was "brighter;" hence he concluded: "The more intelligent the children, other things being equal, the stronger are the associations between the ideas of size and weight."
Griffing. In the following year, H. Griffing examined the other chief aspects of Attention. The two former workers had dealt with its concentration or intensity; Griffing now inquires about its amplitude or extensity. He does this by the well-known method of "tachistoscopy:" a number of syllables are exposed to view for a very brief moment, and the subject has to try how many he can read in this practically simultaneous manner.
The result of two independent sets of such experiments is in both cases that "the brighter students tend to excel."
Bourdon. About now we come upon a new and significant phenomenon in the course of these researches. The latter, though originally prompted in England, had forthwith been transplanted to America, in which country alone up to the present date they had been cultivated so as to bear fruit. But at [p. 210] length the Old World also woke to the necessity of answering the questions. France led the van, soon bestowing upon the problem an original and characteristic impress.
In August, 1895, appeared an article by Bourdon entitled "Recognition, Discrimination, and Association." For this investigation, about a dozen subjects were tested in
(1) the power of recognizing words previously shown to them;
(2) the power of quickly and accurately erasing from a printed page certain given letters of the alphabet (this was one of Oehrn's tests for "den Wahrnehmungsvorgang");
(3) the number of ideas arising in the mind within one minute on a given suggestion.
The conclusion arrived at is that all three faculties present some correspondence with one another, but that this is much most marked between Recognition and Association.
Thus upon this occasion, as in the work of Oehrn, the mental tests were only compared with one another, and not with any independently obtained estimate of Ability. Like the German, Bourdon appears to consider that these three faculties, Recognition, Discrimination, and Association, are so satisfactorily represented by the tests, that any otherwise gained values of Intelligence would merely be vaguer and less trustworthy versions of the same data.
Binet and Henri. Again in France, towards the end of the year, there appeared an important article of similar tendency, bearing the well-known signatures of Binet and Henri and setting forth the urgent need of "studying the relations that exist between different psychical processes." They propose the following ten tests: Memory, Mental Images, Imagination, Attention, Faculty of Comprehending, Suggestibility, Æsthetic Sentiment, Moral Sentiments, Muscular Force and Force of Will, Cleverness and "Coup d'oeil." By these means, they hope to measure off "a personality" in a fairly exhaustive manner within 1 to 1 1/2 hours.
In the tests themselves, there is a new feature to be noticed. Hitherto, these had been of the most elementary and unequivocal nature possible, as befits the rigor of scientific work. But [p. 211] this very simplicity had much increased the difficulty of making the test truly representative of any more complex psychosis. Binet and Henri appear now to seek tests of a more intermediate character, sacrificing much of the elementariness, but gaining greatly in approximation to the events of ordinary life. The result would seem likely to have more practical than theoretical value.
Next year Binet begins to put his interesting programme into execution. He examines about 80 children and 6 adults as to powers of describing a picture shown to them, and by this means discovers the existence of five fundamental types of character, the "describer," "the observer," "the erudite," "the emotional," and "the idealist." "It is perhaps the first result," Binet remarks, "that has hitherto been produced by the experimental study of the higher intellectual faculties."
Binet then compares these new types with "the notes and comments which the professors wrote about their pupils and which the Director of the school has carefully checked." But as to the result of this comparison, unfortunately, only the following brief remark is made public: "Of five pupils whom I had put into the 'emotional' group, four had a cold temperament, a dry nature, and little sensitiveness; the fifth alone seemed sensitive."
Sharp and Titchener. The above work of Binet and Henri found a speedy re-echo from the other side of the Atlantic. Some experiments with the avowed object of examining this new class of test are now recorded as taking place at Cornell University under the direction of Dr. Sharp and with the aid of Prof. Titchener. These were expressly intended to depart from the older "German procedure" of dealing solely with the "elementary mental processes," and instead were to subject to trial the "French procedure" of directly handling the "complex" ones.
The following classification was adopted: Memory, Mental Images, Imagination, Attention, Observation, Discrimination, and Taste. The subjects consisted of three male and four female advanced students. No independent information was obtained concerning the subjects' respective mental powers, it being only attempted to ascertain whether the tests were consistent among themselves.
The results are not very encouraging:
"The lack of correspondences in the individual differences observed in the various tests was quite as noticeable as their presence."
[p. 212] "But little result for morphological psychology can be obtained from studies of the nature of the above investigation."
"In the present investigation the positive results have been wholly incommensurate with the labor required for the devising of tests and evaluation of results."
In conclusion, Sharp suggests the advisability of judiciously combining the characteristics of both the French and German procedures with one another.
Wagner. Almost simultaneously, the idea of collating mental tests with more practical methods of appraisement begins to take root in Germany also. In 1896, a series of experiments for the purpose of inquiring into the question of fatigue of school children was carried out at Darmstadt under the direction of Dr. Wagner. The children were from the new Gymnasium there and seem to have amounted in all to 44 (though the information on this point is not very definite). The test investigated was the old one of Weber which had recently again been brought to the notice of pedagogical circles by Griesbach. As is well known, it consists in ascertaining how near together two points can still be distinguished from one another by touch. On this occasion, care was taken to obtain an estimate of every child's Natural Talent (Begabung), Industry, Attentiveness, Nervousness, and sometimes Temperament.
Unfortunately for our present purpose, the intention of the experiment was not so much to correlate these psychical qualities with the children's absolute sensitivity, as with the reduction in such sensitivity produced by the fatigue of lessons. This reduction is stated to correspond closely with the amount of Attention paid by the child, but to be almost independent of his Natural Talent. Once more, therefore, Attention and Ability are contrasted instead of being identified.
As far as concerns the children's unfatigued condition, our real present topic, we only learn that the nervous and indisposed have a less fine tactual sensitivity than the others.
Ebbinghaus. About the same time, another and much more extensive investigation was officially instituted in Silesia for the same purpose. Two entire upper schools, a boys' Gymnasium and a girls' High School, were before and after work subjected to three tests: the two old ones of Oehrn for Memory and Association (memorizing and adding numbers respectively), and the new "Combination Method" of Ebbinghaus. The latter observer in discussing the results devotes no less than one entire section out of four to considering the relations [p. 213] shown between these tests and the children's general intellect.
He comes to the conclusion that the school order shows an appreciable correspondence with all three tests, but least so with Memory and most with his own new Combination Method. He particularly points out that in the last mentioned this correspondence applies, not only to difference of class, but also to position within each class; whereas in the case of Memory, he thinks that if anything the least intelligent succeed the best!
The Combination Method would appear to resemble the new type of test recommended by Binet and Henri, to the extent of presenting a rather intermediate character between the elementariness of normal laboratory work and the complexity of practical activities.
Wiersma. To depart for a moment from the chronological to the logical order of events, this favorable verdict of Ebbinghaus concerning his own new method was in 1902 strongly corroborated by some experiments of Wiersma. This time, three schools were brought into service. Two of them were special training establishments for male and female teachers respectively, from fourteen to nineteen years old. The third was a "Nachbildungs" School, namely, one for those of both sexes who had already gone through the six classes of the elementary school; they consequently aged from eleven to fourteen. The total number came to about three hundred.
Following closely in the steps of Ebbinghaus himself, Wiersma finds his average results to improve regularly with the higher classes and with the higher sections of each class. He takes great pains to analyze the factors upon which such school position depends, and arrives at distinguishing Age, Educational Development (Entwickelung), and Natural Talent (Begabung). In many complicated tables and graphs, he marshals evidence that the observed correspondence is most of all due to the last named factor.
Binet and Vaschide. In 1897, the question is again attacked by Binet, now in partnership with Vaschide. But there is a remarkable return, as far as psychics are concerned, to the old less aspiring forms of tests. For he once more examines children in Reaction-time, Reaction-time with Choice, and Memory of Numbers. In addition thereto, he devises the ingenious test of motor ability called Dots ("petits points"); this consists in seeing how often the subject can tap with a pen on a piece of paper in 5 seconds. The intellectual order of the [p. 214] children was again obtained from their respective ranks in class. The subjects numbered 45 and averaged about 12 years of age. The results are exactly opposite to those of Gilbert, for Binet sums up as follows:
"The Intellectual Order harmonizes badly with Reaction-times and harmonizes well with the Memory of Numbers." But better than either appears the correspondence with his own "Dots."
This work was quickly followed by similar tests upon older subjects. For such purpose, Binet and Vaschide turned to the Normal School of Teachers at Versailles and there examined 43 youths ranging from 16 to 20 years of age. This time, the scanty positive results of the former experiments are still further reduced; for even the correlation with Memory is somewhat less in evidence. The relation with the "Dots" again presents an unbroken regularity, but this time it seems to have become inverse, the stupidest tapping with the greatest speed!
Seashore. We next come upon an interesting series of carefully conducted experiments, to which we shall frequently have occasion to refer. It took place from 1897 to 1899 at the University of Iowa, under the direction of Dr. Seashore, the subjects being nearly 200 children varying from 6 to 15 years inclusive.
Here, the negatory note that we have first heard from Binet is reiterated, and now in much fuller tones. As regards General Intelligence (again as estimated by the teachers) and Memory of Time, between which Gilbert had found a very marked correspondence, Seashore on the contrary disposes of the question in the following brief words: "There appears to be no functional relation between the two processes."
So, too, between Intelligence and Discrimination of Pitch. For while Gilbert believed himself to have discovered some such correlation, Seashore again says curtly: "There is no functional relation; the distribution of the results practically coincides with the most probable distribution according to chance."
He further compares General Intelligence with the faculty of discriminating Loudness and with various Illusions of Form, Color and Weight. In each case, he finds himself forced to the same conclusion, that there is no indication of the bright children doing differently from the dull ones.
Pearce. Again temporarily deserting the chronological order, some other illusional experiments were carried out in 1903 by H. Pearce. These dealt with the subjective localization of [p. 215] touch sensations and indicated that judgment tends to be warped by other immediately preceding touch sensations in the same neighborhood of the skin. Pearce tested 32 children in this way and came to the conclusion that the warp is directly proportional to the child's intelligence. While thus corroborating Dresslar rather than Seashore, he differs even from the former in that he declares the fallacy to diminish continually with increase of age.
Bagley. The negations of Binet and Seashore were soon carried to a still further extreme. W. Bagley, experimenting at Madison upon 160 children, corroborates Binet to the effect that Reaction-time shows no correspondence with School Intelligence, and also supports the latter's second and inverse rather than first and direct result with "Dots;" for Bagley not only denies all correspondence between any motor abilities and mental ones, but believes his work to demonstrate positively that there is even a marked antagonism between the two, so that excellence in either direction is apt to be accompanied by deficiency in the other.
Carman. In 1899, at Saginaw, there was another investigation which, if we are to go by the number of children tested, must be judged the most important that has taken place up to this day. A. Carman examined 1,507 of them as to their sensibility to pain (and also their strength of hand), noting in each case, as had now become usual, whether the teacher pronounced them to be "bright," "average," or "dull." Not much detail is given, but the following general conclusions are arrived at:
"Boys reported by their teachers as bright were more sensitive than those reported as dull."
"Girls reported as bright were more sensitive and stronger than those reported as dull."
"Those reported as being especially dull in mathematics were more sensitive on the right temple than on the left."
It is further discovered that
"Girls with light hair and blue or gray eyes are less sensitive to pain on left temple," but "on right temple they are more sensitive than the dark."
This information is very curious.
Kirkpatrick. In 1900, a slight rally against the emphatic denials of Binet, Seashore, and Bagley is attempted by E. Kirkpatrick. About 500 children were tested in three "sim- [p. 216] ple motor activities," including Binet's Dots, Counting Aloud, and Sorting Cards; their respective performances were then compared with their degrees of intelligence as estimated in the usual way by their teachers.
The result is in every case a decided correspondence.
Thorndike and Woodworth. Hitherto, we have only seen attempts to ascertain what I may, perhaps, be allowed to call "statical correlation." But in 1901, Messrs. Thorndike and Woodworth make a vigorous onslaught upon the still more important and difficult "dynamical correlation." It is useful enough to know whether any child that "taps," etc. with unusual slowness may thereupon straightway be considered as "dull;" but it would be even more to the point to learn that daily practice with the tapping machine could make him any brighter.
Various previous researches had been distinctly encouraging in this matter. Stumpf declares: "The power of mental concentration upon certain points, in whatever region acquired, will show itself effectual in all others also." Gilbert and Fracker had found that practice in one form of discrimination or reaction-time brought with it improvement in the other forms. Scripture writes, intending apparently to include intellective activities: "Development of will power in connection with any activity is accompanied by a development of will power as a whole." And again, Davis comes to the conclusion that "practice in any special act" develops ability "for all other acts."
The experiments of Thorndike and Woodworth, however, give once more a flat negation. The indications are rather that the effect of training in any one mental achievement is of little or no use for other intellectual performances, even very closely akin. The persons tested were carefully exercised until they had acquired considerable proficiency in judging the relative sizes of some pieces of paper of a particular shape. But this so obtained talent seemed completely to depart as soon as new tests were made with papers of a different shape, or even of a somewhat different size. Similar experiments in other sorts of feats led to the same result.
Binet. About the same time, we have another interesting and long contribution from Binet. His subjects numbered eleven and were specially selected as being the five cleverest and the six most stupid out of a class of thirty-two. These [p. 217] two groups, the "intelligent" and the "unintelligent," were in all the tests opposed and compared.
Binet again confirms, and more positively than ever, that Reaction-time, either with or without the complication of "Choice," has no correspondence with Intelligence. He also contradicts the correlation found by Griffing with the extensive dimension of Attention, in the form of simultaneously reading a large number of letters exposed to view for a small fraction of a second; though, curiously enough, Binet finds a certain amount of correspondence when he quite similarly exposes some arabesque designs. And finally, he finds no correlation with a new test of his own devising, namely: a trial how small a change in the rate of the beats of a metronome can be accurately detected.
But, on the other hand, his formerly successful method of Memory of Numbers now once more showed a marked correspondence with Intelligence. So also, and to a similar amount, is a correlation shown by Erasure of Letters (like that of Oehrn and Bourdon) and by Arithmetical Addition (more complicated than that of Ebbinghaus). So, again, do his new tests of Accuracy in Counting Metronome Taps and in Counting Dots. And so does his other new test, that of Copying: the subject is to copy a certain amount of writing, and then note is taken as to how many syllables he writes from each glance at the original; the more intelligent, the more words per glance.
But the fullest correspondence of all was presented by the very old test of Tactile Discrimination, which we have already seen successfully assayed by Wagner in 1896.
Binet is further strongly of opinion that all these correlations with Intelligence are most marked upon first trial, and that they continually diminish in proportion as the intelligent and unintelligent are both alike given more and more practice in the tests.
Simon. Directly inspired, apparently, by the last research, the correspondence there discovered between intelligence and the copying tests was now corroborated under new conditions. M. Simon conceived the idea that any such correlation should be manifested in especially prominent relief at the Vaucluse colony for backward children. He therefore tries seventeen of them, and finds in fact that, with one exception, all those classed medically as "Idiot" or "Imbécile" can copy fewer syllables at a time than do those merely termed "Dégénéré" or "Débile." He concludes enthusiastically as follows:
"Convenient, short, and exact, this copying of phrases at once constitutes a good method of diagnosing a child's intellectual development at the very moment of the experiment." [p. 218]
Kraepelin, Cron. Other observers, however, would appear to have been less fortunate in this region. Their application of experimental tests, even to such trenchant opposition as intellectual health and disease, has not led them to results that they have felt able to pronounce entirely unequivocal. The careful work of Kraepelin and Cron comes to the following close: "At the end of these considerations, we will not hide from ourselves that the obtained results have fallen far short of what one is accustomed to expect from collective experiments with the simplest 'mental tests.'"
Reis. When the above investigation was renewed on a more extensive scale by Reis, the latter finds indeed that these tests perfectly well admit of being executed upon the patients in the asylum; but the success would appear almost too great to fulfil the desired purpose, for often the patients prove the better performers of the two; a man, for instance, medically diagnosed as suffering from Dementia Paralytica with marked mental incapacity (deutliche geistige Schwäche) more than once comes out top of all fifteen subjects sane and insane alike.
Cattell, Farrand, Wissler. Now we come to about the latest and in many respects far the most important of all these attempts to correlate laboratory work with the psychics of real life. For amplitude of design, special experience of the directors, and lucid collation of the results, nothing up to the present has approached the researches which for about the last ten years have been progressing at Columbia University under the guidance of Cattell.
In 1896, the latter, together with Farrand, allowed a brief insight into the nature and extent of the proceeding being carried on. But not till 1901 was the total upshot of all this labor carefully put together and published by Wissler. By that time, 250 freshmen and some 35 seniors of the University, besides about 40 young women in Barnard College, had undergone the following elaborate series of tests (in addition to others not belonging to the present topic, such as anthropometrical, etc.):
[p. 219] The general intelligence of each student was settled by his average grading in all the different University courses; an amalgamation of these separate gradings resulted in forming eleven classes.
This class standing and all the above laboratory tests are now, for the first time in the history of the problem, correlated together with some mathematical precision. The final conclusions are about as blankly negative as could well be imagined. We are summarily informed that
"The laboratory mental tests show little inter-correlation."
"The markings of students in college classes correlate with themselves to a considerable degree, but not with the tests made in the laboratory."
And on inspecting the actual figures representing the faint correlations in question, it is mathematically evident that not one of them is more than would be expected to occur by mere accidental coincidence.
Aiken, Thorndike and Hubbell. Finally, in 1902, there appears an interesting contribution to the subject from Aiken, Thorndike, and Hubbell. Here "the functions in question were much more alike than were those examined by Wissler. We have examined the relationships between functions in an extremely favorable case." Nevertheless, on the whole the previous negative results are once more strongly corroborated; when some mental functions usually regarded as most purely typical of the associative process are compared together, their correlation turns out to be "none or slight."
2. Conclusions to be Drawn from these Previous Researches.
Thus far, it must be confessed, the outlook is anything but cheerful for our project contemplated at the end of the first part, or, indeed, for Experimental Psychology in general. There is scarcely one positive conclusion concerning the correlation between mental tests and independent practical estimates that has not been with equal force flatly contradicted; and amid this discordance, there is a continually waxing inclination -- especially noticeable among the most capable workers and exact results -- absolutely to deny any such correlation at all.
Here, then, is a strange enough answer to our question. When Laboratory and Life, the Token and the Betokened, are at last objectively and positively compared as regards one of the most important Functional Uniformities, they would seem to present no correspondence whatever with one another. Either we must conclude that there is no such thing as general intelligence, but only a number of mental activities perfectly [p. 220] independent of one another except for this common word to designate them, or else our scientific "tests" would appear to have been all so unhappily invented as to lie outside the widest limits of those very faculties of which they are supposed to form a concentrated essence.
It is true that Functional Uniformities might conceivably exist of other kinds; but for any such there is even less evidence; nor would they appear at all a priori probable, in view of the complete and surprising absence of that important one constituted by community of organism. Failing all Functional Uniformities, any connection between the experimental procedure and practical intelligence can then be no more than "Conceptual." But this is a position scarcely tenable for those whose chief claim is finally to have escaped from the endless tangle of purely introspective argument; moreover, such an admission would shear every experimental research of almost its whole worth and deprive the systems built thereon of their essential base.
Further, if thus the only correspondences hitherto positively tested, those between Intelligence and its variously supposed Quintessences, have totally failed to reveal any real existence, what shall we say of all the other by no means so apparently self-evident correspondences postulated throughout experimental psychology and forming its present backbone? To take one of the most extensive and painstaking of them, Dr. Schuyten, from 1893 to 1897, continuously amassed evidence to prove a close relation of the middle European temperature with the faculty of "voluntary attention" and even more generally with "the intensity of cerebral activity;" he seems to have repeated his observations on about five hundred different days, upon each occasion indefatigably proceeding round Antwerp from one school to another, visiting most of the time as many as eight. Now, his actual test of "voluntary attention" and "cerebral activity" consisted entirely in noting how many children kept their eyes on their lesson books for five consecutive minutes; but, as far as I am aware, there has not yet been any positive proof that this posture sufficiently coincides with all the other activities coming under this general term of "voluntary attention;" and in view of the universal breakdown of evidence for much more plausible correlations, Schuyten's a priori assumption can hardly be admitted as an adequate basis for his wide reaching theoretical and practical conclusions. To try another example, we have seen that a favorite test, successively adopted by Oehrn, Bourdon, and Binet, is that of erasing from a printed page certain given letters of the alpha- [p. 221] bet; but sceptics are still able to contend that because any person can dash a stroke through a's and i's with unusual speed, he need not therefore be summarily assumed to possess an abnormally large capacity for discrimination generally speaking, say, for telling a fresh from an over-night deer's trail, or distinguishing sound financial investments from unsound. Precisely similar criticism may be extended to almost the whole mass of laborious attempts to establish practical applications of Experimental Psychology, whether for pedagogical, medical, or other purposes.
Nor is the case much otherwise even with those stricter and more theoretical researchers who are rather inclined to regard as superficial any experiments involving large numbers of subjects. For however modest and precise may seem the conduct of their own actual investigation, it nearly always terminates with and justifies itself by a number of sweeping conclusions; and these latter will be found to essentially imply some assumed general function or process, such as "memory," "association," "attention," "fatigue," "practice," "will," etc., and at the same time that this function is adequately represented by the laboratory test. To take for instance the speed of mental association, there is hardly a psychologist of note who has not at some time or other made wide reaching assertions on this point, often indeed finding herein one of the pillar stones of his philosophy; the more practically minded, as Kraepelin and his school, content themselves with demonstrating the details of its actual conduct, showing us how the rate will rise with practice or on imbibing tea, how it sinks in proportion to fatigue or mental disorder, how under the influence of alcohol it for a brief moment slightly ascends and then becomes permanently and profoundly depressed. But all these conclusions are derived from observation of one or two supposed typical forms of this "association," while the extensive experiments of Aiken, Thorndike, and Hubbell reveal that every form of association, however closely similar on introspection, must, nevertheless, always be considered separately on its own merits, and that "quickness of association as an ability determining the speed of all one's associations is a myth."
The most curious part of the general failure to find any correspondence between the psychics of the Laboratory and those of Life is that experimental psychologists on the whole do not seem in any way disturbed by it. But sooner than impute to them -- the avowed champions of positive evidence -- such a logical crime as to prefer their own a priori convictions to this mass of testifying facts, it is perhaps pardonable to suspect that many of them do not realize the full significance of the situation! [p. 222]
3. Criticism of Prevalent Working Methods.
There is, however, an intermediate way between ignoring all this serious testimony and submissively accepting it; this consists in subjecting it to the most searching criticism of which one is capable. But such a procedure quickly leads to questions of greater generality; if we would deal with the matter at all adequately, we are compelled to enter into a general discussion of the methods universally prevalent for demonstrating association between two events or attributes. To this important topic a special work has been devoted. For the present, we must limit ourselves to the following brief exposition of the chief deficiencies appearing especially to characterize the long series of experiments just reviewed.
In the first place, only one out of them all (Wissler at Columbia) attains to the first fundamental requisite of correlation, namely, a precise quantitative expression. Many writers, indeed, have been at great trouble and have compiled elaborate numerical tables, even bewilderingly so; but nowhere do we find this mass of data focused to a single exact result. In consequence, not only has comparison always been impossible between one experiment and another, but the experimenters themselves have proved quite unable to correctly estimate even their own results; some have conceived their work to prove that correspondence was absent when it really existed to a very considerable amount; whereas others have held up as a large correlation what in reality is insignificantly small. Later on, we shall come upon examples of both kinds of bias. With this requisite is closely bound up another one no less fundamental, namely, that the ultimate result should not be presented in some form specially devised to demonstrate the compiler's theory, but rather should be a perfectly impartial representation of the whole of the relations elicited by the experiments.
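As an illustration of what "a single exact result" can look like, the following minimal sketch computes the product-moment coefficient, the method of Bravais, Galton, and Pearson that the paper adopts later. The function name and the five pairs of scores are hypothetical, invented purely for the example; they are not data from any of the researches reviewed.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists of measurements."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least two cases")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each case from its own series mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with no variation has no defined correlation")
    return sxy / sqrt(sxx * syy)

# Hypothetical figures for five pupils (illustration only).
test_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
teacher_grades = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(test_scores, teacher_grades)
print(round(r, 3))
```

Because the whole table collapses into one number, two experiments reported this way can be compared directly, which is the point the criticism above turns on.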
Next, with the same exception as before, not one has calculated the "probable error;" hence, they have had no means whatever of judging how much of their results was merely due to accidental coincidence. This applies not only to the experiments executed with comparatively few subjects, but even to those upon the most extensive scales recorded. The danger of being misled by combinations due to pure chance does, indeed, depend greatly upon the number of cases observed, but in still larger degree upon the manner in which the data are calculated and presented.
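A rough sketch of how a probable error lets one judge accidental coincidence is given below. It assumes the common large-sample textbook approximation, 0.6745 (1 - r^2) / sqrt(n); this is an assumption made for illustration and is not necessarily the exact expression used in this paper. The function name and the sample figures are hypothetical.

```python
from math import sqrt

def probable_error_of_r(r, n):
    """Approximate probable error of a correlation r found from n cases.

    Assumes the large-sample approximation 0.6745 * (1 - r^2) / sqrt(n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return 0.6745 * (1.0 - r * r) / sqrt(n)

# Hypothetical case: a correlation of 0.10 reported from only 25 subjects.
r_small, n_small = 0.10, 25
pe_small = probable_error_of_r(r_small, n_small)

# The same correlation reported from 2,500 subjects.
pe_large = probable_error_of_r(r_small, 2500)

print(round(pe_small, 4), round(pe_large, 4))
```

With 25 subjects the probable error exceeds the correlation itself, so such a figure is no evidence of any correspondence; with 2,500 subjects the same figure is several times its probable error.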
Thirdly, in no case has there been any clear explicit definition of the problem to be resolved. A correspondence is ordi- [p. 223] narily expressed in such a general way as neither admits of practical ascertainment nor even possesses any great theoretical significance; for a scientific investigation to be either possible or desirable, we must needs restrict it by a large number of qualifications. Having done so, any influence included (or excluded) in contravention of our definition must be considered as an irrelevant and falsifying factor. Now, in many of the experiments that we have been discussing, even in those upon quite a small scale, the authors have tried to kill as many birds as possible with one stone and have sought after the greatest -- instead of the least -- diversity; they have purposely thrown together subjects of all sorts and ages, and thus have gone out of their way to invite fallacious elements into their work. But in any case, even with the best of intentions, these irrelevant factors could not possibly be adequately obviated, until some method had been discovered for exactly measuring them and their effect upon the correlation; this, to the best of my knowledge, has never been done. As will presently be seen, the disturbance is frequently sufficient to so entirely transform the apparent correlation, that the latter becomes little or no evidence as to the quantity or even direction of the real correspondence.
Lastly, no investigator seems to have taken into any consideration another very large source of fallacy and one that is inevitably present in every work, namely, the errors of observation. For having executed our experiment and calculated the correlation, we must then remember that the latter does not represent the mathematical relation between the two sets of real objects compared, but only between the two sets of measurements which we have derived from the former by more or less fallible processes. The result actually obtained in any laboratory test must necessarily have in every case been perturbed by various contingencies which have nothing to do with the subject's real general capacity; a simple proof is the fact that the repetition of an experiment will always produce a value somewhat different from before. The same is no less true as regards more practical appraisements, for the lad confidently pronounced by his teacher to be "dull" may eventually turn out to have quite the average share of brains. These unavoidable discrepancies have always been ignored, apparently on some tacit assumption that they will act impartially, half of them tending to enhance the apparent correlation and half to reduce it; in this way, it is supposed, the result must in the long run become more and more nearly true. Such is, however, not at all the case; these errors of observation do not tend to wholly compensate one another, but only partially so; every time, they leave a certain balance against the correlation, which [p. 224] is in no way affected by the number of cases assembled, but solely by the size of the mean error of observation. The amount of consequent falsification is in physical inquiry often unimportant, but in psychology it is usually large enough to completely vitiate the conclusion. 
This falsifying influence has in many of the above experiments, especially the more extensive ones, occurred in exaggerated form; for even those experimenters who are most careful in the ordinary routine of the laboratory have yet allowed themselves to be seduced by the special difficulties attending this sort of work; urged on the one hand by the craving after an imposing array of cases -- somewhat ad captandum vulgus -- and sternly restricted on the other side by various personal considerations (such as restiveness and fatiguability of the youthful subjects, fear of deranging school hours, etc.), they have too often fallen into almost incredibly hurried and inadequate methods of testing. Here, again, mere goodness of intention will not avail beyond a very limited extent, for the most painstaking work is far from entitling us to assume that the observational fallacy has been reduced to insignificant dimensions; we can have no satisfactory guarantee, until some method has been devised of precisely measuring the disturbance, and this does not seem to have ever been attempted.
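The claim that errors of observation leave a balance against the correlation, whatever the number of cases, can be illustrated by a small simulation. This is a sketch under stated assumptions, not the author's procedure: the real values are drawn from a normal distribution with a chosen correlation of 0.8, and each measurement adds an independent normal error as large as the real spread. All names and parameters are hypothetical.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate(n, true_r, error_sd, seed):
    """Return (correlation of real values, correlation of fallible measurements)."""
    rng = random.Random(seed)
    real_x, real_y, seen_x, seen_y = [], [], [], []
    for _ in range(n):
        # Real values with unit variance and correlation true_r.
        x = rng.gauss(0.0, 1.0)
        y = true_r * x + sqrt(1.0 - true_r ** 2) * rng.gauss(0.0, 1.0)
        real_x.append(x)
        real_y.append(y)
        # Each measurement is the real value plus an independent error.
        seen_x.append(x + rng.gauss(0.0, error_sd))
        seen_y.append(y + rng.gauss(0.0, error_sd))
    return pearson_r(real_x, real_y), pearson_r(seen_x, seen_y)

small = simulate(n=200, true_r=0.8, error_sd=1.0, seed=1)
large = simulate(n=20000, true_r=0.8, error_sd=1.0, seed=1)
print("n=200:   real %.2f, measured %.2f" % small)
print("n=20000: real %.2f, measured %.2f" % large)
```

Under these assumptions the measured correlation settles near half the real one, and assembling a hundred times as many cases does not bring it any closer to the real value; only smaller errors of observation would.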
The above criticism, of a perfectly general nature, must suffice for the bulk of the researches cited in this chapter; later on will be found a more detailed examination of those three particular ones which have dealt with precisely the same topic as the present article. If here methodological imperfections have admitted of formulation with unusual sharpness, the fact must by no means be taken as an especial condemnation of these and kindred experiments. Certain faults have, indeed, been especially prominent, as, for instance, the large errors of observation; but, on the whole, the majority of them would appear to contain at least as much good solid work as most of those more strictly confined to the laboratory and to a very small number of "trained" subjects; the former have only afforded a firmer foothold for criticism, because they have confined the question to a more simple unequivocal issue -- though not yet nearly enough so -- and because they have assailed their problem in a square positive manner. The final inconclusiveness of all their labor is not so much due to individual shortcomings of the investigators, or even of the whole branch of investigation, as to the general non-existence of any adequate system for proving and measuring associative tendencies.
Under all these circumstances, in spite of the many previous inconclusive and negatory verdicts, the question of correspondence between the Tests of the Laboratory and the Intelligence [p. 225] of Life cannot yet be regarded as definitely closed. The only thing so far demonstrated is that the old means of investigation are entirely inadequate. The present undertaking, therefore, has only ventured once more to approach the problem, because believing to have elaborated a new and reasonably complete methodological procedure, such as appears capable of at last bringing light upon this and innumerable other important regions hitherto inexplorable.
1. Obviation of the Four Faults Quoted.
In the last chapter, four grave faults have been charged against the entire antecedent literature on the present topic. The first thing insisted upon was a precise quantitative expression derived impartially from the entire available data; we must renounce adroit manipulation of tables and graphs, still further rounded into the desired shape by ingenious argument; the whole of our experimentally gained figures must without any selective treatment simply of themselves issue into one plain numerical value (varying conveniently from 1 for perfect correspondence down to 0 for perfect absence of correspondence); this will here be done in the method that has been successively elaborated by Bravais, Galton, and Pearson, and whose formula will be found on page 252; in addition, lists of individual amounts will be given in full as originally obtained, and therefore can be freely used either for checking the present results or for other inquiries. The second requisite was the probable error; concerning this it may at once be remarked that indispensable as is some evaluation, yet here less than anywhere else have we need, or even possibility, of rigorous exactness; approximate estimates will be appended to each correlation in tables. The third fault indicated was the one deriving from the errors of observation; little can be said in this place concerning the best means of reducing these to a minimum, for on such head our requirements are scarcely different from those already prevailing in all serious psychological research; only some matters of special interest will be briefly touched upon while describing the procedure of the present experiments in the next chapter. Much more important for us is the fact that [p. 226] the total effect of all such errors can be measured en masse and mathematically eliminated, and that until this has been done, no correlational value can be assumed as even approximately accurate. 
The formulae for this purpose are given and explained in the already mentioned article; sufficient for the present practical purposes will be briefly recapitulated at the end of the next chapter.
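The formulae themselves are not reproduced in this section. As a hedged illustration, the sketch below uses the correction commonly attributed to the author's companion article: the observed correlation is divided by the geometric mean of the two reliabilities, each reliability being the correlation between two independent measurements of the same quantity. The function name and the figures are hypothetical.

```python
from math import sqrt

def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Estimate the correlation between real values from fallible measurements.

    reliability_x and reliability_y are the correlations between two
    independent measurements of the first and second quantity respectively.
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("each reliability must lie in (0, 1]")
    return r_observed / sqrt(reliability_x * reliability_y)

# Hypothetical case: tests and teachers' estimates correlate 0.40, while
# repeated testing and repeated grading each agree with themselves only 0.50.
r_corrected = correct_for_attenuation(0.40, 0.50, 0.50)
print(round(r_corrected, 2))
```

When both series are measured without error the reliabilities are 1 and the observed value is returned unchanged.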
The remaining point is that of irrelevant factors; this also is more fully explained in the said articles; it involves a thorough preliminary investigation of all the terms concerned, without which the most skillful experimentation and lucid exposition will only be wasted labor; the produce of such preparatory exploration will form the remainder of the present chapter, while the processes of calculation will be given at the close of the next one.
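How an irrelevant factor can be measured and removed may be sketched with the standard first-order partial correlation. Whether this is the exact expression given in the articles referred to is an assumption; the function name and the figures are hypothetical, with Age standing in for the irrelevant factor.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing their common dependence on z."""
    denominator = sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("x or y is completely determined by z")
    return (r_xy - r_xz * r_yz) / denominator

# Hypothetical case: subjects of all ages are thrown together. Test score
# and intelligence estimate each correlate 0.60 with age, and 0.36 with
# one another.
r_without_age = partial_r(r_xy=0.36, r_xz=0.60, r_yz=0.60)
print(round(r_without_age, 3))
```

In this invented case the whole apparent correspondence is produced by the mixture of ages, and nothing remains once age is held constant.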
2. Definition of the Correspondence Sought.
The first step towards eliminating irrelevancies is to clearly lay down how much we are to consider relevant, or in other words to properly define the problem at issue. As already stated, universal correspondences can never be the subject of investigation; in practice we are forced to introduce a large number of conventional restrictions, and for profitable work these must be explicit and unequivocal.
Let us first take that of Kinship. Putting other animals out of the question, we clearly cannot pretend adequately to sample even the whole existing human species. In order to obtain the simplest and least ambiguous results, it might seem desirable to reduce this source of variation to the least possible dimensions; an ideal experiment would then be wholly confined to a number of sets of brothers or sisters and to determining how far the more intelligent brother is also the better discriminator. But these narrow limits are most inconvenient on practical grounds, and even theoretically there appears no great objection against extending the kinship to any range that does not introduce inconvenient complications; we may eventually find it necessary to exclude differences of social stratum, of sex, etc.; or it may, on the contrary, be found allowable to admit all these and even some amount of internationality.
Next, we must bear in mind that the action of correctly distinguishing between two sounds or weights is a matter depending on many factors, and we must decide how many of these should be rejected as foreign to our purpose. In the present case, it seems best to limit the object of our research to that portion of the discriminating act which appears to constitute its [p. 227] specific core, excluding as far as possible such more outward influences as Zeal, Endurance, Manual Dexterity, Memory, etc. The last named requires especial attention, since it necessarily enters into all these mental tests. Its influence will greatly differ according to the method of procedure: the interval intervening between the two compared sensations may vary from a small fraction of a second to several minutes; it may leave the reagent's attention undisturbed, or it may make distracting calls upon it (such as causing him accurately to adjust an instrument). One of the investigations cited in the last chapter has gone so far in this direction, that a test called by the experimenters themselves "Perception of Pitch" is by the compiler preferably termed "Pitch Memory." Now, the correlations of Memory -- as far at any rate as my own researches have hitherto gone -- would indicate laws entirely different from those of Discrimination; if this be confirmed, any interference of the former may gravely perturb an investigation into the latter.
Thirdly, the correspondence selected for inquiry in the present case is that between natural innate faculties. By this definition, we explicitly declare that all such individual circumstances as after birth materially modify the investigated function are irrelevant and must be adequately eliminated; our results might, therefore, be wholly vitiated if we threw together people in disparate ages, those in full vigor and those tired or ill, those who have already practised the test in question and those to whom it is new, etc. To obviate this, we are obliged to search through the records of previous work, so as to ascertain all the influences that have been found seriously to affect any of the variants now in question. The chief results of such a preparatory investigation will now be briefly detailed, and for the sake of conciseness and unequivocality we will at once as far as possible explain and correct these antecedent data by collating them with the information subsequently derived from the present experiments themselves. The most prominent turn out to be Practice, Age, and Sex, which will be discussed in this order.
3. Irrelevancies from Practice.
The importance of having defined our correlation will here at once be evident. For if we had wished to inquire whether [p. 228] sensory acuteness and general intellect are correlated dynamically, if (as stated by many persons on the strength apparently of a priori reasoning) we had assumed that the discrimination of minute differences of sensation "is to be cultivated as the foundation of all intelligence," then we should have had to admit the variations due to Practice as perfectly relevant and we should have looked for a continual expansion in people's general ability in proportion to the labor they had expended on distinguishing tones, shades, and weights from one another. In the present experiments, however, it has been preferred to commence by investigating the statical relation; it has not been asked whether Intelligence is produced by development of sensory acuteness, but whether the former original endowment is on the whole accompanied by a corresponding amount of the latter also. From this it follows that we are bound to carefully eliminate differences of previous exercise.
(a) Pitch. In this branch of Discrimination the effects of Practice are especially conspicuous, but nevertheless they are not very easy to trace out with the quantitative precision required for our purpose. Experiment has, indeed, unanimously demonstrated that the threshold of discrimination to be reached by trained and competent acousticians, even when using quite different apparatus, is in the immediate neighborhood of 1/3 v.d.; but there is no such general agreement concerning the average threshold of "unpractised" persons. Delezenne, for instance, finds that an interval equivalent in the centre octaves to about 1 v.d. "becomes sensible to the least practised ears, as I have assured myself on several people." But Preyer, on the other hand (though quite agreeing with Delezenne as regards the powers of practised acousticians), tried a few unpractised persons and found that they could not all decide with certainty unless the two tones differed by as much as 8 v.d. And [p. 229] when we come to the more extensive experiments of Cattell and Farrand upon male and female University students, we find that the average error for the F below the middle C was over a whole tone, which (after making all possible allowances for diversity in procedure; both experiments were conducted on monochords) must be taken as at least twenty times worse than the results of Delezenne.
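The factor of twenty can be roughly checked by arithmetic. The following is only a sketch under assumptions that the text does not supply: modern equal temperament tuned to A = 440, and "v.d." read as complete vibrations per second. The names used are illustrative.

```python
# Assumption (not from the text): equal temperament with A4 = 440 vibrations per second.
A4 = 440.0

def semitones_from_a4(n):
    """Frequency of the note n equal-tempered semitones above A4 (below, if n is negative)."""
    return A4 * 2.0 ** (n / 12.0)

# The F below middle C lies sixteen semitones below A4.
f_below_middle_c = semitones_from_a4(-16)

# Size of a whole tone (two semitones) on either side of that F, in vibrations per second.
whole_tone_up = semitones_from_a4(-14) - f_below_middle_c
whole_tone_down = f_below_middle_c - semitones_from_a4(-18)

# Delezenne's figure for the least practised ears, as quoted in the text.
delezenne_threshold = 1.0

print(round(f_below_middle_c, 1))
print(round(whole_tone_up / delezenne_threshold, 1))
print(round(whole_tone_down / delezenne_threshold, 1))
```

On these assumptions a whole tone at that F spans roughly 19 to 21 vibrations per second, so that an error of over a whole tone is indeed some twenty times Delezenne's figure.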
This and similar apparent discrepancies, however, seem to proceed from a too vague and also too narrow conception of "Practice." Many writers freely make assertions concerning "highly trained reagents" (geübte Versuchspersonen) without stating -- often, it would appear, without clearly realizing -- whether the latter are merely trained in music generally, or have had special practice in distinguishing minute differences of Pitch. Other authors, on the contrary, term all their subjects "unpractised," even though accomplished musicians, if they happen to be new to this particular experiment; in such sense must be taken Delezenne's designation "les oreilles les moins exercées," for his reagents seem from the context to have been distinctly above the average as regards general musicality. Now, though the hyperacute sensitivity characterizing the acousticians has not been found (by the present writer at least) in most musical performers, even professional, yet the latter are at any rate considerably superior to the average person. Thus we so far have three very distinct grades of training: acousticians, musicians, and the rest.
There is also another kind of practice that must always be duly considered, namely that gained during even brief experiments. The Columbia undergraduates, for instance, who had a monochord put into their hands and were told to forthwith adjust it so as to produce unison with a previously given note, were, as regards training, almost as far removed from subjects whose examination lasts a quarter of an hour, as these again are from the long exercised reagents of Stumpf, Luft, etc.
Lastly, there is a factor which though analogous to Practice yet must not be too summarily identified therewith; this is the influence of General Culture. It will require discussion in some detail, as it does not seem to have hitherto gained much attention, is peculiarly liable to engender fallacies in comparative tests of children's discrimination, and may eventually prove of profound theoretical importance. A glance at Table V will present a very marked inferiority on the part of the villagers as compared with the better social classes, for the former have a threshold more than twice as great as the latter. Small as is the number of cases involved, the discrepancy is so [p. 230] large as practically to exclude mere accident. The next readiest explanation is that villagers have far less opportunities of hearing music. But nearer investigation does not very much confirm this view; for the village in question rejoiced in the possession of an unusually fine set of church bells, and whoever has listened to a band of sturdy yeomen ringing for eight consecutive hours through all the possible "changes" will realize that those in the neighborhood have ample opportunity of hearing musical intervals. (The present writer had the misfortune to reside within a few yards of the said church.) Moreover, with one exception, all the villagers examined were in the habit of singing at home; while two of them were no mean performers on the violin and organ respectively. The violinist, it is true, exhibited the same discrimination as the average good musician; but the organist formed no exception to the general obtuseness, other than a far quicker capacity for improvement.
The same phenomenon is equally conspicuous among the young; for the village children, in spite of regular instruction and practice in singing, had a threshold twice as great as that found in the high class school. Also the relation between the children and the adults tells a similar tale; for in those social strata which continue after full growth to fully exercise their higher mental capacities, there is no appreciable alteration of ability to distinguish tones; the high class school has precisely the same average as the cultivated grown-up person. Among those, on the other hand, who at an early age are compelled to turn all their energy to muscular activities, we find a correspondingly rapid falling away of discrimination; bad as the child villagers were, those adult proved themselves nearly twice as inapt.
The only other experiments explicitly in this grade of society, with which I am acquainted, are the interesting ones of C. Myers; and to these, without much straining, a similar interpretation may be given. His European village children (many of them possibly very young; unfortunately the ages are not shown) present an even lower average than those in Table I; that his adults slightly improve, while mine, on the contrary, go down, may well be attributable to the known habits of self-cultivation characterizing the Scotch peasantry. Myers' Papuan children, with only a smearing of alien education, are accordingly worse than the Europeans of any age or class; and on returning to the more congenial occupations of [p. 231] pearl-diving and cannibalism, their discriminative powers appear to sink lower still.
There is thus considerable evidence that the ability to discriminate pitch is largely affected by deprivation of General Culture. At the same time, this influence is only conspicuous in tests of an unfamiliar character; the villagers were incompetent at the -- for them useless -- task of distinguishing tones, but they show no inferiority whatever in the more practical faculties of telling one shade and weight from another. It would therefore seem that General Culture is not an independent factor that may be superadded to special sensory training, but no more than a possible partial substitute, just as the carbohydrates can excellently supplement but never supersede food-stuff containing nitrogen.
The various effects of all these degrees of Practice may now be expressed roughly enough, but still quantitatively, in the following form:
[p. 232] (b) Sight. In visual discrimination there is evidently less scope for further improvement by special training, seeing that the ordinary necessities of daily life are already sufficient to call forth a large portion of most persons' potential faculty in this respect.
But even here a considerable residuum of dormant power usually remains over and may be awakened on sufficient exercise. Schirmer, for instance, found his threshold to continually diminish for about a week of continual practice, after which it seemed to him to have attained its maximum, while Müller-Lyer and Simon continued to make appreciable improvement for many months. Simon further noted the curious fact that practice in judging with both eyes brought with it but little betterment in judging with either eye alone; but when the left eye had been practised by itself to its maximum powers, then the right eye also, although itself unpractised, was nevertheless found to have advanced to its maximum. My own experience fully coincides with this view that the generality of people can considerably reduce their threshold by practice. Here, however, no appreciable influence is manifested by General Culture; the adults of the diverse social classes show just about the same average, and also as regards the children the difference is about what was to be expected from the difference in conditions of test.
This visual disparity between trained and untrained is difficult to estimate quantitatively. For previous experimenters upon unpractised reagents never seem to have attempted more than a rough empirical gradation incapable of being compared with other work. Our evaluation must therefore rest solely upon the present experiments, which, being conducted by daylight, are far from guaranteeing sufficient precision on this head; but still, as they turn out to agree perfectly with the mean of previous records of trained reagents, in all probability they are approximately accurate as concerns untrained ones also. In this case, the median threshold for discrimination of medium shades of gray, after 15 minutes of exercise, may be taken as involving the following difference of luminosity:
[p. 233] It would thus appear that in this faculty the influence of training is considerable, but still only about one-third of that in discrimination of pitch.
(c) Weight. Here, curiously enough, Practice seems to have remarkably little effect; the most highly trained experimenter has shown no advantage over the first comer.
Weber, for instance, declares 1/40 to be the difference just distinguishable by "quite the majority of human beings, without any long preliminary practice"; yet he himself, in spite of all his long labors, appears to have been much below this standard. Similarly his successor, Fechner, after prodigiously extensive training, was in the end but a very moderate performer. A more direct proof of the inefficacy of prolonged exercise is given by the case of Biedermann and Loewit; for these observers at the very beginning of their experiments found their threshold for one-half pound to be equal to 1/21; yet at the close of their protracted research, this threshold had only improved to 1/23.
But however true this may be as regards training of very great duration, I have convinced myself that it does not hold good as regards the few minutes of fore-exercise; for my reagents have almost invariably discriminated better at the end of their fifteen minutes than when they first entered the room; sometimes the improvement is enormous.
We come, therefore, to the general conclusion that a few minutes' practice are necessary for most people to accustom themselves to the test, but that further training has little or no effect.
(d) Intelligence. As to the effect of training in this direction, there unfortunately appear to be no available data, except such conclusions as issue from the present experiments themselves. All that could be done before commencing work was to contrive that the reagents should, as far as possible, be on equal terms with regard to previous education.
4. Irrelevancies from Age.
This factor is obviously one that must largely influence both Sensory Discrimination and Intelligence, since nobody can suppose that any mental faculties remain at one constant level from first birth to the prime of manhood and on through the last senile decay. The matter acquires peculiar importance owing to the practical reason that we find it much easier to [p. 234] subject children to our experiments than to induce the same number of adults to let themselves be tested. Consequently, every research of this kind so far has dealt exclusively with reagents not yet beyond adolescence. We will again consider each of the four variants in turn.
As regards Pitch, the first notable inquiry into the development of discrimination appears to have been that of Gilbert, who comes to the conclusion that from the age of six to at least that of seventeen years there is a continual though irregular improvement. But the next investigator, Seashore, arrives at the very different opinion that no betterment takes place after ten, at which period he finds children to be fully equal to average adults; he even goes so far as to pronounce that "the organ of Corti reaches its maximum efficiency at the age of about ten, and that it then begins to deteriorate."
By the light of the previous section, however, it seems possible satisfactorily to reconcile these very divergent results. For we have just seen the immense effect of Practice, especially in this matter of pitch, and we may not unreasonably suppose such faculties to arrive at maturity earlier or later according as they have been more or less fostered by education, special or general. This hypothesis will be found frequently corroborated in the present work. The high class school perfectly harmonizes with Seashore's elder children in presenting no further increase of capacity after nine, at which age the average threshold already equals that of adults; and this agreement with Seashore extends not only to the relative powers, but even to their absolute performances, for both lots of children show a mean of about 4 v.d. But when we turn to the villagers, then the accordance is no less entirely with Gilbert: from 5 1/2 to 7 years, they average 40 v.d.; from 7 to 8, 23 v.d.; from 8 to 10, 15 v.d.; and from 10 to 14, 6-7 v.d.; and here again also the absolute values fairly coincide with Gilbert's, although the latter used such a different testing instrument as an adjustable pitch-pipe.
Even into very advanced life, the influence of Age appears still interflected with the previously mentioned factor of Culture. For those adults who exercise their higher intellectual functions abide to almost the end of their days at their best level; nothing is observable to correspond with the generally believed diminution of their audible range from 11 octaves to 10. To myself, it was no small surprise to find that the persons of 60, 70, and even 80 years could discriminate quite as acutely as those in their physical prime. But among the grownup villagers there seems to occur a marked bifurcation: those [p. 235] of them who on leaving school take to violin playing, hand-bell making, etc., continue to develop the discriminative function until it eventually becomes quite as fine as if it had enjoyed a better start; but those who have not had such special later advantages will not be saved by occasionally hearing music from lapsing into great dullment.
Turning next to Sight, Gilbert's very extensive experiments again present a continual growth perceptible even up to the seventeenth year; between nine and fourteen, the ages chiefly entering into the present work, the average threshold becomes reduced by about one fourth. My own results turned out quite similar as regards the village children, but in the higher class school development appeared to occur earlier. With respect to Old Age, this again appeared to bring with it no impairment whatever; the supposed "sharpness of young eyes" did not make itself manifest.
In respect of Weight, Gilbert's curves show a somewhat quicker arrival at maturity, no improvement being evident after about thirteen years old. The present experiments quite concorded in that the younger children were almost equal to the older ones and both were not far from adults. In Old Age, we once more find no appreciable loss of power.
Finally, we come to the effect of Age upon Intelligence. On this matter general opinion seems to be sharply divided; for some assert that, other things equal, a child must necessarily get cleverer as he grows older; while others hold Intelligence to be equally manifest at all ages after infancy and to be easily distinguishable from the gradually amassing stock of acquirements and proficiencies. As before, in default of any positive antecedent data the present experiments could only keep the point in view and endeavor themselves to throw light upon the question.
5. Irrelevancies from Sex.
This factor evidently requires consideration, seeing that in some of the previous experimentation males and females have been thrown together without apology, while at other times these have been regarded as obviously heterogeneous to one another.
Commencing with children, the only important evidence concerning Sight and Weight seems to be that of Gilbert, who comes to the conclusion that the sexes not only differ but that this disagreement is a quantity perpetually fluctuating from year to year according to the various phases of growth and especially of puberty. As regards Pitch, the chief testimony is [p. 236] that of Seashore, who declares boys and girls to present not the least appreciable difference.
Now, it is very improbable that the various senses should really be so unlike in such a respect, and accordingly it was not surprising to find that in the present experiments the village boys and girls showed quite analogous differences in Pitch also. Perhaps Seashore's results may be reconciled in the same way as before, that is, by considering that his children appear to have been better educated generally and therefore to have passed beyond the perturbations characteristic of the growing phases.
With respect to the influence of Sex upon adults, Seashore's women present a slightly better discrimination, but the difference is only such as may well be attributed to their more general study of music at the age in question. My own results coincided, in that both sexes appeared perfectly equal in all three senses tested; no support whatever was given to the popular assertion that men are much superior.
6. The Elimination of these Irrelevancies.
Thus we have come upon a considerable number of factors which evidently disturb the relations that we are about to investigate; and the more we sharpen our criticism, the smaller aberrations shall we be able to bring to light, until the latter begin to appear infinitely numerous and hopelessly unavoidable; we successively encounter illegitimate deviations proceeding from hour of test, from temperature, from fatigue, from state of health, from fullness of stomach, from habitual and occasional consumption of alcohol and caffeine, and so on without end. Eventually, our experimentation will arrive at the condition of the hysterical person who has found out some or other medical objection to every description of food -- and so dies of self-starvation. To undertake any investigation at all, our attention cannot be confined to the detection of these impurities, but must take the further step of ascertaining which of them we can and cannot afford to neglect in practice. With this view, we have in every case insisted upon obtaining quantitative estimates of the differences actually produced by the various intruders, and we have found them to vary from the extravagant proportion of 1:60 down to inappreciability; from these general computations, it will usually not be hard approximately to pick out the amount applicable to any particular experiments.
So far, however, our information is still almost worthless. For the perturbance into which we are inquiring does not depend simply upon the above average difference produced in the measurements by the intruder, but upon the ratio of this special [p. 237] difference to the total average difference found in an assemblage of individuals from both the disparate classes; this latter difference will clearly be the greater of the two, seeing that it derives from many other differentiating causes in addition to the special one; the ratio will very approximately present the correlation between the disturbant and the term disturbed. Even yet, though we have thus obtained all the requisite data, we cannot utilize them to measure the perturbance until we have further obtained a mathematical equation capable of performing this office; this will be given at the end of the next chapter; by its aid and that of the foregoing inquiry, we are finally enabled to decide which of the irrelevancies produce a falsification of appreciable size as compared with the probable error; all of less magnitude may be left out of consideration, as their elimination would not be of the slightest practical advantage.
To avoid more tediousness, we will here take only two sample cases to illustrate how the decision is actually reached as to whether an irrelevant factor be really of formidable nature. In tests of Pitch, say, it is desired to know whether the reagents may properly consist of non-musicians, mixed with musicians. Our Table (page 231) shows us that the former have an average threshold about twice the size of that of the latter; next, on taking a number of thresholds of both non-musicians and musicians thrown together, we find that the ratio between the averages of the worse and better halves respectively comes to something very near 3:1 (see any of the quoted works); hence the irrelevant correlation is about 2/3 or 0.62. Applying the appropriate formula, we find that under such circumstances a real correspondence of say 0.50 would be unduly deflected by about 0.14. As the experiments were being designed on such a scale as to give a probable error of only about 0.03, the perturbance must be pronounced very appreciably falsifying.
Let us consider the analogous case as regards the sense of Sight. Here also, it may safely be assumed that diversity of daily occupation among the reagents will have led some more than others to practise this particular faculty, and further that the extent of this discrepancy will on the whole certainly be less than that between musicians and non-musicians (seeing that very few people spend hours a day in discriminating small variations of light and shade); again, the general effect of practice in this sense has been found to be about one third of that in Pitch (p. 232), while the total average difference turns out to be about the same as for the latter; hence this time, the irrelevant correlation will be at any rate much less than one [p. 238] third of 0.62. On once more applying the suitable corrective formula, we find that any irrelevant correlation of 0.21 would falsify a real correspondence of, say, 0.50 by an amount of only 0.01. Thus we have ascertained that the deviation caused by difference of Practice is in this sense not likely to affect our result by more than a very small fraction of the probable error and may well be neglected.
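The two sample cases can be retraced numerically. The corrective formula itself is only promised for the end of the next chapter, so the sketch below rests on an assumption: that the correction divides the apparent correlation by the square root of one minus the squared irrelevant correlation, the irrelevancy being taken to disturb only one of the two compared series. On that assumption, and reading 0.50 as the apparent value, the quoted deflections of about 0.14 and 0.01 are both reproduced. The function name and the rounding are illustrative and form no part of the original.

```python
import math

def deflection(apparent_r, irrelevant_r):
    """Gap between the corrected and the apparent correlation.

    Assumed form (not stated in this section):
        corrected = apparent / sqrt(1 - irrelevant^2)
    i.e. the irrelevant factor is taken to act on only one of the two series.
    """
    corrected = apparent_r / math.sqrt(1.0 - irrelevant_r ** 2)
    return corrected - apparent_r

PROBABLE_ERROR = 0.03  # scale of the experiments, as quoted in the text

# Pitch: musicians mixed with non-musicians, irrelevant correlation about 0.62.
pitch = deflection(0.50, 0.62)

# Sight: irrelevant correlation taken as one third of the pitch figure, about 0.21.
sight = deflection(0.50, 0.62 / 3)

for name, value in (("pitch", pitch), ("sight", sight)):
    verdict = "appreciable" if value > PROBABLE_ERROR else "negligible"
    print(f"{name}: deflection {value:.2f} ({verdict})")
```

The comparison against the probable error is what decides the matter in each case: the deflection for Pitch exceeds it several times over, while that for Sight falls well below it.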
In this way, a large number of irrelevant factors would be summarily dismissed from consideration; others, either from their amount or from the doubtfulness of the evidence concerning them, must be carefully kept in view; while some will be found of an unquestionably falsifying nature. The first employment of our knowledge is so to order our experiments from the beginning, both as regards selection of reagents and procedure in testing, that appreciable perturbance may be escaped. Next, and yet more important, we have learned the directions from which serious danger is to be apprehended, and forewarned is forearmed; we are enabled to so conduct our work, that the residuum of deviations which could not be excluded actually may instead be afterwards eliminated with mathematical exactness from the final results.
7. Alternatives and Equivocalities.
Notwithstanding the exhaustive selective process which will have had to be executed in order to sufficiently eliminate all these irrelevant factors, there will generally still remain over a certain number of experimental methods that at first sight appear almost equally eligible.
As regards Intelligence, to begin with, there is a considerable variety of ways by which estimates can be gained, and most people would appear to enjoy very decided views as to which are the most satisfactory; but since the convincement of their opinion usually stands in inverse proportion to its evidential value, it is perhaps safer to begin with the methods nearest to hand and gradually discover for ourselves their relative advantages. In doing so, two points must always be carefully kept asunder: first, the reliability with which any system of measurement represents any particular form of intelligence; and secondly, the claims of the said form of intelligence to merit the name. The former point must be definitely ascertained in the course of the experiments, while the latter, though a very desirable piece of information, may or may not be eventually elucidated by the whole investigation.
Sensory Discrimination will usually offer a still greater field of choice. About the mode of interrogation, it here need only be mentioned that the present wants are not necessarily identical with those of other psychological work. For instance, it is [p. 239] well known that the sensory threshold has really two different values, according as the examination proceeds from greater to less differences or vice versa; now, for questions specially concerning this threshold as such, it is desirable to take the mean between the upper and lower values; for the present purpose, on the other hand, more regular results appear obtainable in a given time by determining only the lower threshold (the reason seems to be that the reagents become less confused when the changes are confined to the one direction).
Then comes the question as to what precise portion out of the whole sensory range should be chosen as the theatre of our experiments. For our pioneering work, at any rate, it would appear unquestionably best to commence with that part where the results are least exposed to irregularities, complications and unknown factors; this will always lead us to somewhere near the centre of the range.
Lastly, we must consider the influence of different apparatus. It is a much more prominent factor in some senses than in others. In Pitch, it appears to be minimal; highly trained observers, however variously tested, show little more diversity than can be satisfactorily accounted for by individual disposition. Delezenne, Seebeck, Weber, Preyer, Appunn, Stumpf, Luft, and Meyer, though using such widely differing instruments as monochords, reed-pipes, and tuning forks, obtained results of which the largest is little more than double the smallest, and even this small difference appears chiefly attributable to difference in mode of interrogation. Among less practised reagents, indeed, sensibility will sometimes be found to vary enormously with the kind of timbre, one person doing much better when there are no appreciable overtones and another vice versa; but still such discrepancy is generally traceable to diversity of previous habit and continually diminishes with further training. In Visual Discrimination, the results have been considerably less uniform. It is true that the majority of competent observers, Kraepelin, Volkmann, Aubert, Masson, Merkel, etc., though employing such various instruments [p. 240] as Masson's disc by daylight, the same by artificial light, the shadow-method, the episcotister, and illuminated matt glass -- all agree that the most favorable medium shades of gray can just be distinguished from one another when the illumination differs by about 1/120. But on the other hand König with Brodhun, Helmholtz, and Bouguer arrive at a threshold twice as big, while Schirmer and Volkmann (in his later experiments) come to one nearly twice as small.
That, however, the lion's share of these dissimilarities may be solely attributed to difference of instrument is shown by the largest variation occurring in one and the same person: Simon, when using the platinum lamp of König, has a still worse threshold than the latter; but when employing a disc like Schirmer, surpasses even this observer. The discordance assumes far greater proportions when we come to Discrimination of Weight: for widely apart are the values successively obtained by Weber, Fechner, Biedermann and Loewit, Hitzig, Jacoby, Merkel, Müller and Schumann, Martin and Müller, etc.; on the one extreme we have Biedermann and Loewit who could clearly perceive a difference in a 2 lb. weight if it were increased or diminished by so little as 1/2 oz.; on the other side, we find C. Jacoby whose five thoroughly trained and competent reagents required for the same purpose an increase or reduction twenty-four times greater, a discrepancy that for the main part may safely be attributed to difference of apparatus.
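The two extremes just cited can be put on a common footing as fractions of the standard weight. This is a minimal sketch, assuming avoirdupois units (16 oz. to the pound), which the text does not state; the variable names are illustrative only.

```python
from fractions import Fraction

OUNCES_PER_POUND = 16  # assumption: avoirdupois pound

# Standard weight of 2 lb., expressed in ounces.
standard_oz = 2 * OUNCES_PER_POUND

# Biedermann and Loewit: a change of 1/2 oz. was clearly perceived.
biedermann_loewit = Fraction(1, 2) / standard_oz

# Jacoby's reagents: a change twenty-four times greater was required.
jacoby = 24 * biedermann_loewit

print(biedermann_loewit, jacoby)
```

On these assumptions the finer threshold comes to 1/64 of the standard and the coarser to 3/8 of it.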
This multitude of gross fluctuations requires careful analysis. A certain number merely imply greater or less difficulty in the performance and consequently a simultaneous rise or decline for the generality of reagents. Others, again, derive from difference of individual disposition; the larger these are in any procedure, the more accurately will any functional correlation be able to be calculated; but they must not be too summarily confounded together, for possibly one procedure may exhibit greatest sensitivity to previous practice, while another may be the most delicate barometer of fatigue or ill health, and yet a [p. 241] third may reveal the largest correlation with intellect. The remaining fluctuations derive from causes so obscure, that for the present we must call them chance irregularities tending only to shroud and disguise our results.
From these considerations, it is clear that an alteration in method of experiment may result in transforming the correlation. But this further complication, far from being a real disadvantage, opens out a rich and virgin field to investigation, for it is the most promising means of penetrating into the correlation's essential nature. Eventually results will be gained of such greater generality, that they will stand out constant among these superficial changes, as the main features of a landscape abide immovable in the passing variety of cloud and sunlight.
DESCRIPTION OF THE PRESENT EXPERIMENTS.
1. The Choice of Laboratory Psychics.
We will now turn to a description of the experiments that form the basis of the present article. Their many deficiencies can scarcely be clearer to any one than to their author; so true is it that we first learn how properly to conduct any experiment -- when we have ended it.
As regards the nature of the selected Laboratory Psychics, the guiding principle has been the opposite to that of Binet and Ebbinghaus. The practical advantages proffered by their more complex mental operations have been unreservedly rejected in favor of the theoretical gain promised by utmost simplicity and unequivocality; there has been no search after condensed psychological extracts to be on occasion conveniently substituted for regular examinations; regardless of all useful application, that form of psychical activity has been chosen which introspectively appeared to me as the simplest and yet pre-eminently intellective. This is the act of distinguishing one sensation from another.
With respect to the particular senses preferred, the present experiments have been confined to Hearing, Sight, and Touch. The other five, Taste, Smell, Pain, Heat, and Cold, do not admit of such practicable or satisfactory examination; also, probably on this account, they have as yet been investigated very incompletely, and therefore do not form a good unequivocal foundation for research of more advanced order. Further, in the chosen three we have already the widest range of type: for Touch is the most direct of the senses, the physiological organ being apparently of such a simple structure as to convey [p. 242] the stimulus to the brain in a purely mechanical manner; Sight, on the other hand, offers the most perfect example of peripheral transformation, seeing that our visual presentations are but very remotely derived from the really external ether waves with which they are popularly confused; while Sound gives us a half-way stage between the above extremes.
In all three cases, the test has been of relative, not of so-called "absolute," Discrimination; the trial has not been as to how small an external stimulus can cause a sensation perceptible at all, but as to how great the difference in the external stimuli has to be for the reagent to notice any difference in the sensation; every one knows the uncertainties attending the former kind of investigation. Similar motives have in Sound made Pitch seem preferable to Intensity; and in Light, Luminosity to Color. It is perhaps less easy to justify Touch being represented by that form of it often termed the "muscular sense," which despite its notoriety and historical importance is now well known to be really most complex and obscure. Among other reasons for this choice, it was desired to see whether any correlation of Discrimination with Intelligence might not reasonably be attributed to adroitness in outward approach to the distinguishment rather than to superiority in the essential act itself; should this be the case, then the correspondence should be much more manifest in an active practical comparison of two weights than in the purely passive acceptance of two tones.
In short, the experiments were so chosen that any conclusions very uniformly attained in these three ways might be provisionally considered to hold good for Sensory Discrimination in general.
It may here be remarked that the present experiments were not primarily designed and executed for the sake of Discrimination itself, but as the indispensable basis for an investigation into Memory and Imagery. Only afterwards was it decided to publish the results on the former head separately, so as to keep the present article within manageable length and at the same time to secure for it greater unity and lucidity.
2. The Instruments.
The apparatus with which these inquiries were conducted were not of a very new or elaborate nature, my object at present being merely to make a practical and convenient application of the existing laboratory methods. Moreover, while naturally [p. 243] endeavoring to minimize all errors proceeding from mechanical imperfection, I have become quite convinced that faults arising from such a cause are in this case vanishingly small as compared with those proceeding from other sources.
(a) Sound. Like the experiments of Delezenne and Cattell mine were conducted by means of a monochord, but of a special kind made under my direction for this purpose; it is furnished with a Vernier scale, whereby differences of pitch can readily be produced down to 1/3 of a vibration; as the experiments were limited to the immediate neighborhood of the E above the middle C (reckoning the latter at 330 vibrations per second), the smallest securely obtainable difference amounted to 1/132 of a musical tone.
To the accuracy of such an instrument many theoretical objections may be raised, as unevenness of the wire, inequalities of tension, etc. However, experience has shown it to be reliable to the range above mentioned, which is ample for the present purpose. It was first tested in the usual way, that is, by simultaneously sounding two notes at a given small interval and then counting the interferences in the sound-waves; I had no difficulty in producing good reliable "beats." The instrument has further been well checked empirically, for more than one reagent has proved able correctly to discriminate down to the extreme limit. I have also been fortunate enough to obtain assistance in this manner from Dr. Krüger; this psychologist and acoustician very kindly allowed himself to be tested with my machine and found his threshold to be only such a small amount greater than with tuning forks, as would naturally arise from his having had much more practice with the latter instrument.
The principal crime generally charged against monochords is that they give far stronger overtones than do tuning forks and that consequently tests upon the former may not be comparable with those upon the latter. This point has already been discussed, and our conclusion was that all the different instruments hitherto tried had led to very similar average thresholds (in this respect being very unlike discrimination in Sight and Weight). In any case, there is no reason whatever to suppose that the results given by the overtoned instruments are at all less regular, and this is here the only matter of importance.
As a set-off against this one merit of comparative freedom from overtones, tuning forks are in so many other respects difficult to manage, that I should not personally prefer them except for use in an exceptionally well equipped laboratory. To bring them to sound, they have to be either struck with a hammer or stroked with a bow, and the consequent accessory [p. 244] noises and tones can hardly be reduced to such a minimum as is easily attainable in plucking the monochord; also, the tone from the fork only remains constant for a very short time, after which it rises very deceptively; further, it is very difficult to obtain two tones of sufficiently equal loudness, without using a special striking apparatus which again involves other disadvantages. Moreover, it is almost impossible to obtain enough range of tone in any satisfactory manner, especially as even the best tuning forks have appreciable overtones unless placed upon resonators to preferentially intensify the ground-tone; the device of a different fork for each grade of pitch is of very dubious merits; and when the common custom is followed of using one fork with the sliding regulator on it while the other has none, the overtones become so dissimilar as frequently to vitiate the whole experiment. The last objection can be partially met by having on each fork a separate regulator very light and low; but the first and second difficulties are very insufficiently overcome by the usual expedients, such as moving the fork to and from the reagent's ear; to fully obviate them, it is at least necessary to be able to conduct the sound from one room to another, so that the experimenter remains with his forks and opens the conductor just when the tone is pure and true, while the reagent sits alone and hears only what he is intended to hear; but such a procedure is quite inapplicable to the inexperienced subjects requisite for experiments like the present ones.
(b) Light. As we have seen, a great number of instruments have been at different times used for this purpose. Each of them possesses various advantages and disadvantages of its own. For my purpose, I constructed a graduated series of cards, each being slightly darker than the preceding one. The required delicacy was obtained by photographic means, on the principle that the darkness of a print will (for a certain short range of half-tones) vary almost proportionately to the time of exposure to light. Various ways of printing were tried; silver gave a very fine gradation, but necessarily introduced the disturbing element of color; carbon was found difficult to develop with sufficient evenness; final preference was given to platinum on the smoothest possible paper, and a series of prints were obtained of such differences that the extreme ranges would measure the dullest normal sight, while two neighboring cards could not be distinguished by the acutest vision. The prints, each about 2" by 1", were then mounted on smooth cardboard, numbered [p. 245] on the back, and given a border about one inch wide of even black, so that the conditions of contrast should in all cases be approximately equal.
The evenness of the series was then tested practically on many reagents, by trying whether all pairs of cards at the same number of grades apart presented precisely the same difficulty of distinguishment.
As to the illuminating source, it unfortunately was not found practicable to conduct the experiment by the uniformity of artificial lighting. However, small differences of the absolute intensity of the illumination are known not to appreciably affect accuracy of judgment, and such was found to be actually the case as regards the inevitable minor fluctuations of daylight. The most important point was that the two cards to be compared should be illuminated to exactly the same degree. This was sought by placing them side by side on precisely the same plane and opposite the centre of an evenly lighted window. Most reagents had a marked inclination to consider one particular side darker; in some cases this was the right, and in others the left (I could not find that this tendency corresponded with right and left handedness respectively, as might perhaps have been expected in view of the experiments of Van Biervliet).
The general luminosity of the cards was determined in the following manner. The lightest and darkest were placed in a dark room, and two candles were arranged in front of them in such a way that each threw a circular ray of light about one inch in diameter upon one of the cards. It was then experimentally ascertained how much closer to the illuminating source the darker card had to be, for the latter to appear just as light as the other. The square of the greater distance divided by the square of the lesser thus gave a measure of the relative brightness of the two cards under equal conditions. It would be superfluous here to enter into the various details adopted to make this test satisfactory, seeing that the result is at best only roughly approximate; for, among other things, the luminosity was thus tested by artificial light, while the discrimination was tried by the different luminosity and tint of daylight. Within the actual limits, however, it does not appear that the consequent error will be very considerable; hence, the value found seems sufficiently accurate to be of some interest, especially as it is difficult anywhere to discover quantitative estimates of the normal unpractised person's power to distinguish differences of light.
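The rule just stated, the square of the greater distance divided by the square of the lesser, can be put as a short sketch. The function name and the two distances are hypothetical and serve only to illustrate the arithmetic; no measured distances are given in the text.

```python
def relative_brightness(greater_distance: float, lesser_distance: float) -> float:
    """Brightness ratio of two cards, from the candle distances at which
    they appear equally light (inverse-square rule)."""
    if lesser_distance <= 0 or greater_distance < lesser_distance:
        raise ValueError("need 0 < lesser_distance <= greater_distance")
    return (greater_distance / lesser_distance) ** 2


# Hypothetical distances (any common unit); not values from the article.
ratio = relative_brightness(30.0, 20.0)
print(ratio)  # 2.25
```

On these assumed figures the lighter card would count as two and a quarter times as bright as the darker one.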
(c) Weight. For this test a graduated series of weights were [p. 246] constructed on Galton's convenient cartridge pattern, all of precisely the same size, appearance, and balance; the lightest was 1,000 grains, and the others continually increased their weight in geometrical proportion. I found it necessary to considerably extend the range of differences; for some of my reagents could accurately distinguish Galton's finest degree, namely 1/100; while many perfectly normal persons showed an obtuseness that could not be measured even on the extremes recommended by him as being adapted for "morbid" cases.
3. Modes of Procedure.
(a) Experimental Series I. The reagents were the twenty-four oldest children of a village school in Berkshire, taken without any selection; they were tested in Light, Weight, and Sound. This school was particularly favorable for my purpose, as it was within 100 yards of my own house; all the children and their families resided in the immediate neighborhood, so that I could easily obtain any information concerning them; the rector and schoolmaster most obligingly gave their valuable co-operation, for which I hereby tender hearty thanks.
Each child was separately interviewed in my house, on a different day for each different sense. The test of Discrimination lasted fifteen minutes (and was then followed by a test of Memory; the coincidence between these two is a good measure of the accuracy of both). No single trial was ever unusually prolonged, for fear of admitting the disturbing effects of fatigue, but further interviews were obtained when any doubtful points seemed to require clearing.
As regards the manner of interrogation, there was little hesitation in rejecting the method known as "minimal changes" in its purer form, which is dependent upon the reagent saying whether or not he can distinguish the difference between the two sensations offered to him; such a procedure appeared totally unfitted for such inexperienced persons. But still less applicable appeared the strict method of "right and wrong cases." The compromise was therefore adopted of searching for that threshold where the subject seemed able to give about 80% of his answers right. For similar reasons, preference had to be given to the "procedure with half knowledge" (halbwissentliches Verfahren) in spite of certain disadvantages; in this, the reagent is informed that the two stimuli are different, but is left to decide for himself as to the direction of the difference.
The beginning of the test is not devoted to recording the largest possible number of answers, but to quietly affording [p. 247] the reagent a maximum of fore-exercise and at the same time to gaining a general idea of his threshold. Then, there is a steady progression from greater to smaller intervals, until eventually a threshold is found where he can just give eight right answers out of ten. He is further tried at a still smaller interval, to see if he makes still more mistakes and thus to confirm the fact that he has really reached his limit. And finally he is once more tried at a slightly larger interval than the believed threshold, to corroborate the former observation that he here makes less than two errors out of ten. This constant progression in only one direction appears to very much reduce the mental distraction especially inherent in all procedure "without knowledge," which is very great in unpractised reagents if tested with the usual oscillations to and fro between greater and smaller intervals. Against it may be urged that it finds only the lower instead of the mean threshold, but this is of no importance for our present purpose.
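The one-directional descent described above may be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the `judge` callable standing in for a reagent, and the list of intervals (borrowed from the figures quoted for Series II) are all hypothetical, and the final re-trial at a slightly larger interval is left out.

```python
from typing import Callable, Optional, Sequence


def descending_threshold(
    intervals: Sequence[float],
    judge: Callable[[float], bool],
    trials: int = 10,
    required_right: int = 8,
) -> Optional[float]:
    """Smallest interval, approached from above, at which the reagent
    still gives at least `required_right` right answers in `trials`."""
    threshold = None
    for interval in sorted(intervals, reverse=True):
        right = sum(1 for _ in range(trials) if judge(interval))
        if right < required_right:
            break  # more mistakes at the smaller interval: limit passed
        threshold = interval
    return threshold


# Hypothetical reagent who is always right at 6 v. d. or more, never below.
steps = [50, 33, 26, 20, 16, 10, 6, 3]
print(descending_threshold(steps, lambda d: d >= 6))  # 6
```

The sketch returns `None` when the reagent fails even at the largest interval, which corresponds to an obtuseness not measurable on the given scale.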
It was further considered that more regularity would be obtained by only recording those answers which were given under the most favorable conditions. Before taking down each reply, a chance of reconsideration was given by repeating the test in such a manner as to reverse the constant error of time and space.
The two stimuli followed one another in the manner found to be most adapted to accurate judgment and to effectually eliminating the influence of Memory. In the case of pitch, the interval from the beginning of the first tone to that of the second was found best at about three-quarters of a second. In some earlier experiments, the first tone was just dying away when the other began; but later, I stopped the first one altogether just before giving the other. There is some knack required to do this satisfactorily, and the practical effect of the change was, to my own surprise, inappreciable with most reagents.
The final measurements, just as they were obtained, are set down in the column under "Sensory Threshold" in Table I.
(b) Experimental Series II. This was executed in the same village school, but upon the next thirty-six oldest children, the tests being only in sound. Unlike the previous twenty-four, these were examined collectively, the total interview lasting about 1 1/2 hours. The chief part of this time was devoted to instructing and practising them, and to finding out what was the lowest age fit for such a collective experiment. It became eventually evident that no usable results could be obtained at any rate from those below 5 1/2 years, and thereupon all those under this age were excluded, leaving thirty-six for the real tests.
The latter were carried out in the following manner. Every [p. 248] boy and girl was provided with a pencil and a piece of paper and had simply to write down 1 or 2, accordingly as he considered the first or the second tone to be the higher. The headmaster as well as the other teachers were present; a small prize was offered to stimulate attention, and energetic measures were found necessary to prevent cribbing. Ten test pairs of tones were given at about the following eight differences of pitch, 50, 33, 26, 20, 16, 10, 6, and 3 v. d.; thus in all, each child answered 80 times; in half the cases, the first tone was really the higher, and vice versa.
The marking was done by fully considering each child's whole paper, and then deciding as to what was the limit at which he might be expected to give about eight right answers out of ten. This method has seemed to me in such cases the most satisfactory, provided that the person marking has acquired the requisite experience by having previously examined in a thorough manner a great number of similar reagents, and provided the marking of the paper be done before receiving the intellective gradings (or else there is a great danger of "self-suggestion"). If this method be not adopted, recourse must be had to some purely formal reckoning of errors.
Despite all precautions to secure reliable results, I was unable to quite convince myself that such uncultured children could be treated adequately without elaborate individual attention.
(c) Experimental Series III. These experiments -- confined to Sight and Weight -- were made in a preparatory school of the highest class, which principally trained boys for Harrow. To the Principals, themselves old Harrovians, my hearty thanks are due for their kindness and cordial co-operation. As may well be imagined, the social standing and general culture of the reagents were the opposite extreme to that in the village school.
Unfortunately, these tests of Light and Weight had to be arranged at a few hours' notice and consequently were carried out under very unfavorable conditions. Out of the thirty-seven boys constituting the school, only twenty-four could be present, and of these again it was necessary to withdraw one from the results as being mentally too abnormal to be properly included with the others. In the next place, no masters were in the room after the first few minutes; in spite of the general excellent behavior, this relaxation of discipline must always be admitted to be a momentous circumstance. And finally, the visual and muscular senses -- never adapted to collective examination -- had on this occasion to be tested with the apparatus intended only for individual work: the weights and the cards were continually passed round and round by pairs, ticketed 1 and 2 respectively; each boy then wrote down which of the two he considered to be the heavier or darker. [p. 249] It was impossible to control whether they all handled the apparatus in precisely the same manner, or even to insure that they invariably gave their greatest possible attention to their tasks; moreover, some were inevitably more favored than others as regards intensity and evenness of illumination.
Owing to these facts, I brought away the impression that the experiments were chiefly interesting as enhancing the effect of Goodwill. For under better conditions, every reagent can be brought pretty well to try his best; but here, on the contrary, a wide range was observable in this respect, some being very zealous while others were visibly indifferent.
(d) Experimental Series IV. This series, which was in Sound only, took place in the same high class preparatory school, but now the circumstances were as propitious as above they were the reverse. Pitch is a sensory quality especially susceptible of collective test. The experiments were arranged and prepared with full deliberation. The entire school were available with only the exception of the above mentioned abnormal case, of one boy who had to leave the room before the end of the hour, and two who were that day absent; thus there were thirty-three complete results. Several masters attended, so that the strictest discipline was maintained throughout; there appeared no inclination to crib; every boy seemed perfectly to understand what was required and to be intent upon doing as well as possible. I am therefore inclined to attach as much value to this series as to Series I; for the cultured intelligence and long habit of examination possessed by these boys should compensate the individual attention given to the villagers.
The conditions entirely resembled those described in Series II, except that only 48 final tests were made, 6 at each of the following differences: 20, 15, 11, 8, 6, 5, 3, and 1 v.d. The intellective grading was not received by me until long after the sensory grading had been completed, so that the latter is free from danger of self-suggestion.
(e) Experimental Series V. These were executed upon 26 male and female adults (thus bringing the total number of reagents throughout the present experiments to 123).
The method was individual and precisely the same as that already described for Series I. But in arranging the composition of the reagents, instead of trying to obtain as homogeneous a set as possible, it was here rather sought to include the greatest variety; for although little can be proved in such a manner, much can profitably be suggested.
4. The Estimation of "Intelligence."
As regards the delicate matter of estimating "Intelligence," [p. 250] the guiding principle has been not to make any a priori assumptions as to what kind of mental activity may be thus termed with greatest propriety. Provisionally, at any rate, the aim was empirically to examine all the various abilities having any prima facie claims to such title, ascertaining their relations to one another and to other functions.
Four such different kinds of Intelligence have been introduced into the present work. First, there is that revealed in the ordinary classification according to school order (based here upon examinations). This clearly represents Present Efficiency in such matters as Latin, Greek, Mathematics, etc. Examples of this kind will be found in experimental series III and IV.
The next sort of Intelligence derives from the same school order, but so modified as to exclude all influence of Age. Such a corrected order may be provisionally accepted as representing, not Proficiency, but Native Capacity. It has been arrived at by taking the difference between each boy's rank in school and his rank in age. For obvious reasons, it has been preferred to consider the absolute and not the relative differences; a boy, for instance, who was 20th by examination and 22nd by age would be placed just above one who was 15th by examination and 16th by age, the former being two places and the latter only one better than would have been expected with greatest probability.
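The age-corrected order can be illustrated with the two boys of the example. This is a minimal sketch: the function name and the labels "A" and "B" are hypothetical, and it assumes that rank 1 denotes the top of the school in the one list and the oldest boy in the other.

```python
def places_gained(school_rank: int, age_rank: int) -> int:
    """Places by which a boy stands better in school than his age alone
    would lead one to expect (absolute, not relative, difference)."""
    return age_rank - school_rank


# (rank by examination, rank by age) for the two boys of the example.
boys = {"A": (20, 22), "B": (15, 16)}

corrected_order = sorted(
    boys, key=lambda name: places_gained(*boys[name]), reverse=True
)
print(corrected_order)  # ['A', 'B']
```

Boy A gains two places and boy B only one, so A stands just above B in the corrected order.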
The resulting order is clearly but a first approximation, to which we may apply any number of further corrections. For our present purpose, the following has appeared the most that can be practically required (and even this makes no appreciable change in the final values obtained). Evidently, the top boy is prevented from proving his full capabilities by want of competitors; let us suppose that he happens also to be the oldest; then, on our above method, he will seem no better than a boy of middle age and at the same time of middle school order; but the latter will in reality always be found below many younger than himself, compensating this by being also above about an equal number of older ones; now, our top boy has not let himself be surpassed by any single one of his juniors, and therefore would certainly have gone above a great many of his seniors, had the school included such. The top boy's true position may be roughly estimated by making him an extra allowance of a number of places equal to the general mean deviation of actual from average rank (which in this case comes to 5 places); clearly, also, such allowances may with equal right be claimed by the top boy, even if he does not happen to be the oldest; further, the same correction is applicable in slighter degree to the second boy, in still slighter to the third, and so on in a [p. 251] rapidly diminishing curve up to the centre of the school. For practical purposes, it has seemed sufficient to allow the next four boys, 4, 3, 2, and 1 places respectively; naturally, the whole of this correction must be repeated inversely for the bottom end of the school. Though this explanation is rather complicated, the correction is very easily carried out and, as stated, its effect is hardly appreciable.
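The allowance for the two ends of the school can be sketched as below. The function name is hypothetical; the sketch assumes the figures given above (5 places for the top boy, then 4, 3, 2, and 1) and a school at least ten strong, so that the two ends do not overlap.

```python
def end_allowance(school_rank: int, school_size: int, top_allowance: int = 5) -> int:
    """Extra places allowed to boys near the top of the school, and
    deducted inversely from boys near the bottom; zero elsewhere."""
    from_top = school_rank - 1                # 0 for the top boy
    from_bottom = school_size - school_rank   # 0 for the bottom boy
    if from_top < top_allowance:
        return top_allowance - from_top
    if from_bottom < top_allowance:
        return -(top_allowance - from_bottom)
    return 0


# For a school of 37 boys: top five, a middle boy, and the bottom boy.
print([end_allowance(rank, 37) for rank in (1, 2, 3, 4, 5, 19, 37)])
# [5, 4, 3, 2, 1, 0, -5]
```

The allowance would then be added to each boy's difference between age rank and school rank before the corrected order is drawn up.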
The third kind of Intelligence is that represented and measurable by the general impression produced upon other people. This forms the basis of the common broad assortment of the children by their teachers into "bright," "average," "dull" respectively; and with such an assortment I have had to content myself for the elder children in the second series of experiments, while for those under 7 years of age, I have not obtained any intellective grading at all. But for the more important Series I, a list of relative rank was procured of satisfactory completeness. It may here be noted that teachers, if directly asked for such a detailed list, frequently begin by asserting it to be impracticable. It will be generally found, however, that if they be merely requested to pick out the brighest [sic] pupil of all, they can do so without any great trouble; and when they are next requested to select the brightest of the remainder, they are still able to perform the desired feat; and so on, until the classification is complete.
The fourth and last sort of Intelligence which has here been estimated is that known as common sense. To this end, the oldest of the children of Series I was interviewed and interrogated concerning her comrades in precisely the manner described above, except that the criterion was not to be "brightness at school work" but "sharpness and common sense out of school;" and she seemed to have no great difficulty in forming her judgments concerning the others, having, indeed, known them all her life. As a check, and in order to eliminate undue partialities, it had been arranged that as she left the house, the second oldest child should enter it and thus be able to give an as far as possible independent list, since neither had beforehand had any idea of what was wanted. Finally, a similar list was obtained from the Rector's wife, who also had always lived in this village; but her graduation is unfortunately incomplete and therefore unusable, for she professed inability to pronounce verdict upon some few children who had not come much under her notice; as far as it went, it appeared perfectly homologous with the other two lists. [p. 252]
5. Procedure in Deducing Results.
(a) Method of Correlation. So far this chapter has been occupied with obtaining estimates as to the reagents' respective abilities in the several sensory and intellective functions. This is an operation requiring the fullest use of psychological insight; and, therefore, based on the long preliminary investigation previously described, every effort has been made to ferret out and evade all circumstances tending to make our little sample of facts appreciably misrepresentative of the real general relations or psychologically superficial and misleading. But the next portion of our problem is of a very definite objective nature; we wish to ascertain how far the observed ranks in the several abilities tend to correspond with one another; this, it is believed, is no longer a task to be effected by exertions of psychological ingenuity; instead of constructing complex arbitrary tables and plausible but more or less fanciful explanatory stories, we now are in need of such a procedure as will impartially utilize all our information in the demonstrably most complete manner and will focus it to a plain quantitative value; for the moment, psychology has to give way to mathematics.
Accordingly, all the more important correlations in the present work have been worked out by the best method hitherto evolved, that of "product moments," as Pearson terms it; only instead of using the actual measurements obtained for the reagents' respective thresholds, the change has been made of employing the numbers denoting their relative ranks; a full explanation of the advantages of this modification may be found in the article specially devoted to the topic (the chief being a reduction of the probable error equivalent to doubling the quantity of cases observed). Merely subsidiary results have often been reckoned by the much more convenient method of "rank differences," while a few correlations were for various reasons not amenable to either of these more exact methods, and therefore had to be worked out by Pearson's auxiliary method or mine of "class averages" (the latter has generally been preferred, on account of its smaller probable error). All these are but different ways of more or less closely arriving at the same measure of correlation, and thus all the results can be freely compared with one another.
The method of "product moments," though sometimes involving lengthy calculations, is so simple in principle that it can be worked by any moderately intelligent schoolboy. Explanation and illustration are given in the above article; here, nothing more than the general formula can be stated, which is as follows:
[p. 253]
$$ r = \frac{Sxy}{\sqrt{Sx^2 \cdot Sy^2}} $$
where $x$ = any individual deviation from the general median as regards one of the compared characteristics,
$y$ = the deviation of the same individual as regards the other characteristic,
$Sxy$ = the sum of such products for all the individuals,
$Sx^2$ = the sum of the squares of all the various values of $x$,
$Sy^2$ = the same for $y$,
and $r$ = the required correlation.
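A minimal sketch of the calculation on ranks may make the symbols concrete. It assumes the usual product-moment form, r = Sxy / sqrt(Sx2 * Sy2), with deviations taken from the median as defined above; the function name and the two lists of ranks are hypothetical, not data from the present experiments.

```python
from math import sqrt
from statistics import median


def product_moment(xs, ys):
    """Correlation r from the deviations x and y about the medians."""
    mx, my = median(xs), median(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # Sxy
    sx2 = sum(a * a for a in dx)              # Sx2
    sy2 = sum(b * b for b in dy)              # Sy2
    return sxy / sqrt(sx2 * sy2)


# Hypothetical ranks of five reagents in two abilities.
sensory_rank = [1, 2, 3, 4, 5]
intellective_rank = [2, 1, 3, 5, 4]
print(round(product_moment(sensory_rank, intellective_rank), 6))  # 0.8
```

With untied ranks the median coincides with the mean, so the sketch gives the same value as the ordinary product-moment coefficient computed on the ranks.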
(b) Elimination of Observational Errors. This necessitates a further mathematical operation, which, however, is very brief and does not involve anything more than elementary arithmetic. There are two formulae, one theoretical and the other empirical:
$$ r_{pq} = \frac{r_{p'q'}}{\sqrt{r_{p'_1 p'_2} \cdot r_{q'_1 q'_2}}} $$
$$ r_{pq} = \frac{\sqrt[4]{mn} \cdot r_{p''q''} - r_{p'q'}}{\sqrt[4]{mn} - 1} $$
where $r_{p'q'}$ = the mean correlation between the various gradings for p and those for q,
$r_{p'_1 p'_2}$ = the average correlation between one and another of these several independently obtained series of values for p,
$r_{q'_1 q'_2}$ = the same as regards q,
$r_{p''q''}$ = the correlation of an amalgamated series of measurements for p with an amalgamated series for q,
$m$ and $n$ = the number of independent gradings for p and q respectively,
and $r_{pq}$ = the required real correlation between the true objective values of p and q.
It will be found exceedingly important to employ both formulae simultaneously, for they are independent of one another and each has different sources of fallacy, so that the most essential information is gained by a comparison between their respective results.
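The comparison of the two corrections can be sketched as follows. The sketch assumes the forms commonly given for these two formulae, namely the mean correlation divided by the square root of the product of the two reliabilities, and (k * r_amalgamated - r_mean) / (k - 1) with k the fourth root of m times n. The function names and all input values are hypothetical, not figures from the present experiments.

```python
from math import sqrt


def theoretical_correction(r_mean: float, r_pp: float, r_qq: float) -> float:
    """Mean observed correlation divided by the geometric mean of the
    agreement within the gradings for p and within those for q."""
    return r_mean / sqrt(r_pp * r_qq)


def empirical_correction(r_amalgamated: float, r_mean: float, m: int, n: int) -> float:
    """Extrapolates from the rise in correlation produced by amalgamating
    m gradings for p and n gradings for q (requires m * n > 1)."""
    k = (m * n) ** 0.25
    return (k * r_amalgamated - r_mean) / (k - 1)


# Hypothetical values only.
by_theory = theoretical_correction(r_mean=0.40, r_pp=0.64, r_qq=0.25)
by_experience = empirical_correction(r_amalgamated=0.50, r_mean=0.40, m=4, n=4)
print(round(by_theory, 6), round(by_experience, 6))
```

A wide gap between the two printed values, as in this invented case, is the kind of divergence the comparison is meant to bring to light.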
When we say that a series of objects correlates entirely with a second series, we do not assert that every set of measurements of the one will absolutely coincide with those of the other, seeing that discrepancies must inevitably arise from errors in measuring; we only mean that whatever all sets of measurements of the one series have in common with each other will also be found common to all measurements of the other series; then, either of the above formulae will exactly eliminate the observational [p. 254] discrepancies and thus present the correlation in its entirety.
But much more often the measurements for the same series are connected with one another by more than connects them with the measurements of the other series, and then the case is ambiguous. Either the surplus really lies in the series measured, which is equivalent to saying that this series contains elements not common to the other series and that the correlation is to this extent incomplete; here, once more, both formulae will produce the properly corrected amount. Or, as is usual, the excess of agreement between the measurements for the same series may partly (or wholly) derive from their having the same constant fallacies; and now it will be found that both formulae give a correction still in the right direction but too small in quantity; further, this deficiency will be much greater for the theoretical formula than for the empirical one, so that when both formulae give the same result, we can assume that the latter has not been appreciably falsified by any constant fallacy common to the several sets of measurements for the same series.
Under special circumstances, the contrary case may occur of the sets of measurements for the one series being connected with each other by less than connects them with those of the other series. This will happen whenever several sets of measurements supposed to be taken from the same lot of objects are really procured from different ones and their several correspondences with the second series have arisen from independent causes. In physical matters, this danger is not serious; if two persons decide independently to measure a fossil cave-bear, they are unlikely to make the mistake of going to different animals. But in psychology it is otherwise; persons may honestly endeavor to appraise the same mental faculty, and yet, owing to diversity of procedure and ignorance of organic uniformities, they may really obtain measurements of quite independent function. In such case, the sets of measurement, however accurate they may be, will show no correspondence with one another; and if the functions are even only partially different, the measurements will correspond with one another to that extent less than they would be by reason solely of errors of observation.
The effect will be to falsify any corrections by the theoretical formula, for the latter begins by assuming only one lot of objects to have been measured and therefore the correspondence between the sets of measurements to be at least as great as might be expected from their accuracy -- an assumption generally fair enough, but under the special conditions delusively reducing the denominator and thus producing a final value proportionally too large. Now, this same fallacy affects corrections [p. 255] by the empirical formula in exactly the opposite direction; for the latter bases itself upon the fact that an amalgamation of several sets of measurements constantly emphasizes whatever elements are common to them all and simultaneously obliterates all that are not common; thus in the normal case of only one lot of objects underlying the sets and determining their correlations to the other series, amalgamation will continually raise the correlation towards its full amount; but if there be more than one underlying lot of objects, each correlating with the other series independently, then amalgamation will not emphasize but obliterate these independent influences and consequently not raise but lower the correlation. Hence, when several functions really corresponding with the second series independently have been confounded together and taken for different measurements of a single correspondence, the results, as corrected by the respective formulae, will sharply diverge.
Conversely, if, when a double set of measurements has been made, the empirical corrective formula produces an increase of correlation, then these sets of measurements may be regarded as certainly deriving from some single common faculty (any influences specific to each set of measurements being theoretically subtracted from the faculty and viewed as merely so many sources of observational error); and if the two corrective formulae lead to the same final amount of correlation, then this latter concerns wholly and solely the common faculty.
Further, it is of great importance to remark that the last fallacy, namely the case when measurements believed to be taken from the same function really derive from different ones correlating with the other series independently, may, by the first corrective formula, easily come to any values greater than 1 (and therefore impossible, seeing that 1 represents entirety). By the empirical formula, on the other hand, this can never occur; for whether the sets of measurements be connected with one another by either anything more or anything less than connects them with the measurements of any other compared series, then the correspondence between the two series will in both cases be reduced and therefore must necessarily be less than 1; in other words, the empirically corrected correlation can only amount to full unity when all the sets of measurements for both series have one common element and differ in every other systematic constituent.
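The point about impossible values can be illustrated numerically. The sketch below assumes that the "theoretical formula" is the familiar correction for attenuation, in which the observed correlation is divided by the geometric mean of the two within-series correlations; that formula is not restated in this section, and the function name and every figure used are hypothetical.

```python
import math


def correct_for_attenuation(r_observed, r_pp, r_qq):
    """Assumed form of the theoretical correction.

    r_observed : mean correlation between measurements of p and of q
    r_pp       : correlation between the sets of measurements of p
    r_qq       : correlation between the sets of measurements of q
    """
    return r_observed / math.sqrt(r_pp * r_qq)


# Normal case: both sets for p measure the same function, so they
# agree with each other fairly well (hypothetical value 0.7).
normal = correct_for_attenuation(0.5, 0.7, 0.9)

# Confounded case: the two sets for p really measure different
# functions, so they barely agree (hypothetical value 0.1), yet each
# still corresponds with q to the amount 0.5.
confounded = correct_for_attenuation(0.5, 0.1, 0.9)

print(f"normal case:     {normal:.3f}")
print(f"confounded case: {confounded:.3f}")
```

With these invented figures the normal case stays below 1, while the confounded case exceeds 1, because the small agreement between the two sets shrinks the denominator.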
Fuller explanation and illustration are given in the article devoted to the topic of measuring correlation.
(c) Elimination of Irrelevant Factors. This is the final operation [p. 256] necessary to obtain a true result. Unlike the preceding one, it may often be altogether escaped; for if the conditions are favorable and if the preliminary investigation has been sufficiently thorough, the experiment need not be affected by any irrelevant factor of large enough magnitude sensibly to vitiate the result. Here, also, the necessary mathematical work has been reduced to brief and elementary arithmetic; for more explanation, the reader must again be referred to the special article.
If the irrelevant factor be connected with only one of the two compared series, the equation is:
$$ r_{pq} = \frac{r'_{pq}}{\sqrt{1 - r_{pv}^{2}}} $$
where $r'_{pq}$ = the apparent correlation of p and q, the two variants to be compared,
$r_{pv}$ = the correlation of one of the above variants with a third and irrelevantly admitted variant v,
and $r_{pq}$ = the required real correlation between p and q, after compensating for the illegitimate influence of v.
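If the irrelevant factor be connected with both series compared, the equation becomes:
$$ r_{pq} = \frac{r'_{pq} - r_{pv}\, r_{qv}}{\sqrt{\left(1 - r_{pv}^{2}\right)\left(1 - r_{qv}^{2}\right)}} $$
where all the terms have the same meaning as before, $r_{qv}$ being the correlation of q with v.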
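The two eliminations can be sketched in a few lines of arithmetic. The sketch assumes the equations take their standard forms: the apparent correlation divided by sqrt(1 - r_pv^2) when v touches only one series, and (r'_pq - r_pv * r_qv) / sqrt((1 - r_pv^2)(1 - r_qv^2)) when v touches both. The function names and all numerical values are hypothetical.

```python
import math


def eliminate_from_one_series(r_apparent, r_pv):
    """Irrelevant factor v connected with only one series (p)."""
    return r_apparent / math.sqrt(1.0 - r_pv ** 2)


def eliminate_from_both_series(r_apparent, r_pv, r_qv):
    """Irrelevant factor v connected with both series (p and q)."""
    numerator = r_apparent - r_pv * r_qv
    denominator = math.sqrt((1.0 - r_pv ** 2) * (1.0 - r_qv ** 2))
    return numerator / denominator


# Hypothetical figures: v disturbs p alone, hiding part of the real
# correspondence, so the corrected value is larger than the apparent.
one_series = eliminate_from_one_series(r_apparent=0.4, r_pv=0.6)

# Hypothetical figures: v influences both p and q, manufacturing part
# of the apparent correspondence, so the corrected value is smaller.
both_series = eliminate_from_both_series(r_apparent=0.5, r_pv=0.6, r_qv=0.6)

print(f"v in one series:  {one_series:.4f}")
print(f"v in both series: {both_series:.4f}")
```

In the first case the correction raises the correlation, since the irrelevant factor only added noise to one series; in the second it lowers the correlation, since part of the apparent agreement was due to v itself.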
 Even the term "individual" does not seem very happy, since it chiefly awakens the impression of dealing with individuals as contrasted with masses. In this latter and much more appropriate sense, Wundt uses "Individual Psychology" in opposition to his "Folk Psychology" (Grundriss der Psychologie, p. 28).
 "La méthode à laquelle j'ai eu recours pour étudier ce phénomène (de la Reconnaissance) . . . pourrait servir a déterminer assez promptement et avec une rigeur satisfaisante quelle est l'aptitude d'une personne à reconnaître." And similarly for the other two faculties.
 Some experimenters (Lehmann, Angell), it is true, have found that "correctness of sensory judgment is practically independent of time interval" up to at least a minute, and even of deliberate distraction (Angell, Wundt's Phil. Stud., Vol. XVII, p. 11). I have, however, been unable to convince myself that this conclusion holds at all good for the ordinary conditions of experiment.
 In the present work, it is taken that a threshold obtained in v. d. ( = vibrations difference) for any tone within the two octaves above the middle C may be assumed also to hold approximately good for the remainder of these two octaves and both above and below gradually to augment. So much appears sufficiently demonstrated by the works of Preyer (Die Grenze der Tonwahrnehmung), Luft (Wundt's Phil. Stud., Vol. IV, pp. 511 ff.), and Meyer (Zeit. f. Psych. u. Phys., Vol. XVI, p. 352). Hence the results of all experiments conducted anywhere near the centre of the ordinary musical scale admit of being easily collated with one another. For procedure in comparing any values obtained by the method of "Minimal Changes" with those by "Right and Wrong Cases," see Lorenz (Wundt's Phil. Stud., Vol. II), Merkel (Ibid., Vols. IV and VII), Kämpfe (Ibid., Vol. VIII), and Mosch (Ibid., Vol. XX).
 The improbability of mere chance works out mathematically to about 100 to 1 for this series of observations alone, not to mention the corroborative evidence of the children and other experimenters.
 The Table appears to bring the various results of the best workers into very good harmony with one another and also with my own. It is regretted, however, that there is one exception to this general reconciliation; for it has been found impossible satisfactorily to accord the above estimates with a noteworthy series of experiments at Iowa in 1896, under the general direction of Seashore. There, the male University students showed a median discrimination of about 10 v.d., and the women one of about 9 v.d. These values, taking all things into consideration, appear at least twice as low as the analogous ones of Delezenne, Preyer, Gilbert, and myself. Moreover, it is even harder to reconcile Seashore's results from adults with his own simultaneous ones from children 10-14 years old; for the latter perfectly coincide with our table, showing a discrimination of about 4 v.d., and would therefore have double the sensitivity of the adults!
It must also be mentioned that Seashore arrives at a very different opinion from the above as regards the general effect of Practice. He holds that neither musical education nor special training can materially affect most people's power of discriminating, and he supports his view by inquiries instituted among the parents of his children and also upon a few experiments.
 It is here by no means intended to assent to the identification of the feeling of Effort with any combination of touch sensations, but only to admit that the former is in practice chiefly estimated through the mediation of the latter.