CHAPTER IV: DATA COLLECTION - Gordon Tullock, The Selected Works of Gordon Tullock, vol. 3 The Organization of Inquiry [1966]

Edition used:

The Selected Works of Gordon Tullock, vol. 3 The Organization of Inquiry, ed. and with an Introduction by Charles K. Rowley (Indianapolis: Liberty Fund, 2005).


CHAPTER IV

DATA COLLECTION

That data collection is a major scientific activity and that it leads to formulation of hypotheses will hardly be denied. It is frequently pointed out, however, that most data have been collected as the result of pre-existing hypotheses. This is true, but it does not affect our reasons for treating data collecting separately. With regard to any specific hypothesis, a good deal of data was present in the mind of the inventor when he made it. The sources of the data may not be directly relevant to the new hypothesis. The hypothesis which led to the accumulation of the data on which the new hypothesis is based may be either trivial or irrelevant. For an example of the trivial, we may take the multiplication table, a basic element in much scientific reasoning. It can be said that the scientist has this in his mind because of two hypotheses: a hypothesis on the part of those responsible for his education that it would be useful for him to know how to multiply, and a hypothesis on the part of the young scientist that he would be disciplined if he did not. For an example of the not directly relevant, we may consider information which a scientist obtained as the result of a previous hypothesis unrelated to the new one. For example, he may have switched from one branch of his field to another, but may have found some fact that he discovered in his first field to be of great importance in his new one.

The data which are important to a scientist when he forms a hypothesis are the data which are present in the scientist’s mind. We have vast libraries of accumulated facts, but until they get into someone’s mind, no hypothesis will be developed therefrom. The collections of facts in the library “only stand and wait.”1 We collect these vast masses of data partly because we think them interesting in themselves, but principally in hopes that someone will use them to develop a general law of some sort. Before this happens the someone must learn of them, i.e., he must get them into his mind. Since the capacity of the human mind is smaller than that of even a rather small library,2 this may seem a hopeless task, but, as we shall see, social co-operation provides a partial solution to the problem.

We shall therefore discuss how information accumulates in the minds of individual human beings. The libraries and indexes will be considered only as aids to this accumulation. We shall also, however, consider the pattern formed by the information-collecting activities of a number of people and try to answer the question of what an optimum organization would be. Throughout we shall think of data collection primarily as a preliminary step to the development of hypotheses, but the data collector may be collecting simply to put his findings in a library somewhere for another to use in framing a hypothesis.

A good deal of the information contained in any human mind is simply the result of accident. Anyone will accumulate lots of facts which are of no real interest to him, but which his memory will retain for some period of time. I know, for example, the general arrangement of furniture in the apartment across the hall from my own although I do not know my neighbors more than by sight. They have a habit of leaving their door open, and I therefore sometimes see into their apartment when leaving my own. This is an extreme example, but that we all have a good deal of information which has come to us through pure accident is obvious. To a scientist, this may be more important than to the ordinary man, since he is apt to accumulate information about his field this way. He sees things in the laboratory, finds his work interrupted by colleagues who insist on boring him by discussing their work, and hears a great deal of gossip. It is quite possible that some bit of information obtained by these accidental means may be of great importance to him.

Far more important, however, is the information picked up through the formal educational process. Preparing people to advance human knowledge is not, of course, the only objective of the educational system. Potential researchers are only a small minority among those receiving educations. It is probably not even among the three or four most important objectives, but it is a purpose of education, and a good deal of the data possessed by the average scientist comes from his education. Strictly speaking, there are two types of education: self-education and formal education. They tend to go on together, but for reasons of simplicity we shall discuss them seriatim. If preparing for scientific work is a rather minor part of the educational system, it is not unimportant from the standpoint of the man who eventually does end up as a scientist. We can therefore consider the educational system solely as it affects the potential scientist in his preparation for his work and ignore all the other aspects of the subject. Our discussion may give a rather distorted picture of education as a whole, but will be a useful abstraction from our present standpoint.

We can represent the results of our present method of education on the knowledge of a scientist by the following diagram.

[Figure: lf1279-03_figure_005]

At the bottom we have a smattering of information from many fields, which is called general education. It seems to be the opinion of some educators that this covers all of human knowledge, but this is obviously absurd. Many things taught in elementary school in other parts of the world are learned in the United States only by a very few specialists near the end of their education, if at all. This is not, of course, a criticism. If “general education” really tried to cover everything, its coverage of the subjects now included would have to be sharply reduced. Whether the particular combination of subjects now covered in our educational system under this head is ideal, I have no way of telling, but that selection begins at this level is clear.

Students usually also undertake special studies in some given field, say history or physics. Normally, but not always, this field is one to which the student has already been introduced by his general education (as on our diagram). In this field, he becomes much better trained than in the other areas where his education is only general. Normally, also, the student will be expected to specialize in a small section of his field, let us say the feudal period in England or crystallography. In this special field, his education will be pushed even further. Eventually, he will usually write a doctoral dissertation based on research into some problem, and, theoretically, this itself is a contribution to knowledge.3 These projects tend to be fairly trivial, but they do improve the student’s knowledge of one field.

Debate on educational policy is largely confined to discussion of the relative weight to give to the different rectangles on our diagram. Greater specialization, or a better general education, is stressed by various writers, and a better “broad” background in an entire field is frequently advocated. To this dispute I have nothing to contribute. My only point is that increases in one area must be offset by decreases in others. The total areas covered by our rectangles cannot exceed the learning capacity of the student. Any argument for more general education is, at the same time, an argument for less knowledge of the student’s particular special field of concentration. Obviously, there will be advantages in both “generalism” and “specialization,” but we cannot have both and must make some sort of compromise. To repeat, I have nothing much to say about what compromise is desirable in any one case, but I think that I can say that the compromise should differ radically from student to student.

Let us consider the problem of making scientific advances in the abstract. The new ideas which this advance requires come largely from the brains of men who know some of the facts. These new ideas then stimulate the further research which proves or disproves the ideas and which produces further facts upon which further ideas develop. Consider a situation in which the whole of human knowledge is eight facts: A, B, C, D, E, F, G, and H. A scientist appears with a theory based on A, B, D, and H and suggests that further research be undertaken to discover whether S, hypothesized by the theory, really exists. At this stage, we need not consider this further research, but can simply discuss the formation of the original theory. First, note that although C lies between B and D, it is not included in the theory, while H, at the other end of the spectrum, is. This clearly is no objection to the theory. There may be another theory which includes C and excludes H, but until it is propounded, we will never know.

This is an illustration of the elementary fact that until a new theory is developed, we can never know what field it will cover. If education in the society we are considering had been divided between two specialties, A–D and E–H, then the new theory would never have been proposed, since no one would have simultaneously had facts A, B, D, and H in his mind. In this very simple society, the whole of human knowledge could readily be held in the mind of one man, so this problem would not be likely to be serious, but in real life the total available knowledge is vastly beyond the capacity of one mind. If we cannot tell in advance which combinations of information are the necessary basis for new theories, how can we organize our educational system so as to maximize the production of new theories?
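The eight-fact example can be restated as a small check of which minds hold every fact a theory rests on. This is only an illustrative sketch: the set representation, the labels, and the function name `who_can_form` are assumptions made for the illustration and are not part of the original argument.

```python
# Hypothetical illustration: each fact is a letter, each mind is a set of facts.
facts_needed = set("ABDH")  # the theory rests on facts A, B, D, and H

# Education divided between two specialties, A-D and E-H.
specialties = {
    "specialist in A-D": set("ABCD"),
    "specialist in E-H": set("EFGH"),
}

# The very simple society in which one mind can hold all eight facts.
whole_of_knowledge = {"one mind holding A-H": set("ABCDEFGH")}


def who_can_form(theory_facts, minds):
    """Return the names of the minds that hold every fact the theory needs."""
    return [name for name, known in minds.items() if theory_facts <= known]


split = who_can_form(facts_needed, specialties)
generalist = who_can_form(facts_needed, whole_of_knowledge)

print("Under the two specialties:", split)
print("With one mind holding everything:", generalist)
```

Under the two-specialty division the first list is empty, since H lies outside the first specialty and A, B, and D lie outside the second.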

The obvious answer appears to be a system of random assortment. We might aim at having all possible combinations present in the brains of different people. The problem here is mathematical. Assume that the total information available to the human race, stated in its most compact form,4 would be enough exactly to fill the minds of four people. If we wish to assure that any two bits of information are present in the mind of at least one man, we can follow the process of dividing the total information into eight equal parts and then directing the education of various people so that each of them masters two of these parts (one mind holds a quarter of the total) and all possible combinations are present in at least one head. This would require a minimum of twenty-eight people, the number of ways of choosing two parts out of eight.

Suppose we want to insure that any three bits of information are present in the mind of at least one person. Here again, we can divide the information into twelve equal parts and then direct the education of students so that each of them masters three of these, and all possible combinations are represented. Unfortunately, we find that it would require 220 people to cover all possible combinations. Since the total information available to the human race is vastly greater than could be mastered by four people, probably more than could be mastered by forty thousand, and since our theories are normally suggested by complexes of facts considerably in excess of three, it is obvious that the total number of people necessary to insure that any combination of presently known facts which might lead to a new theory is known to at least one man would be one of those vast numbers which exceed anything in nature and occur only in probability mathematics.
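The two counts above are binomial coefficients. If one mind holds a quarter of the total, it can hold two of eight parts or three of twelve, and one person is needed for each distinct choice of parts. The sketch below checks the arithmetic and confirms by enumeration that such a set of curricula covers every group of parts. The function names are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from math import comb


def people_needed(parts: int, parts_per_mind: int) -> int:
    """Number of distinct curricula when each person masters parts_per_mind of parts."""
    return comb(parts, parts_per_mind)


def curricula(parts: int, parts_per_mind: int):
    """Enumerate every curriculum as a tuple of part indices."""
    return list(combinations(range(parts), parts_per_mind))


def covers_every_group(parts: int, group_size: int, plans) -> bool:
    """True if every group of group_size parts sits inside at least one curriculum."""
    plan_sets = [set(plan) for plan in plans]
    return all(
        any(set(group) <= plan for plan in plan_sets)
        for group in combinations(range(parts), group_size)
    )


pairs_of_eight = people_needed(8, 2)
triples_of_twelve = people_needed(12, 3)

print(pairs_of_eight, triples_of_twelve)
print(covers_every_group(8, 2, curricula(8, 2)))
print(covers_every_group(12, 3, curricula(12, 3)))
```

The same function shows how quickly the requirement grows: `people_needed(parts, k)` can be evaluated for larger values of `parts` and `k` to see why the text calls the realistic figure a vast number.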

Even if we did have this incredible supply of human beings and could direct their education so that all possible combinations of information were present, there still would be no assurance that every possible theory would occur to at least one person. The human mind is a chancy thing, and we could hardly take much assurance from the fact that one person was educationally qualified to perceive a given possible interrelation of facts. We would need several thousand such people to feel even modestly secure in the belief that all possible interrelations would be perceived. It seems clear, then, that we cannot hope to distribute the sum of human knowledge in such a way that all possible interrelations are likely to be perceived, and that even a near approximation is out of the question. Our objective must be restricted to making the best of our very limited resources.

The method now in use by the educational system is essentially based on an implicit prediction of the areas in which new discoveries are most likely. Such predictions, like any effort to anticipate the direction and pace of the growth of our knowledge, are extremely difficult. If we could predict with certainty in this field, this would amount to knowing today the things which we predict for tomorrow and, hence, would not involve prediction. In addition to this logical difficulty, the record for such predictions is very bad. Scientists frequently offer predictions of the future based on their view of the probable development of technology, and these predictions are as poor as other predictions of the future. Even in narrow fields major mistakes are made. When penicillin was discovered, the chemists thought they could synthesize it and get it into mass production in about eighteen months. Fortunately, another program to produce it by biological means was put in hand to cover the eighteen-month period. It turned out that the synthesis was vastly harder than anticipated. Even today almost all penicillin in general use is biologically rather than synthetically produced. Research is, by definition, search for the unknown, and we can hardly know in advance what the unknown will turn out to be.

Guesses can be made, however, and our present scientific educational system is implicitly based on such guesses. Let us consider again the system of education outlined above. Only, this time, let us put several different people on our chart.

Each of the nine people, A–I, has received a general education which is identical. A, B, and C have also received identical educations in the first general field; D, E, and F in the second field; and G, H, and I in the third. Each person also has his own specialty.

I have also drawn in three sets of “facts” which might, to the prepared mind, suggest a hypothesis, the set of the X’s, the set of the O’s, and the set of the dots. Note that scientist A would have all of the information necessary to discover hypothesis X. Part of this information is directly in his specialty, part comes from his general knowledge of his field, and part comes from his rather low-level knowledge of F’s field which he got as part of his general education. This type of theory, then, would be likely to be discovered with the educational distribution shown. With the theories which the O’s or the dots would suggest, however, this is not so. No single member of the group of scientists has control of all of the facts which would suggest these theories. Education organized in this way, then, would appear to be justified only if it is believed that more theories are of the X kind than of the O or dot kind.

[Figure: lf1279-03_figure_006]

Presumably, most people engaged in administrative planning of scientific work have given the problem little conscious thought. Nevertheless, they do probably reach fairly good results by use of another line of reasoning. The various subjects learned by a student today are grouped partly by categories which are simply historical developments and partly by categories of things which appear to be related. I can suggest no better organization for education, but we should realize that it is far from ideal. Further, even if more theories can be discovered by this organization than by any other, this does not imply that we should confine ourselves to one system.

Unfortunately, we cannot plan to set up educational systems which will bring together in one mind all the facts which will lead to some given hypothesis, because, until the hypothesis is discovered, we do not know what these facts are. A second system of organization of studies crossing the basic one might, however, lead to improved possibilities of discovery. Thus, the person J, following the course of study enclosed by the curved line, might discover hypothesis O. It must be noted, however, that the crossing system of education will necessarily reduce the manpower available for the basic one.5 Further, for the mathematical reasons given above, we cannot hope to set up an elaborate system which insures complete coverage of all possible hypotheses. Still, classifying knowledge according to two distinct systems for the purpose of educating researchers is very likely to be worthwhile.

This is what our present educational system does. In addition to the budding scientist studying such fields as physics and chemistry and their subdivisions, there are the engineers studying such fields as civil and mechanical engineering and their subsections. These fields cut across the scientific ones. A student preparing to be an automotive engineer, for example, must know something of both organic and inorganic chemistry, metallurgy, gas and liquid dynamics, thermodynamics, electricity, mechanics in the strict sense, and even some human anatomy and psychology. Although he is unlikely to know as much about any one of those fields as a scientist in that field, none of the scientists will know as much as he does about all of the fields and about their interrelations in the design of an automobile.

The raison d’être of the two systems can be readily perceived. The scientific disciplines are defined in terms of the traditional fields of knowledge, which are felt to conform somehow to the underlying order of nature. We might say that theoretical unification of each of the scientific fields is thought to be likely, although none has yet achieved this status.6 In any event, we feel that the division between physics and biology, say, does reflect a basic difference in the nature of the phenomena studied. The scientific fields, then, are unified by hypotheses about the nature of the structure of the universe.

The engineering fields, on the other hand, are defined in terms of utility. Each covers a given type of activity to which knowledge can be applied. The civil engineer learns to build bridges and is not much concerned with the question of whether his rules for required strengths of materials and for avoiding resonance may someday prove to be deducible from some single theory. He may select ideas from the most diverse fields of knowledge, if they permit him to make something useful. Engineering fields are thus defined by the type of knowledge which has been used in some type of practical activity.

The difference between the two approaches may serve to point out the difference between the hypotheses of “pure” science and those of applied science. The pure scientist searches for some way of integrating knowledge into a larger whole which will explain some given area. The applied scientist searches for some way of integrating knowledge which will permit the construction of some “device.” The underlying unity of the universe is sought in one case; in the other, a completely different type of unity. In the case of the pure scientist the unity sought is a pre-existing natural unity. In the case of the applied scientist it is a deliberately contrived unity in which diverse things are brought together to serve some end. One type of unity may be represented by the Newtonian mechanics and the other by a diesel engine.

Our educational system, by producing researchers from two different sets of schools, the schools of science and the schools of engineering, thus makes sure that two different systems of classification will be used in deciding what knowledge is held in the brains of the different investigators. This may result in more hypotheses being susceptible to discovery than would a system using one or three classifications. On our graph, however, one theory, the one of the dots, remains undiscoverable because no individual has the necessary knowledge in his mind. This may, of course, be much the most important hypothesis of the three. We can do nothing about this, because, in order to put the necessary knowledge in the mind of an investigator, we would first have to know what knowledge was needed, and this can be discovered only after we have the hypothesis. All of our present classification systems operate as implicit predictions of the knowledge which will, in the future, inspire hypotheses, but they are based on history. Extrapolations into the future are notoriously dangerous, but it is hard to see how major improvements can be made.

One change might be attempted in the organization of our scientific education.7 The concentration of students in the regularly defined fields which have led to discoveries in the past is clearly rational, but the present organization of our teaching system probably raises this concentration above the desirable level. Analogically, we may say that our present scheme follows line A instead of the proper line B (as shown in the following diagram). While it is sensible to have the bulk of our students working on the particular combinations of information which make up the traditional fields of science and their subdivisions, it would be desirable to have some pursuing differently organized knowledge. To take an extreme case, we probably have no man in the world who has devoted half of his time to nuclear physics and half to marine biology. I doubt if it would be wise to develop any sizable education system to produce such men, but I think one such man might be worthwhile.

[Figure: lf1279-03_figure_007]

The problem, of course, is the faculty organization of most universities. A student is normally required to choose a department and frequently a subdivision within that; and a good deal of pressure, sometimes quite unconscious pressure, is put on him to take a standard set of courses. Similarly, universities in hiring staff look for people who are qualified for certain departments. There are few appointments for men who do not really fit into any one of the departments. Since this system seems to work well for the bulk of scientific research, the problem is to provide for another system for a minority of our scientists. This could probably be done if a fraction of the universities, say 10 per cent, emphasized interdepartmental work on both the student and faculty levels. The danger would be that our practice of following educational fads and fashions, combined with the strong tendency to conform, would lead to either a too large or a too small number of “interdisciplinary” scientists.

Turning now to the self-education of the scientists, we should once again note that the distinction between formal education and self-education is a hazy one. Further, the distinction between self-education and regular research is even vaguer. Nevertheless, we can usefully devote some attention to this rather vaguely defined field. Its importance is obvious to anyone who looks at all carefully into the biographies of major scientists.8 In a surprising number of cases their most important work was not even in the same specialized field as their formal education, and, in the overwhelming majority of cases, their discoveries arose as a result of information obtained after they left their universities. The tendency to do work outside the field of training appears to be particularly strong in the applied fields and with those pure scientists who are actually motivated by curiosity. The scientists induced to be curious are more likely to stick to their original specialty.

The reasons for the importance of self-education are, of course, quite obvious. In the first place, knowledge tends to get out of date. A distinguished chemist has just retired from my university. Most of his recent research has involved radioactive tagging of chemicals. When he published his first article, in 1911, this technique was not even dreamed of. Even if the student leaves school with the very latest information, shortly he will find his school-taught education sadly deficient. It is probable that the simple process of keeping up with developments by itself will result in the average scientist ten years out of school having more self-taught information than school-taught. The scientist who fails to keep up will probably make no significant discoveries; the ones who do make discoveries are likely to have a large part of their knowledge as the result of self-education.

This is by no means the only reason for the importance of self-education in the development of scientists. It is the self-education of a scientist which differentiates him from the other scientists. His value comes largely from the fact that the particular combination of information which he has mastered is different from that held by any other scientist. Obviously, this kind of information could not come from a formal education. In fact, the process of self-education followed by most scientists is at the same time more specialized and more general than any formal educational system. The pattern of reading he will follow will be unique in the sense that no other individual is doing exactly the same. On the other hand, it will not be as bound by the formal division of science into fields and specialties as is the educational process.9 Since the self-education of the scientist is more carefully fitted to his personal interests than the formal education he has received, it is likely to play a larger role in his work than does his formal education.

The difference between the significance of self-obtained information and formally taught material to most scientists, particularly the greatest, is so large that we may even question whether the importance of a scientific education lies in the subjects actually taught or in the habits and contacts formed in the schools. The student leaves his university with a good deal of factual and theoretical information, but it may be that other things are really more important. He has a formal entrée, a sort of union card, which permits him to get a scientific job. Not least important, he has a web of contacts with other people who are obligated by the current scholastic ethic to assist him in getting a scientific job but not any other kind of job. He is convinced, partly as a result of his original choice of profession, partly as a result of his great commitment of time which he does not want to waste, and partly because of his associations during his education, that he is a scientist and that the rest of his life will be devoted to investigation. He has probably also been convinced by the process of indoctrination carried on in most scientific educational institutions that science is a high and noble profession (I do not quarrel with this appraisal; I only say there are other high and noble professions) and that its practitioners are somehow superior to other men.10 Once he has become a practicing scientist, he will educate himself to a very great degree.

The self-education of a scientist turns largely on three sets of institutions: the learned periodicals, scientific publications which are not periodical, and conventions.11 We shall take them up in turn, starting with the learned periodicals, but first we must briefly discuss two less important channels of information, the non-scientific press and gossip, which are important as fast, albeit inaccurate, channels of information. Important scientific developments not infrequently get on the front page of the New York Times, and this insures them wide circulation. Recently the regular scientific journals, particularly in physics, have become annoyed at being “scooped” and have been putting considerable pressure on scientists to give them the “first publication.” In some cases this pressure has gone to the extreme of threatening to deny publication to any work which has previously appeared in the press. While the jealousy of the editors of the scientific journals is perfectly understandable, it is unlikely that they will be able to censor the press successfully.

Gossip is usually faster than the popular press in transmitting new discoveries around the scientific community, and even more inaccurate. It should be emphasized, however, that gossip more often takes the form of a letter than a face-to-face conversation. Scientists are highly dispersed, and their communication is likely to be, even today, largely through the written rather than the spoken word. They do write each other, however, and they frequently pass on rumors about various people’s work. Usually the reports are fragmentary and highly casual, but a scientist often hears of important new developments in his field first through such channels. An effort to formalize this channel of communication in the field of Sinological studies was made by George Kennedy in the form of a sort of intermittent newsletter called Wen Ti. A more recent example is the “information-exchange group set up to provide better communication among scientists in the related fields of electron transfer, oxidative and photosynthetic phosphorylation, ion transport, and membrane structure and function.”12

Turning to the scientific periodicals, their importance to science is so great that it is possible to argue that modern science really began when the first such periodical was published. They generally serve two distinct functions: to disseminate news of new discoveries through the community and to serve as a file to which researchers who wish to find what is already known about a given subject can turn. The two functions would normally lead to somewhat different editorial policies, but since both will be served reasonably well by selecting the most important articles out of those submitted, it is possible to combine them. The research function of the magazines will be discussed later; we will now confine ourselves to their function as news-magazines.

The system on which they operate is simple. A scientist who has made what he considers an important discovery writes it up and mails it to a journal. If the editor agrees with him, it is accepted, which normally means that it will be printed, but its publication does not result in any direct payment to the scientist. It will, of course, increase his prestige, and this may indirectly increase his income.13 If the editor does not like the article, he rejects it, and the scientist is free to submit it to some other journal. Eventually it is either printed or the scientist gives up. The fact that each journal considers the article separately is of the utmost importance.14 It is less likely that a new and different idea will be rejected by each of seven men acting independently than that it would be rejected by a board of the same seven men or even by the most brilliant among them. Some other institutional arrangement might well lead to the average quality of articles being better, but from the standpoint of giving new ideas a hearing regardless of how radical they are, the present system is hard to improve on.15
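The comparison between seven independent editors and a single board can be illustrated with a rough probability sketch. Everything quantitative here is an assumption introduced for illustration and does not come from the text: each editor is supposed to reject a radical article with the same probability `q`, the editors are supposed to judge independently of one another, and the board is supposed to decide by simple majority vote.

```python
from math import comb


def prob_rejected_by_all(q: float, editors: int) -> float:
    """Chance the article is never printed when each journal decides separately.

    Assumes independent editors who each reject with probability q.
    """
    return q ** editors


def prob_rejected_by_board(q: float, members: int) -> float:
    """Chance a board rejects the article, assuming a simple majority vote
    among members who each vote to reject with probability q."""
    majority = members // 2 + 1
    return sum(
        comb(members, k) * q ** k * (1 - q) ** (members - k)
        for k in range(majority, members + 1)
    )


q = 0.8  # hypothetical chance that any one editor rejects a radical idea
separate = prob_rejected_by_all(q, 7)
board = prob_rejected_by_board(q, 7)
single_editor = q  # the decision left to one man, however brilliant

print(f"rejected by all seven journals in turn: {separate:.3f}")
print(f"rejected by a board of the same seven:  {board:.3f}")
print(f"rejected by a single editor:            {single_editor:.3f}")
```

Under these assumptions the article fails to appear only if all seven editors reject it, which is less likely than rejection by one editor or by a majority of the board whenever `q` is above one half. A different voting rule or correlated judgments among the editors would change the figures.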

Normally the scientist himself will decide to which magazine to submit his article on the basis of two criteria, the special field covered and the prestige of the magazine. He starts with the magazine he thinks most suitable and, if his article is rejected, works his way down. We have already mentioned the prestige aspect, and no further discussion is necessary, but the field of specialization raises certain difficulties. In the first place, the definition of the field itself raises the problems of classification of knowledge which we have discussed. An article which does not fit the field covered by any given magazine may be very hard to publish. Returning to our diagram, if the fields of knowledge are shared by magazines, as shown on the following figure, potential research projects would be outside the scope of all of them. (Magazines actually tend to occupy overlapping fields, but this would only marginally affect the reasoning on which the diagram is based.)

[Figure 8]

Magazine A would welcome the research represented by the dots, but any of the magazines A–I would tend to feel the work represented by the X’s, the O’s, or the triangles was mostly outside its field. It would be very hard, therefore, to get articles covering such work published. The problem is partially solved, again, by the engineers. The classifications used by the engineering publications are completely different; articles which would not fit into any given “scientific” field may fit neatly into an engineering magazine’s editorial scheme. The June, 1965, issue of Astronautics and Aeronautics, which I happen to have on my desk, carries mainly articles which cannot even be classified by scientific fields. They involve the discussion of devices which incorporate elements from almost all the fields of physics. Thus, returning to our diagram, the engineering periodical Z might publish the work based on the X’s, although none of the strictly scientific periodicals would.

We can also classify journals according to level of specialization. In the last figure, journal bcd covers three different fields from the standpoint of journals B, C, and D. For a man doing work which does not fall into any one of these three small fields, this would be highly helpful. Project O, for example, falls neatly into the scope of bcd although outside the scope of any of the more specialized publications. Thus, more generalized publications may also help to provide an outlet for work which does not fit into a narrowly defined specialty. This is of limited use, however, because the magazines of wider scope frequently think of themselves largely as “popularizers” of the material published in the narrower journals in their fields. To take an extreme example, the Scientific American does not publish articles which are not contained in one of the scientific specialties. Its editors consider themselves to be engaged in informing scientists of what is going on in other fields, not in publishing original research which does not fit into any given field. Even the journals which do publish original work spreading across several fields normally print work that is less “advanced” than the work in the individual specialties.

Returning to the figure, we can consider that the research is arranged from bottom to top in order of its historic discovery. Thus, the newest advances would be at the very top. Theory O, then, would be the kind of original research actually published by most non-specialized journals. It combines a very advanced discovery in one subfield with information long known in others. A theory based on the latest advances in each of the subfields would probably not be published. This is possibly not a serious drawback, since as science is now organized, few such projects are carried out. Still, once again we find science organized to confine research in a predetermined mold of existing special fields. I deplore this existing situation, but I can offer no proposals for improvements.

If the journals are to have the effect of spreading news through the profession, they must be read. Since the scientist will want to satisfy his curiosity or to find something which can be made use of, he has an incentive to read them. He cannot, however, hope to read all of them. Not only does he lack the time; the more specialized ones are written in a style which is intelligible only to specialists. He must therefore choose some among them to read, some to skim, and a much larger number to ignore. For this function, he need not take account of the prestige of the various publications, except in the very early stages of his training. He should read the ones which interest him most, while skimming those which occasionally carry an article which appeals to him. The choice of scientists who operate on this principle establishes the relative prestige of the various magazines.

This system of deciding what to read is not only the easiest from the standpoint of the individual scientist, it is the system most likely to promote the advancement of science. If, however, the various individual scientists confine themselves to narrow specialties in their reading (which would simply reflect their interests), then the discoveries which require knowledge of several specialties will not be made. Radio astronomy, for example, is largely a post–World War II development, although the technical foundation for it had been available at least since the early 1920’s. The long delay in its development obviously arose from the fact that no astronomer knew or cared much about electronics, and the electronics specialists were equally uninterested in astronomy. The eventual development of radio astronomy was largely initiated by a radio engineer of no outstanding talent who simply became interested in the applications of his subject to extraterrestrial radiation. This tremendous step forward in astronomy was made by a man whose education and native intelligence were doubtless far inferior to those of numerous astronomers and physicists whose contribution to the advancement of knowledge was much less than his. His sole advantage was an unusual combination of information and interests. The “marginal return” on this combination was much higher than on the more normal combinations.16 As a result the Smithsonian now proudly displays the world’s first radio telescope: a machine built in his backyard by a middle-class engineer.

Scientists must choose what they will read as part of their general self-education, keeping in mind their limited capacity to absorb more than some given quantity of information. Some should select that given quantity from among the traditional specialties (some of the traditions may be of very recent origin, perhaps only a few years); others possibly should decide to combine two fields, knowing that this will lead to their being less well-informed in both than a specialist, but hoping to get something out of the interrelationship of the fields. Others may concentrate on one field but have a “minor” in another, or perhaps several others. For mathematical reasons, it is not possible to have all of the possible combinations and permutations present, but there are far more individual scientists than there are scientific organizations or journals, and this gives more possibilities for coverage of unconventional combinations of knowledge to individuals than to the more organized groupings.

Scientific periodicals operate on varying levels of generality. At one extreme is the Scientific American, obviously written solely for the layman about each branch of science. One can hardly read any article without realizing it is written for the benefit of people who do not know much about the subject matter. Everything is carefully explained, and there is none of that reliance on the specialized knowledge of the reader to fill the gaps which makes the more professional periodicals so unreadable to non-specialists. Each article is written, not for the benefit of the experts in that field, but for the benefit of people who are specialists in some other field, but ignorant in this one. The result is that any intelligent man with the average college “liberal arts” background in science can follow it easily.17

Between the Scientific American and the narrowest specialized periodical there is a whole gradation of magazines of varying levels of generality. The degree to which scientists read articles outside their specialized fields varies, of course, from person to person, but all of them read at least one. The fact that they follow this course means that the less specialized magazines have a larger readership and greater influence than the highly specialized ones. This leads to more careful editorial work, the possibility of commissioning special articles on a fee basis, and better make-up in the more general magazines. In this as in so many things, the Scientific American, with its fine printing, numerous advertisements, and specially written articles, presents the extreme case. This editorial superiority of the more general magazines probably leads to their being more influential in shaping the developments of science than might be imagined from their numbers. It is possible that this partially counterbalances the highly specialized nature of the rest of the scientific community.

The non-periodical literature plays a subordinate, but nevertheless important, role in the diffusion of knowledge through the scientific community. The most important type of non-periodical literature is, of course, books. Large numbers of books are written and published (sometimes commercially, sometimes on a subsidized basis) in the various scientific fields. Books naturally cover much broader fields than articles. The article typically reports some investigation which resulted in some specific discovery. The book will almost always cover the equivalent of a number of articles, but will also involve an effort to integrate them into a general scheme. The “big picture” is more clearly presented in the book field than in the periodical system.

In many cases the book makes no real effort to present new discoveries, but confines itself to reviewing what is already known.18 While I do not wish to discuss the question of the relative merits of writing articles and writing books, it is clear that books, by reviewing what is known in some field, perform an important function. By putting the data in a coherent order, they may considerably assist the individual investigators in ordering their thoughts. Even more important, a man from some other field who decides that he needs information on the field covered by a given book and by many articles will normally turn first to the book. Thus the book-writer is, in part, educating people outside his field and contributing to interfield co-ordination. The absence of a “standard work” in any given field poses a considerable barrier to the dispersion of knowledge from that field.

Intermediate in length between articles and books are the monographs. Prima facie one would assume that there would be many more monographs than books, but the reverse is true in the United States. Relatively few monographs are circulated, and the ones that do appear have a strong tendency to be ignored. This is, I think, a significant defect in the organization of science in the United States.19 There is no apparent natural law which provides that discoveries will always either be readily presentable in article form or justify a full-length book. It seems likely that more “available” discoveries would require 40–120 pages to report than would require a full-length book. The forcing of actual publication into the article-book mold, therefore, must both direct research toward “article” and “book” projects and away from “monograph” research, and result in what research of “monograph” size is done being reported in an inconvenient form.20

There does not appear to be much, however, that can be done about this. If scientists in America prefer to confine their reading to books and magazines and to pay little attention to items of intermediate length, then the “market” for monographs will remain limited and the incentive to produce them slight. The current situation, where most monographs are produced largely for substantially free circulation by various sponsoring organizations, will continue until scientists change their reading habits. Since the circulation of such monographs is essentially free of editorial control, it is not surprising that vast numbers of short items are now also distributed free by individual scientists and scientific organizations.

In reading books (and monographs), scientists can hardly fall into the habit of using the same source, as they do with magazines. They must make conscious choices, instead of simply taking the latest issue of their favorite magazine. They are likely, in fact, to be heavily influenced by the reviews in the magazines in making such choices. This fact would appear to dictate great caution to the magazines in reviewing, but this is normally handled in a rather slipshod manner.21 As a consequence, the readership of new books may be largely determined by the accident of who is selected by an overworked editor to do the reviewing. Fortunately, most scientists read several periodicals and are likely to see several reviews of any given book.22

In reading books, one of the major objectives which seems to guide most scientists is reviewing what they already know. There is no reason to object to this, of course, but the reading of books in fields which are new to the scientists is more interesting. A good, interesting book which attracts the readership of a number of men who would not otherwise have learned much about its subject is likely to have a most stimulating effect on the development of knowledge.23 Almost certainly some of the readers will find some knowledge in this book usefully combinable with their previous knowledge.

In addition to reading a great deal, the self-education of scientists normally involves meetings and conventions. It is my impression that these meetings are generally more social than scientific, but that they have a scientific component is undeniable. The exchange of gossip, the miscellaneous drinking and socializing which go with the conventions of the various learned societies have been widely commented on, but I see no reason to object to them. Most scientists live rather isolated lives intellectually. They have no one near them who is much interested in their specialties. When they finally get a chance to talk with people who are so interested, it is not surprising that they relax a bit. A good deal of the socializing “small talk” is scientific. Scientists genuinely interested in their subject are not likely to waste much of their limited opportunities to talk to others so interested. To the layman, it may not seem likely that a “disorderly” party of scientists in the next room are all talking about colloids, but if that is their specialty, quite probably that is just what they are doing.

In any event, a good deal of professional discussion does go on at these meetings, and, although it is the fashion to deplore the papers read at the formal meetings,24 they undoubtedly do perform a function in spreading newly acquired knowledge through the profession. The meetings are the only occasions on which a scientist, presenting his work to his peers, is immediately subject to oral critical discussion. A scientist who is writing an article may feel that potential critics will write letters to the editor only if they have fairly serious differences with him. In oral discussion of a paper, on the other hand, much finer objections may be raised. This undoubtedly is good discipline for the scientist.

We have so far discussed the problem of self-education, which we might consider as the pursuit of information to satisfy the general curiosity of a pure scientist. For an applied scientist, it performs somewhat the same function, except that he will always aim his self-education at getting practically useful ideas. We will now turn to the stage at which the scientist, whether pure or applied, begins to engage in research on a particular problem. This, however, raises a special problem of the sort which is a continual irritant to anyone trying to consider seriatim a general process which can, in reality, follow many different courses. Our process is from data accumulation to hypothesis to checking to dissemination of discovery. We have broken the data-collection stage into two parts: education and specific investigation. Unfortunately—and we will meet similar problems later—some hypotheses are formed without any specific investigation. It is not uncommon for an investigator suddenly to perceive a pattern among the data as a result of his general reading. Under these circumstances he proceeds directly from the self-education, general-curiosity stage to the hypothesis.

As an example, we may cite the legendary discovery of the Newtonian physics. If the story is to be believed, the basic idea occurred to Newton when he was hit by a falling apple. Widely learned in the physics and mathematics of his time, he suddenly realized that a large number of phenomena previously believed to be independent could be united by one theory. That such things happen and that they are sometimes of great importance to the advancement of knowledge is clear. If such instances of skipping stages seem a little irregular, they raise no particular difficulty for the general theory of this book. We follow a descriptive scheme from accumulation of facts to hypothesis to checking and disseminating the hypothesis. Normally the accumulation of facts divides into two stages, general and special accumulation, but if the hypothesis is formed on facts accumulated in the general stage, we simply skip the process of particular investigation. The other stages also may sometimes be skipped. Shortly, we will discuss a situation in which the hypothesis itself is skipped.

Turning now to the particular investigation, we must carefully distinguish between the type of investigation which is begun to check a hypothesis and that which is undertaken in hopes that it will lead to a hypothesis. Probably most scientists are little concerned with this problem, which, in fact, makes no practical difference to them, but it is necessary to talk about one thing at a time if clarity is to be achieved. We will take up investigation which it is hoped will lead to a hypothesis here and leave investigation undertaken to check the hypothesis till later. In a sense, of course, all investigation proceeds from a hypothesis. In the case which we are about to consider, the hypothesis is of this form: Investigation of problem A will develop factual information which will permit formulation of a general hypothesis. This is an investigative hypothesis, a guess about the advantage to science of pursuing a particular line of research, however, and should be strictly distinguished from a hypothesis which purports to be an advance in itself.

Suppose, then, that an investigator has become interested in a given problem or area of knowledge (or combination of problems or areas of knowledge) and proposes to increase his information in that area. Normally, his first step will be to examine the literature in order to find out what has already been discovered in the way of factual data and what has been proposed in the way of theoretical explanation. In the vast majority of all cases, he will find that his problem has already been adequately dealt with by someone else. This fact often tends to be overlooked largely because the cases in which the investigator cannot solve his problem by reading about someone else’s work are of such great importance. Absorbing someone else’s ideas does not contribute to the advance of science in the same way as producing new ideas.

Nevertheless, the advance of science has as its objective the steady increase in the number of problems which can be solved by the simple expedient of consulting the literature. Each new discovery by an investigator is one more bit of data or theory which will not have to be discovered again. The steady growth of our knowledge permits the economizing of time of investigators. Instead of personally rediscovering various things, they consult the previous work of others and then use the time so saved to investigate new problems. Thus, the medieval scholars who devoted so much trouble to the rediscovery of Aristotle and the other Greek scientists and to the correction and multiplication of their texts were performing a real service to human knowledge. Finding out what the Greeks had discovered was, in fact, the proper first step for any investigator in the thirteenth century, and maximization of returns from a given amount of effort required that the spreading of the recently recovered Greek knowledge be given higher priority than new discoveries. It is, of course, true that once the Greeks had been recovered, there was somewhat too much reverence for them, with a consequent de-emphasis on new work, but even here the time gap between the substantial completion of the work of recovering the Greek discoveries and disseminating them throughout the learned community and the work of the men who added to them is not very long.

If the social prestige of discovering something which you yourself did not know but someone else did is less than that of genuinely original work, such “research” is still of great importance. Unfortunately, it is not as easy as might be hoped.25 The investigator interested in a certain problem must first find the previous work on it. He obviously cannot simply go through the whole of human knowledge in hope of finding something, since the total is much too vast for any one mind. This problem has resulted in the presently burgeoning interest in “information retrieval.” To date, the research in this growing field, far from solving the problems for other fields, has itself developed into a field too large for an investigator to follow thoroughly. Still, the effect of this research can hardly avoid eventually making the “search of the literature” easier. The investigator normally cannot even hope to cover the whole of any reasonably wide field, since the advance of human knowledge has caused even quite narrow classifications to contain more information than one man could possibly absorb. Further, he is normally even more limited. He does not intend to devote his whole life, but only a small period of time, certainly not more than would be necessary to discover the same facts by direct investigation, to the search. The whole point, in fact, of having this vast body of knowledge available is as a sort of labor-saving device, an assurance that resources will not be wasted in rediscovery. The greater the speed with which a given investigator can find what is already known about a problem and the greater the security he can feel that he has actually found everything, the better the system works. Unfortunately, there are inherent limits on the efficiency which can be expected.26

These limitations depend on two facts: the obviously limited amount of resources to be invested in improved filing and crossfiling of data (it would be wise to increase the present level) and the impossibility of predicting accurately the information which will be wanted in the future. Thus it is impossible to group information under the heads which would be of maximum utility for as yet unpredictable research. Turning once again to our graphic representation of knowledge, let us suppose that all knowledge has been classified by two crossing systems, each of which has general categories, subcategories, sub-subcategories, etc.

[Figure 9]

Suppose a given investigator becomes interested in learning about the information in the area enclosed by the oval. Note that it does not fit any of the classifications, although both classification systems recognize the close relationship of the facts concerned by putting them close together.27 None of the subcategories in either classification system would give him the information that he wants. Either the general category B or the two series would give him what he wants (there would be other areas of research, the circle, for example, where even this would not be true), but covering either field would involve a vast amount of wasted reading, only a tiny part of which would bear on his problem.28 In actual practice, the amount of information absorption which would be necessary to cover the whole of human knowledge on the smallest conventional classification which contains the area under investigation would normally be quite a major project.

Sometimes, particularly in applied research, this problem can be solved by teamwork in which each of several men becomes expert in one phase of knowledge. The limitations on this procedure arise from the fact that not all research can be divided into independent segments. Quite frequently success requires that all fields be integrated in one mind. Consider, for example, a team designing a jet plane, and let us suppose that it is broken into three divisions: engine, electronics, and airframe. A change in engine design which reduced the efficiency of the engine might be desirable on general grounds because it permitted superior airframe design. Obviously, such a change would never even be considered unless the engine designers knew enough about airframes or the airframe men knew enough about engines to recognize the interrelationship between the two problems. Thus if the personnel were genuinely specialized, they would not produce the optimum design. On the other hand, possibly the improvement in airframe design is so subtle that only a man who has devoted his whole life to study of airframes could perceive it. In this case, the problem is insoluble.

If the teamwork solution is not possible, and in a very large number of cases it is not, then we must depend on the filing system to produce the necessary knowledge. The problem, to repeat, is not to get the information needed out of the library (which is easy), but to exclude the unnecessary information so that the investigator will have a manageable job of self-education. The system in use proceeds in two steps: first, a great deal of information is excluded from the classification system as a whole, and second, the classification system tries to so order the included information that the investigator may find what he needs while excluding what he does not need.

The two problems are interrelated in that the more efficient the classification of the included data, the less data must be excluded. Speaking analogically, if the average investigator is capable of mastering a thousand “bits” of information on a given problem, and the classification system divides knowledge into ten thousand parts, then the total amount of knowledge which could be included in the system could not be much above ten million if it was to be efficiently used. If the classification system could unambiguously distinguish a million categories, then the system could operate on a billion bits of information. This, of course, assumes that the exclusion process, on the whole, excludes the less important bits of information, although this is a rather heroic assumption.
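The arithmetic of this analogy can be written out directly. The following is a minimal sketch using only the figures given above; the function name `usable_capacity` is hypothetical, and the calculation assumes, as the analogy does, that knowledge is spread evenly over the categories.

```python
def usable_capacity(bits_per_investigator: int, categories: int) -> int:
    """Largest body of knowledge the system can hold while keeping each
    category within what one investigator can master.

    Assumes knowledge is spread evenly across the categories.
    """
    if bits_per_investigator < 0 or categories < 0:
        raise ValueError("both arguments must be non-negative")
    return bits_per_investigator * categories


# An investigator who can master a thousand "bits" per problem:
print(usable_capacity(1_000, 10_000))     # ten thousand categories
print(usable_capacity(1_000, 1_000_000))  # a million categories
```

A hundredfold increase in the number of categories the system can distinguish permits a hundredfold increase in the amount of knowledge it can usefully hold.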

The initial exclusion process operates essentially at the publication level. It would obviously be desirable, if the classification problem did not exist, to have every bit of information available to the human race permanently recorded and available to investigators. Every experimental result and every document should be available for future consultation. If, however, any effort was made to accumulate this vast mass of information, researchers would be confronted with unmanageable masses of data when they investigated even the simplest problem. It would not, of course, be difficult to construct a classification system which broke any given quantity of data down into parts of any given size, but setting up such a classification system on a non-arbitrary basis has so far been impossible. Perhaps the very active research now going on will shortly produce improved techniques. But classifications chosen must be such as to have at least a reasonable chance of assisting future researchers by presenting information in categories which will be useful for as yet undreamed-of investigations. It must also be understandable to investigators in the sense that they will not have any great difficulty in learning to use it to find data. All existing systems which meet these requirements, and probably all systems to be invented in the future, have strictly limited abilities to discriminate knowledge into classes, hence the necessity of excluding some information from the catalogues.

If information is to be initially excluded and thus made permanently unavailable to future investigators, then it is obviously desirable that the most important information be included and the least important excluded. This, however, requires prophecy, since “important” means “important to future research.” That errors will be made is obvious; we can only hope that they are minor. The system now in use simply depends on the individual judgment of various editors. If a given work of research is thought unworthy of publication by all of the editors (including editors of monograph series, etc.) to whom it is submitted, then it is excluded from the information which will be classified. True, the investigator, if he thinks enough of his work, can pay to have it printed, but it is unlikely to be included in any standard classification system.

The classification systems utilize a further stage of exclusion. In the first place, there are a great many different ones operating on different principles. At the lowest level the periodicals publish cumulative indexes of their own contents, and libraries keep card catalogues of their holdings. At the other extreme, there are a great many special indexes published covering various classifications of knowledge. Even these special indexes, however, make no particular effort to be catholic in their coverage. Normally, they consciously limit their coverage to magazines and new books which they think are of a certain level of importance, rather than simply indexing everything in their field. Another form of indexing material is represented by such magazines as Physical Abstracts, which presents brief abstracts of what its editors think are the most important publications in its field.

The object of all of this reference guide material, of course, is to make it possible for a scientist pursuing some future research project to find quickly and easily everything already discovered on the subject without having to read any significant quantity of irrelevant material. In view of the impossibility of predicting the course of future research, this objective cannot be exactly reached, or even very closely approximated, but we can at least make an effort. The actual system used is only partially based on efforts to predict the future. History and the structure of our language are both more important to most existing classification systems than conscious efforts to guess what will be needed in the future.

The historical development of science is ever present in our methods of classifying knowledge. Anthropology, to take an extreme example, covers two completely distinct fields joined only by the historic accident of some early investigators who happened to be interested in both. The same type of thing will be found throughout science. In addition, with the progress of science, connections not previously known are discovered, and previous connections are dissolved. The definitions of fields of knowledge tend to be determined by the exact time in this process in which the term hardened. As a consequence, fields of knowledge tend to be rather arbitrarily defined. These fields, however, are used as the basis for much indexing of knowledge, which gives the indexing a similar arbitrary slant.

The language of science contains similar arbitrary elements, in spite of the committees for standardizing and rationalizing usage. Most scientific terms were invented some time ago, and thus some of them cover fields of information which, to our present knowledge, seem somewhat arbitrarily demarcated. This is of little importance in the actual work of research, since the precise meaning of a word in the particular context used will normally be clear, but it does mean that the words are less than ideal as classification media for finding data. This is particularly so since it is clear that the growth of knowledge in the future will make even the best usage of today seem arbitrary.

Nevertheless, the use of historically developed subject categories and scientific terms as the basic system for classifying knowledge is unavoidable. Even if we could think of some other system, it would be less useful than the present one, since the scientist must know where to look in any system. If he knows the traditional fields and the normal meanings of words, and it must be presumed that any scientist who is likely to make advances in human knowledge will be already well acquainted with them, then he is equipped to use a system based on them. Development of another system would mean that the scientist would have to learn that system, as well as learn the subject covered. Since the classification system could not be simple and brief unless the subjects covered were also simple and brief, this would impose a major burden on him.

One highly valuable type of classification system should be mentioned. The great unifying theories which the various sciences seek are, among other things, classification systems which order knowledge in their fields in a regular way. A scientist who must master the general theory for his regular work will find that he has also mastered the system on which knowledge in the field is classified and will thus find it fairly easy to find data. Unfortunately, grand theories are all eventually disproved (the ones discovered very recently, such as the special theory of relativity, have escaped this fate, but no one has much confidence that they will last forever); and we can, therefore, deduce that some information lies outside the classification system of such theories. Only the “final theory” which presented the whole universe in one grand equation would be a really perfect classification system.

The manifest defects of the existing classification systems, even for a body of knowledge which has been deliberately pruned, are met in part by the cross-indexing system. Each item of knowledge, ideally, should appear under a number of different heads so that it can be found by searchers using different methods. The heads themselves should be selected so as to group facts and theories together in clusters which have something in common. Generally speaking, the more cross-indexing, the better, but here again we come up against mathematical problems. With a given body of data, the physical size of the index will be directly proportional to the number of different heads under which the average item is catalogued. This principle will apply regardless of the fineness or roughness of the principles of classification used.

The physical bulk of the total index is probably of little importance, at least until any given index becomes much more complicated than any present index, but the extent of the cross-indexing also affects the number of items under each heading and thus reduces the exclusionary effectiveness of the indexing system. If the average item appears under ten separate headings, and the total number of headings remains the same, then there will be twice as many entries under each head as there would be if each item appeared under only five headings. A scientist searching for an item of information under a given heading would have to plow through twice as much irrelevant information. On the other hand, he would probably be more likely to find the item under the first heading he tried.

The obvious solution to this problem is to use finer classifications, with the result that there are more headings. Further, a cross-classification system may, analytically, simply add a whole new list of headings, thus avoiding the whole problem. In practice the total number of headings is limited by the financial resources of the indexing organization. The more numerous the classifications, the more skilled, and hence the more expensive, the indexers. The limitation on fineness of classification is also largely financial. The finer the classification, the larger the number of total entries and the more skill required on the part of the personnel doing the classifying. The increase in the amount of cross-indexing is thus partially dependent on increased financial resources and partially dependent on finer classification procedures which themselves are largely dependent on greater financial support.
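The arithmetic behind the last few paragraphs can be put in a short sketch. It is only an illustration: the function names and the figures (10,000 items, 500 headings) are assumptions chosen for the example, and entries are assumed to be spread evenly across headings, which no real index would achieve.

```python
def index_size(num_items: int, heads_per_item: int) -> int:
    """Total entries in the index: every item is listed once under each of its heads."""
    return num_items * heads_per_item


def entries_per_heading(num_items: int, heads_per_item: int, num_headings: int) -> float:
    """Average number of entries a searcher meets under one heading.

    Assumes (hypothetically) that entries are spread evenly over the headings.
    """
    return index_size(num_items, heads_per_item) / num_headings


# Illustrative figures only; they are not taken from any actual index.
ITEMS = 10_000
HEADINGS = 500

five = entries_per_heading(ITEMS, 5, HEADINGS)    # each item under five heads
ten = entries_per_heading(ITEMS, 10, HEADINGS)    # each item under ten heads
finer = entries_per_heading(ITEMS, 10, 2 * HEADINGS)  # ten heads, twice as many headings

print(index_size(ITEMS, 5), index_size(ITEMS, 10))
print(five, ten, finer)
```

Under these assumptions, doubling the cross-indexing doubles both the bulk of the index and the entries under each heading, while doubling the number of headings at the same time brings the entries per heading back to the original figure. The sketch says nothing about the cost of the more skilled indexers that the finer classification requires.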

The improvement of the indexing of present knowledge is thus to a considerable extent a matter of increased financial support. Because this type of work lacks the glamour and interest of new discoveries, it has tended to attract less in the way of money and a good deal less in the way of talent than direct scientific research. It is, however, highly important and readily susceptible to organized improvement. It has always been doubtful if large organizations, like the government or the Ford Foundation, are really capable of advancing science very much. Discoveries are so much a matter of accident and/or personal inspiration that such large organizations can do little more than provide incentives and opportunities to individuals or, occasionally, small groups. The long sad record of Alexandrian science contrasted with the short brilliant record of pre-Alexandrian Greek science is often pointed out. The large-scale support of science available in Alexandria drew almost all of the best Greek minds there, and the central organization then stifled them in an atmosphere of cataloguing and minor advances. The possibility of a repetition of the experiment on a much larger scale should give any well-wisher to the human race nightmares.

Cataloguing, however, is an important part of science and, as the Alexandrian experiment illustrates, is a feasible objective for organized scientific activity. It requires organization and fairly large funds, but it does not require much independence or inspiration. It is therefore an ideal area for large-scale projects; large organizations trying to advance science can probably do their best work there. Improvements in filing and cataloguing would have, at least, a proportional effect on the growth of science and might well have a more than proportional effect. It should also be noted that improved classification procedures would permit the lowering of the present “threshold” of merit so that the total volume of information kept in our collective “memory” would also be increased.

As a sample project, consider the Linnaean system, which is obsolete. The development of biological knowledge since Linné’s day has been so great as almost to overwhelm his basic system. A haphazard system of patching and extension has been used to add to his classifications, but the end result is both aesthetically ugly and practically inconvenient. The development of a whole new system, based on Linné in the same way that his work was based on Aristotle, would obviously be a major step forward and one which would require no new knowledge. It would be very expensive, of course, but where could the Ford Foundation better invest $20,000,000?

The recent tendency to turn to computers to solve all problems has also been seen in this field. In fact, the problem is not of the sort that present computers are adapted to solve. There would be no particular difficulty in designing a computer to go through a set of files and select out all items under a given head. It could even be attached to an automatic library which delivered the required volumes. Devices of this sort, albeit of limited capacity, actually exist. This is not, however, the basic problem, which concerns the original headings and making of the index. This, under present circumstances, can be done only by human beings and will require as many of them if a computer is used in the later stages as if it is not.[29] The possibility of using computers several orders of magnitude larger than any now contemplated to “search” the whole body of knowledge for specified information does exist, but is not for the immediate future.[30]

The search of the literature will be continued by the investigator until one of three things happens: he grows tired of the particular project, he finds a testable hypothesis (the process of “finding” such a hypothesis, given adequate factual knowledge, is the subject of the next chapter), or he exhausts the recorded information on the field and turns to direct investigation. Little can be said about the methods of direct investigation except that the mind of man is almost infinitely ingenious. The number of apparently insoluble problems which have been solved is amazing.

The investigation of reality also proceeds until the investigator grows tired of it or a hypothesis is achieved. Since there is no further step available, these are the only two alternatives. The fact that an investigator grows tired of his project without obtaining any hypothesis does not prove that the project was fruitless. He may have discovered enough simple factual information to justify his work. This factual information may later, either by itself or combined with other discoveries, lead to an important hypothesis. Not infrequently, in fact, the whole purpose of the investigation was simply the accumulation of data. The recent major investigations in the Antarctic, for example, were largely aimed at the accumulation of geographic data. Exploration of new territory has always been basically concerned with simple data accumulation.[31]

Similar motives are not infrequently behind laboratory experiments. The development of more accurate tables of physical constants is a continuous preoccupation of scientists, and the development of a table of measurements of almost any new phenomenon is normally considered a quite respectable research project. It is, of course, quite possible that these new data will lead to a hypothesis, but developing the data would be considered worthwhile even if this were impossible. Scientists are curious about, among other things, exact magnitudes; the practical usefulness of tables of measurements is obvious. The development of information on such matters, therefore, is legitimate even though it leads to no hypothesis. In such cases, the research is continued until the investigator gets tired of the subject.

Sometimes research is aimed at simple data accumulation in fields other than those in which exact measurements are likely to result.[32] Geographical exploration, already mentioned, is an example, but a good deal of chemical research (especially in the nineteenth century) was concerned with mixing some things and seeing what resulted. In the applied field there is still a great deal of this sort of thing. Other illustrations can be found in metallurgy and parts of astronomy. Personally, I feel that science aimed at hypothesis and grand theories is of a higher order than simple data accumulation, but data accumulation has its place. Here, again, is an area where large organizations with sizable appropriations can operate successfully. The library at Alexandria did a good deal of this sort of work, and today it is done on a large scale by state-operated laboratories in various parts of the world.

So far I have discussed the problem of data accumulation in terms which might suggest that such research projects normally lead to a predetermined result (i.e., the achievement of a hypothesis or the attainment of a set of desired measurements) or lead nowhere. The problem is not that simple. Many scientific discoveries are accidental. A researcher is accumulating data with the objective of solving problem A, when suddenly he sees that the data are pointing to a solution to problem B. Such accidents are of the very greatest importance to the development of science and are one of the major reasons for not trying to predict its growth.

It should, of course, be realized that such an accident depends greatly on the alertness and intelligence of the investigator. He must recognize the importance of his new data for a problem other than the one he is investigating and must realize that his other problem is important. To say that a given discovery is the result of accident, then, is not to cast doubt on the ability of the investigator making it. It may well involve, as in the legendary case of Newton, the very highest scientific talents, but it is still in a large part the result of chance. It is probable, however, that most scientists have many opportunities to make such chance discoveries. Without recognizing the possible outcome, they undertake experiments which lead to results of great importance to fields other than the one they are investigating. The rare investigator who seizes on such an opportunity deserves as much credit as if he had originally aimed at the result he eventually obtained.

[1. ]They may wait a long time. One of the members of the founding congress of the Chinese Communist party wrote a master’s thesis on the early development of the party for Columbia in 1924. Deposited in the Columbia University Library and forgotten, it was not rediscovered until 1960. New York Times, October 30, 1960, p. 13.

[2. ]Not as small as one might expect, however. John E. Pfeiffer, “How the Mysterious ‘Memory Traces’ Outperform Microfilm,” National Observer, May 13, 1963, p. 20.

[3. ]In some cases the work leading to the doctoral dissertation should be classified as self-education, but in many, it should, I think, be listed as part of formal education. Graduate students now frequently undertake dissertation projects not as the result of more or less unguided choice, but by negotiation with various organizations which have funds for research. These negotiations frequently control the subject and general treatment of the research.

[4. ]Normally, the most compact form in which data can be presented is a theory covering it. Thus the view that theories are merely convenient ways of writing down our observations has this to be said for it: it is clear that they do serve this purpose, among others.

[5. ]The terms “basic” and “crossing” are arbitrary. There is no reason to believe that one is more basic than the other.

[6. ]Physics almost reached it in the last part of the nineteenth century.

[7. ]The same improvement might be made in engineering training, but the engineering schools approach the ideal in this respect much more closely than the scientific ones.

[8. ]Michael Faraday, one of the greatest scientists of the nineteenth century, was an extreme example. The son of a blacksmith, he was apprenticed to a bookbinder and read the books sent to be bound. L. Pearce Williams, Michael Faraday (London: Chapman and Hall, 1965).

[9. ]This is less true of the scientists motivated by induced curiosity than of those motivated by practical considerations or plain curiosity. The “induced” investigator may, in fact, simply read everything he can in some narrow field defined in strictly formal terms. This is a rather good test to find out the real motives of a university-employed pure scientist. Those who stick very closely to some formally defined field of study will normally be “induced.”

[10. ]Possibly I overemphasized the importance of these non-content aspects of a scientific training because my own training was in the law. The most important things a lawyer needs to know are not taught in law school, and most of the things taught in law school are of little use to a practicing lawyer. See Jerome Frank, Courts on Trial (Princeton: Princeton University Press, 1949), particularly chap. xvi, pp. 225–46. The principal reasons for law school graduates’ going into law, then, are those I have listed which have no connection with the content of the courses. While these extracontent aspects of the educational system are undoubtedly important in other fields, possibly my own experience leads me to overemphasize them.

[11. ]Some scientists have been experimenting with a radically different method in which the distribution of “reprints” is the major factor. See Seymour S. Cohen’s letter, “Reprints Again,” in Science, 148 (May 28, 1965), 1173.

[12. ]Science, 143 (January 24, 1964), 308–9.

[13. ]Some very primitive calculations I have made indicate that the indirect monetary gain made by economists from publishing is of the order of $2,000.00 per article.

[14. ]John R. Baker, “Freedom and Authority in Scientific Publication,” in Science and Freedom (London: Secker & Warburg, 1955), pp. 58–68.

[15. ]In reality the role of editors may be less than I have indicated. A good deal of the work of selecting articles may be delegated to others. In this event, the “other” or “others” who make the selection play the role of editor in my discussion. This whole subject will be discussed in the final chapter.

[16. ]Lest I be suspected of putting too much emphasis on cross-disciplinary research, I should like to mention that another great advance in the recent history of astronomy was the invention of the Schmidt telescope. This was the result of a lifelong devotion by Mr. Schmidt to the extraordinarily narrow specialty of telescope optics. I do not quarrel with the present organization of science with the bulk of the workers engaged in narrowly defined specialties; I merely suggest that the concentration in traditional fields is higher than optimal.

[17. ]Scientists, of course, normally have no more knowledge than this of the fields in which they have made no special study.

[18. ]If history is to be counted a science, then it constitutes an exception to this rule. A great deal of new work in history appears first in book form.

[19. ]The problem is much less severe in Europe, where many monographs or booklets circulate, but even there it is unlikely that a proportional number of monographs are produced.

[20. ]For a similar complaint with the suggestion that journals solve the problem by making space for very lengthy articles, see Peter Gruenwald’s letter, “Too Much of Too Little,” Science, 148 (June 11, 1965), 1412.

[21. ]For a discussion of the difficulties of writing a good review, see George Sarton, “Notes on the Reviewing of Learned Books,” Science, 131 (April 22, 1960), 1182–87.

[22. ]As a personal experience, The Calculus of Consent (Ann Arbor: University of Michigan Press, 1962), by Dr. James Buchanan and myself, was reviewed by four of the five principal economic journals; the fifth does not publish reviews. It would be a rare economist who was not exposed to at least two of these reviews.

[23. ]Unfortunately some of the “standard” books contain enough errors so that they may actually retard the growth of science. M. King Hubbert, “Are We Retrogressing in Science?” Science, 139 (March 8, 1963), 884–90.

[24. ]Letter by E. H. Ahrens, Jr., “Conference Literature,” Science, 148 (April 16, 1965), 313. For a contrary view see John H. Schneider’s “Conference Literature: Rebuttal” in the June 18 issue (148: 1542).

[25. ]Phyllis Allen Richmond, “What Are We Looking For?” Science, 139 (February 22, 1963), 737–39.

[26. ]The problem of “data retrieval” is now the subject of a specialized journal, Information Storage and Retrieval. The issue of Science for May 8, 1964 (144: 581ff.) contains articles by Richard See, Gerard Salton, John C. Green, and Eugene Garfield which will serve as an introduction to this growing field.

[27. ]The type of problem where a man is interested in information in fields widely separated in the existing classification system can be considered as a series of separate problems of the sort we are now discussing.

[28. ]Our diagram is an analogy, and like all analogies, not exactly congruent with reality. It might appear that the simple solution would be to examine everything contained in the classification B which is also filed under the general field 2, but this would work only on our diagram. Seldom, if ever, would this be possible in reality.

[29. ]Computers have been programmed to perform various routine tasks which would normally have to be done by the indexer. This permits the human “parts” of the system to work more efficiently but does not solve the basic problem. See L. Karel, C. J. Austin, and M. M. Cummings, “Computerized Bibliographic Services for Bio-medicine,” Science, 148 (May 7, 1965), 766–72, for an example of such a system.

[30. ]Even with computers of the desired size, the problem of specifying the information desired in terms which would permit the computer to recognize desired items and reject undesired items would be an extremely difficult one. Presumably the answer would be sought along the lines of searching for certain combinations of words and phrases, but this raises almost as many difficulties as it solves. Again, a great deal of research in this area is presently being undertaken.

[31. ]This is something of an oversimplification. While the hypothesis “If I look on the other side of the hill, I will find something” has always been basic to geographic exploration, not infrequently some more specific and testable hypothesis has been an important motivating factor. Hypotheses about the sources of rivers seem to have been particularly fruitful.

[32. ]For an amusing discussion of the danger of simple data accumulation, see the letter by Bernard K. Forscher, “Chaos in the Brickyard,” Science, 142 (October 18, 1963), 339.