
Published in Correlation, a refereed journal of research in astrology, Volume 26 (1) 2008.

A Comprehensive Review of the Carlson Astrology Experiments

 Joseph E. Vidmar, Ed.D.

Abstract. An experiment in astrology published in Nature in 1985 is reviewed using the original materials given to the astrologers, interviewing four surviving astrologers that were found, and comparing the claims of the experimenter made before, during, and after the experiment. Numerous errors were found with experimental hypothesis, design, use of psychological tests by non-psychologists, data collection, data reporting, significant bias, misuse of statistical procedures, unsubstantiated claims, and presentation of a predetermined conclusion disconnected from results.
Introduction
     This experiment began in May 1980 and continued until December 1981. The article in Nature appeared on December 5, 1985 and was immediately reported in many forms of media in at least the U.K., U.S.A. and Canada. On the same day, both ‘The San Francisco Chronicle’ and ‘The Times of London’ ran articles, and Carlson was interviewed on a CBC Canadian radio show called “As It Happens”. How many other newspapers across two continents ran articles about this study is not known, but it stung the astrological community into accusing Carlson of creating a ‘media circus’. Over the years, Carlson’s claims that ‘astrology failed’ and ‘there is no scientific evidence for astrology’ have been heard or read by an estimated 5 million people or more.

Summary of the Carlson experiments        

     An article was published in Nature, arguably the most prestigious and influential journal in the world of science, December 5, 1985 by Shawn Carlson. Its title was ‘A double-blind test of astrology’ and its conclusion was “It failed”.
     The researcher proposed to measure the ‘astrological thesis’ – that planets at birth can be used by astrologers to predict future personality – by a double-blind experiment he characterized as having two complementary parts. Part 1 measured (a) how well 83 test subjects could choose their correct 6-page astrological interpretation out of three, and (b) for comparison, how well 56 test subjects could choose their correct psychological profile (CPI) out of three. Part 2 measured how well ‘less-than-28’ astrologers could match the subject’s horoscope to the correct psychological profile out of a choice of three. A total of 116 CPI-horoscope matchings were made by the astrologers. In addition, the confidence levels at which astrologers and subjects made their judgments were measured on a scale of 1-10.
     Levels of significance were set at 2.5 standard deviations according to standards used in physics. Results were null in all measurements except: (1) the control subjects could choose their astrological interpretation better than test subjects; (2) test subjects could not select correct astrological interpretation any better than they could choose correct psychological profile; (3) astrologers could not choose correct CPI better than chance; and (4) astrologers’ confidence level was not consistent with actual performance. The researcher concluded that measurements from Part 1 could not be used because the subjects could choose neither the astrological interpretation nor the psychological profile and that they did not have the self-recognition required to do this test. The subjects’ confidence levels were measured, but not reported.
     The researcher’s conclusion was that Part 2 indicated astrology failed for two reasons: (1) astrologers were unable to choose the same correct psychological profile the subjects had not been able to choose; and (2) the astrologers’ confidence level was not consistent with actual performance. The researcher concluded that despite the astrologers being some of the best in the country with the ability to use the CPI and having approved the design, “the astrologers’ predictions proved to be wrong” and “the experiment clearly refutes the astrological hypothesis”.
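     To make the 2.5 standard deviation criterion concrete for a three-choice task, the following is a minimal sketch and not Carlson’s own calculation. It assumes that each of the 116 matchings is an independent trial with a chance hit rate of 1/3 (a simple binomial model); the function name and the model itself are illustrative assumptions, not taken from the Nature article.

```python
import math

def chance_threshold(n_trials, p_chance, n_sd):
    """Return the expected hits, the standard deviation, and the hit count
    lying n_sd standard deviations above chance (simple binomial model)."""
    mean = n_trials * p_chance
    sd = math.sqrt(n_trials * p_chance * (1 - p_chance))
    return mean, sd, mean + n_sd * sd

# 116 CPI-horoscope matchings, one correct profile out of three,
# and the 2.5 SD criterion described in the summary above.
mean, sd, threshold = chance_threshold(116, 1 / 3, 2.5)
print(f"expected by chance:  {mean:.1f} hits")
print(f"standard deviation:  {sd:.2f} hits")
print(f"2.5 SD above chance: {threshold:.1f} hits")
```

     Under these assumptions, chance alone would give about 39 correct matchings, and a little over 51 would be needed to reach 2.5 standard deviations above chance.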
 Review of published reactions to Carlson’s study
The Carlson 1985 article has been read by perhaps 60,000 physical scientists over the years, and not one physical scientist has published a complaint. The article was criticized the very next month, January 1986, by Professor Hans Eysenck, called by Dean and Nias (1997) the most influential living psychologist at that time and still the 3rd most cited author in the history of psychology (after Freud and Piaget). When he died in 1997, at the age of 81, he had written some 1000 articles and nearly 80 books. His strongest contribution to psychology, among many fields, was the measurement of personality.
     In addition, Eysenck was among few psychologists who had ever researched in the field of astrology and had published a book with David Nias (1982) reviewing all astrological research that could be found.
     Eysenck wrote two reviews of the Carlson article in 1986. In ‘Astro-Psychological Problems’, Eysenck says, “The conclusion does not follow from the data”. In reference to the psychological test used in the experiment, he says, “The scales of the CPI are essentially arbitrary and subjective. They were not chosen by suitable statistical techniques, such as factor analysis. Similarly, the terms used to describe each scale … creates difficulties for anyone not thoroughly familiar with, and trained in the use of these scales. In the manual of the test, users are warned that ‘it is important that scores on this test be interpreted by a competent psychologist’”.
     Eysenck reminds the reader that ‘testing astrology is a complex and difficult field, as indeed all fields relating to psychological variables’. He noted the absence of any psychologists as advisors, and said “Carlson selected a psychological test, which proved to be unsuitable, and which any competent psychologist could have predicted would prove so”.
     In ‘Correlations’ (1986), Eysenck again noted the absence of psychologists, the unsuitability of the CPI, the participants’ lack of training in the CPI, and the fact that the test profile is not self-explanatory, and he stated that “the notion that untrained astrologers would be able to interpret test scores is contrary to the claims made in the test manual itself”. Eysenck then notes a larger implication: “Indeed, the whole way the experiment was designed…..indicates clearly that both sides regarded psychological expertise as a negligible quantity, and felt anybody can do as well as a trained psychologist in choosing, interpreting and evaluating results of the application of personality inventories. This may be a widespread belief but it is an erroneous one.”
     Eysenck concludes this article by saying when he and David Nias did the research for their 1982 book on astrology, they came across “study after study where the whole experiment had to be faulted because of quite elementary errors in the choice or interpretation of psychological measuring instruments, errors which would be obvious to a first-year student of psychology”.
   Eysenck’s articles were published in scholarly and scientific journals, but both with small circulation. They were not written about in the media, not even in other scientific literature. As late as February 2005, Carlson was quoted by a website called ‘Skeptical Studies’ as saying, “I have not yet received a serious scientific challenge to the paper”.  In response to criticisms by astrologers in an AFAN Newsletter (1986), he says, “its few substantive criticisms are attributable to ignorance of his experiment, of the CPI, and of basic scientific methodology”.    
     Teresa Hamilton Weed (1986), an M.A. psychologist who was a participating astrologer in the experiment, wrote an article in the same issue of Astro-Psychological Problems as Eysenck did and he called it scholarly. Noting that Carlson “has gained a sort of instant fame…he is in demand for radio and television talk shows”, she stated that “Mr. Carlson followed none of my suggestions. I was never satisfied the experiment was a fair test of astrology and outlined my objections in a letter to Mr. Carlson dated August 19, 1981”. She questioned the composition of the control group, the fact astrologers were not told whether subjects were male or female, the lack of qualifications to use the CPI, the limitations and complexity of the CPI, and the lack of dissimilarity of CPI profiles. Being trained in the CPI and being the most qualified astrologer in the experiment, she said, “I considered the task virtually impossible”.
     An astrologer-scholar, Geoffrey Cornelius (1986), wrote about the experiment in an astrology journal and called it “an exemplary demonstration of bad science, and a warning to astrologers of the penalties of naivety”. He writes again in his book (2003), some 17 years later, that the sample was unrepresentative, that the assumption that astrologers are competent to match the results of a personality inventory to the horoscope is faulty, and that interpreting the horoscope of a human being differs from interpreting the chart of a sexless random code number. He is the first person to note that the control group selected the correct chart significantly better than the test group and that “hidden in Carlson’s figures and conveniently not reproduced for publication is a devastating figure” of 2.9 SD. His conclusion was that the control group had broken down as a control.
     Mark Urban-Lurain (1984), computer scientist and director of instructional technology at Michigan State University, who had written a book on using statistics and multivariate analysis in astrological research, analyzed the Carlson experiment (N.D.) and concluded the use of the CPI was invalid, which made the outcome of the experiment invalid. He said the primary problem of the study was a physicist using psychological tools.
     Carlson continues to be cited in the skeptic community as having “irrefutably proved astrology doesn’t work” and as recently as 2007 was praised in ‘Correspondence’ by Professor Lower in the same prestigious journal, Nature.
    The most recent scientific critique of the Carlson experiment (2008) was written by Suitbert Ertel, professor of psychology at Göttingen University, who, at this time, is generally acknowledged as one of the most prominent living psychologists with expertise in astrological research. He made strong criticisms of the experiment regarding its data generation, its methods of analyzing the data, and its neglect of test power. Specifically, the selection of the 3-choice format was flawed; the analysis was done in a piecemeal fashion, causing total effects to be disregarded; standard deviation procedures were used in two different ways with two different meanings; the logic of control group designs was ignored; tests were used that did not consider effect size; there was no explanation of how significance values were obtained; and the sample was too small. He indicates that Carlson’s procedures would miss 80% of significant results at the p=.02 level.
     By re-analyzing Carlson’s data with correct statistical procedures, Ertel found the rather startling results that the astrologers had actually ‘won’ in study 1 with at least a probability of p=.05 and an effect size of .15 (3X the size found in the Gauquelin data). If the control/test data was mixed up in study 2, the combined result for study 1 and study 2 taken together would be very significant at p=.01. Carlson’s experiment actually ‘proved’ the astrological hypothesis.
Evaluation of Carlson’s study
1. Clarity of hypotheses
         Any scientific experiment begins with the experimental hypothesis. Actually, Carlson never gives an explicit experimental hypothesis for this experiment. On the first page, he talks about the ‘fundamental thesis of natal astrology’, but that is the thesis and is not an experimental hypothesis. Two sentences later he says, “We took great care to eliminate all biases which could tend to ‘randomize’ the results and thus favour the scientific hypothesis over the astrological one”. He has subtly presumed, but not made explicit, the ‘scientific’ hypothesis. He has also subtly made the contestable presupposition that an astrological hypothesis is not or cannot be scientific. There is a paradox here. If it is not a scientific hypothesis, then it cannot be tested scientifically.
Carlson says that he is testing the astrological hypothesis, but that does not happen. He does not test astrology and he does not test astrologers at what they do. He uses an invalid assessment method to test the self-recognition ability of the subjects and concludes that they have none. He tests the ability of astrologers to choose a psychological test profile and concludes that astrology doesn’t work, not the invalid test that he used. He measures the confidence of astrologers to guess how well they will do on an unknown task and concludes astrology doesn’t work, not that astrologers are not any better at guessing than anyone else.

2. Clarity and organization of the article              
     This article by Carlson is an extraordinarily hard paper to read and follow. Problems include: (1) headings not following standard format; (2) material under one heading belonging in another; (3) lengthy paragraphs with as many as 21 sentences and 8 different themes of thought; (4) incomplete information found scattered over several pages or not found at all; and (5) mixing methods, results and discussion of the two experiments in the same paragraph as if they are one subject.
     Even after reading this article again and again with the most critical eye, this reviewer would have to go back and forth from page to page, paragraph to paragraph, to follow a line of thought. In personal correspondence with Geoffrey Dean (October 31, 2007), this article’s main scientific supporter, he says, “Reports of the Shawn Carlson study almost always get the details wrong and thus promote confusion.  I met Carlson at the time of his study …. and had an extended correspondence afterwards to correct the confusion”. But even Dean got a major detail wrong (actual number of participating astrologers) and has thus promoted confusion. Confusion is not due to the reader, but is due to the lack of structure and disorganization of the paper. It would have most likely been rejected by all refereed journals in scientific psychology, at any rate by The Southern Psychologist that this reviewer co-edited with Ralph Mason Dreger.
     How it came to be published in ‘Nature’, possibly the most prestigious scientific journal in the world, is an enigma that will be addressed in its own category toward the end of this paper.
3. Efforts devoted to the present evaluation of Carlson’s study
     Being retired, this reviewer had the luxury of time to invest some hundreds of hours for research into a 7-page article that was based on an experiment begun 28 years ago. He was able to locate 4 surviving astrologers who participated in the experiment. Two remembered nothing, one had some memory and copies of two articles of criticism she had written in 1986, and one had a good memory of events and had kept the original materials of the experiment. That an astrologer, Erin Sullivan, would keep the raw data of her participation 27 years ago (when the experimenter did not) is remarkable. The papers were finally located amidst a mess of other storage at the bottom of a basement in London, England with the help of astrologer-scholar Geoffrey Cornelius. Without this information, there are many things about this study that would never be known or understood. Among these many papers, the following were the most significant:

  1. A copy of advertisement in ‘Help Wanted’ section of Daily Californian (UC Berkeley) dated May 5, 1980: “ASTROLOGY EXPERIMENT - Need subjects for experiment to test Astrology. Takes only 30 min. For info. send SASE”
  2. Invitation letter to astrologers from Carlson to participate dated March 21, 1981
  3. 2-page ‘Information Sheet’ (about Interpretation of CPI)
  4. 6-page ‘Astrology Experiment Detailed Outline’
  5. 2-page ‘Format: Guidelines for Preparations of Interpretations’
  6. 1-page ‘Questionnaire’ (requiring 1-4 pages of response) that included request for permission to use their name to publish a list of astrologers who participated.
  7. Copy of Registered Letter dated May 15, 1981 from Sullivan to Carlson complaining about false progressed horoscope No.73115.
  8. Letter from Carlson dated May 26, 1981 explaining it was computer error, saying “You are only the second astrologer who has reported this error”.
  9. 28-page photocopy of CPI ‘Interpreter’s Syllabus’ mailed in a Lawrence Berkeley Laboratory envelope with a LBL postage stamp dated June 25, 1981.
  10. Reported letter of complaint from astrologer-psychologist Therese Weed 8-19-81.
  11. Letter from Carlson, undated (Oct.1981?) but signed, to all astrologers saying much of the data analysis was completed and encouraging them to complete the tasks by Dec.15
  12. A copy of Carlson’s original 30-page ms. submitted to Nature on Lawrence Berkeley Laboratory stationery and then compared to published Nature article.
  13. Copies of AFAN newsletter January 1986 with complaints about the Carlson study and Carlson’s 250-word rebuttal in AFAN newsletter (March 1986?).
  14. A copy of Carlson’s full 15-page ‘Rebuttal’ mailed January 21, 1987 to astrologer Jayj Jacobs “several months” after invited request.

 

4. Experimental design of the Carlson study
4.1  Main assessment used

    California Psychological Inventory (CPI)

Qualifications to use the CPI
     The CPI is a restricted psychological test which may not be purchased, used or interpreted without the “qualifications required of B-level users, plus hold an advanced degree in a profession that provides specialized training in the interpretation of psychological assessments”. License C is specified by the copyright holder, Consulting Psychologists Press (CPP) (2008), according to The Standards for Educational and Psychological Testing as published by APA, AERA, and NCME. To use this test without compliance with these standards is in violation of ‘Ethical Principles of Psychologists and Code of Conduct – 2002’, section 9.07, ‘Assessment by Unqualified Persons’ of the American Psychological Association (2002).
     Neither the experimenter, an undergraduate student in physics at the time, nor the astrologer subjects, nor the student subjects had the qualifications to use or interpret this restricted psychological test, particularly for the matching tasks which were required and had never been done before with any psychological test for any reason.  
     The question of where Carlson got these materials must be raised. He did not have the qualifications to purchase this test through CPP, and such tests are not put on open library shelves because they are restricted. To photocopy these materials is in violation of the copyright of CPP. When asked this question, Carlson (personal communication, April 10, 2008) refused to answer, while also adding that he does not consider psychology to be a science.
     Difficulty of interpretation of the CPI 
     Professor Eysenck said, “In the manual of the test, users are warned that ‘it is important that scores on this test be interpreted by a competent psychologist’”. Harrison Gough, author of the CPI, says on the front page of the Interpreter’s Syllabus, “the diagnostic implications of the profile are not always self-evident …validity-in-use is not something that resides purely in the inventory itself”. The first paragraph of the ‘Information Sheet’ given to the astrologer subjects said, “Any individual scale of the CPI is not extremely accurate”.
     Because individual scales are not extremely accurate, the ‘Information Sheet’ says to “look for characteristics which seem to be evident for sure in several scales”. Gough (1968) writes in the ‘Interpreter’s Syllabus’ that “Diagnosis must rest on patterns and combinations just as much as on individual high and low points”.  In other words, 18 individual scales must be learned, but cannot be interpreted individually. Also, out of 18 scales, maybe only two or three are above T70 or below T30 and therefore interpretable.
     In addition, not only may “changes in scores on one scale may alter the interpretive implications of scores on another scale”, but interpreters also need to learn 324 2-scale combinations. Each 2-scale combination has 4 possible expressions. That is 324 × 4 = 1,296 diagnostic combinations to consider. Each combination has 10 different adjectives or descriptors, which makes 1,296 × 10 = 12,960 different variables to consider for interpretation. Furthermore, those 12,960 variables are significantly different for men than they are for women. Therefore the number of variables to be considered (and understood by item analysis) is 12,960 + 12,960 = 25,920. Gough summarizes this section by stating, “the ability to translate a complete profile into an individuated analysis are skills that must be developed in practice – not just read about in a book.”
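     The arithmetic in the paragraph above can be retraced step by step. The short sketch below only reproduces the multiplication chain given in the text; the variable names are illustrative, and reading the figure of 324 as 18 × 18 (every one of the 18 scales paired with every scale) is an assumption about how that number arises.

```python
N_SCALES = 18                    # CPI scales mentioned in the text
EXPRESSIONS_PER_PAIR = 4         # possible expressions per 2-scale combination
DESCRIPTORS_PER_EXPRESSION = 10  # adjectives or descriptors per combination
GENDER_VERSIONS = 2              # separate male and female interpretations

# Assumption: the 324 two-scale combinations correspond to 18 x 18.
pair_combinations = N_SCALES * N_SCALES
diagnostic_combinations = pair_combinations * EXPRESSIONS_PER_PAIR
descriptors = diagnostic_combinations * DESCRIPTORS_PER_EXPRESSION
total_variables = descriptors * GENDER_VERSIONS

print(f"2-scale combinations:       {pair_combinations:,}")
print(f"diagnostic combinations:    {diagnostic_combinations:,}")
print(f"descriptors, one gender:    {descriptors:,}")
print(f"descriptors, both genders:  {total_variables:,}")
```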
     There are four fatal flaws in Carlson’s administration of the CPI: (1) The first and most critical flaw is that he did not give the astrologer subjects the gender of the profile; every scale has a ‘male’ interpretation and a ‘female’ interpretation and they are different from each other; (2) Carlson gave the astrologers the wrong book; the ‘Interpreter’s Syllabus’ only contains 1 incomplete example of a 2-scale combination out of a total of 324; (3)  The astrologer subjects were given the entire 28-page ‘Interpreter’s Syllabus’ to make CPI interpretation with and the student subjects were given only a 2-page  ‘synopsis’ or ‘summary’ of what the individual scales meant; and (4) The CPI profiles given to the astrologers were, according to astrologer-psychologist Weed, not dissimilar from each other and looked too much the same to differentiate between.
   The above discussion invalidates the CPI in this experiment half a dozen different ways. It was used by untrained, unqualified persons in the wrong way using inappropriate materials to make an invalid measurement of the unspecified hypothesis. Therefore, all of Part 2 and the part of Part 1 that uses the CPI are invalidated.

   Chart interpretation
   Each participating astrologer was required to write a five-page interpretation of four coded charts with no demographic information nor birth data attached to any of them. Each interpretation was written according to a predetermined format “developed in collaboration with the advising astrologers” of five categories: (1) personality/ temperament; (2) relationships; (3) education; (4) career/goals; and (5) current situation.
     There are problems with these categories that any ‘expert astrologer’ should have known. The subjects were college students, presumably unmarried, and too young to know their pattern in relationships. They were also at the age of still deciding what their career/goals should be. Both career/goals and education are heavily influenced by sociocultural factors such as parents and opportunity and are not dependent upon personal inclination alone. One participating astrologer, Sullivan, made note of this when she wrote on the Questionnaire “education is a difficult thing to determine because the environmental and genetic factors are of extreme importance. However, one takes a chance and consults the usual 3rd/9th house axis”. These are factors that indicate mind and learning in a horoscope, but it is conjectural to assume they have much to do with ‘education’, a social institution the universal, compulsory form of which has only existed for the last 100 years. There is no ‘education’ planet or house in the horoscope.
     ‘Current situation’ is an appropriate category, but unfortunately some of the progressed horoscopes given to the astrologers were false because of ‘computer error’ and it is the progressed horoscope that answers that kind of question.
     In summary, four out of five of the categories for interpretation are questionable because of artifacts of sociocultural influence or invalidated by false information given to the astrologer. Therefore, all of Part 1 of this experiment is invalidated.     
    
4.2. Carlson’s claims made re: Experimental design

  1. ‘The two parts of the experiment are complementary’. There were two separate experiments, not two parts of one, and three different tests; results and methods of all three were sometimes collapsed into the same paragraph leading to confusion.
     
  2. The experiment ‘was designed with the help of astrologers’. The experimenter is talking about only the three advising astrologers, sometimes one, and none of them had the scientific expertise to ‘help’ with the design. The design of the experiment is solely the responsibility of the experimenter.
  3. After the experiment in 1988, ‘they all said the test was fair before the fact. Only after the results proved to be negative did they howl unfair’. Carlson received at least two letters, one registered, from astrologer participants complaining about this experiment. Among the papers sent to this reviewer by two of the surviving astrologers is a photocopy of a registered letter sent by Sullivan May 14, 1981 raising questions about a false progressed horoscope given to her, alleging the whole experiment is invalid, objecting to Carlson’s language of ‘devious astrologer’, and concluding “I cannot rest easy taking part in an experiment I do not understand”.

     In a letter to this reviewer January 18, 2008, she says “The ‘experiment’ should have been shut down immediately, because if it were intentional it would be a trick, which one doesn’t do in legitimate studies or experiments. So, I demanded they cease and desist, and they didn’t. They just ceased and desisted me”.

     In a letter to this reviewer November 2007, the other surviving astrologer, Terese Weed said, “I do remember the extreme frustration when attempting to work on this experiment” and in an article published April 1986, she says, “In fact, Mr. Carlson followed none of my suggestions. I was never satisfied that the experiment was a fair test of astrology, and I outlined my objections in a letter to Mr. Carlson dated August 19, 1981”.

           In addition, Carlson (personal communication, April 9, 2008) said “several astrologers who helped to design the experiment have grossly misrepresented the truth in order to defend their participation in the project”. There were only three ‘advising astrologers’ who ‘helped to design the experiment’ and one of them was lying sick and dying from disease at the time of publication four years later. Presumably, he was too incapacitated to protest. Carlson specifically mentioned sending a copy of his article to this advising astrologer at that time.
     In summary, there is documentation that at least two participating astrologers sent letters of complaint to the experimenter and at least 2/3 of his ‘advising astrologers’ protested the article upon publication. Carlson’s argument is that any astrologer who protested after publication did so because ‘they didn’t like the results’. There is no evidence for that claim. It is just as likely that the astrologers only then discovered that things they thought they had originally agreed to were not what actually happened.
 

  4. After the experiment in 1989, “I have not yet received a serious scientific challenge to the paper”. Professor Eysenck immediately challenged this experiment with two published papers the next month. Even if Carlson did not read the literature, Dean, who was an advisor to both Carlson and Eysenck at the same time, and who quoted the Eysenck review, would have mentioned it to Carlson in their post-publication correspondence. Also, Carlson would have read the summary that Dean (N.D.) wrote about his article and in that summary, Eysenck’s complaints were noted.
  5. That it is ‘Double-Blind’. The strongest indication that this was not a double-blind experiment is a photocopy of a letter sent by Carlson to astrologers in approximately October 1981. It is signed, but undated. The capitalization and underlining are the experimenter’s:

 

Dear Erin Sullivan
We have completed much of the data analysis on the data which has been returned to us. We are very near interpreting the results as FAVORING the astrological thesis. Near, but not there yet. We simply must have the rest of the data. It is imperative that you carefully complete the CPI-Natal Chart matchings as soon as possible. We must have all the data back by December 15. Any data which we receive after this date will not be addable to the results.
It would indeed be a great shame to have done all this work, to have come so close to an unambiguously positive result and yet fail at the end due to insufficient data returned. So please, if you would like to help verify the astrological thesis, in the eyes of the academic community, complete the data envelopes and return them to me, with the natal charts, as soon as possible.


Shawn Carlson Bldg 50 - Room 232 Lawrence Berkeley Labs Berkeley, CA 94720

Sincerely,

          This contradicts what he says on page 422 of the Nature article:
 
            “At no time during the data collection did the experimenter have access to any information relating subjects’ identities to code numbers. This control was abandoned only when all the data had been collected and the methods of analysis had been established.”
     
  6. ‘We took great care to eliminate all biases which could tend to 'randomize' the results and thus favour the scientific hypothesis’. The bias in this experiment was not under control. In his 1987 Rebuttal to astrologers, he says he was testing their ‘paranormal abilities’. Most astrologers would not agree with this statement. Even Paul Kurtz, the Chairman of CSICOP, does not believe astrology is ‘paranormal’ in the ‘strict sense’.
         Bias was obvious to two of the surviving astrologers. Erin Sullivan wrote a letter in 1981 complaining about Carlson’s language of ‘devious astrologer’ in the Instructions he gave and refused to continue further. Therese Weed (1986) also wrote in Kosmos:

     “When I understood the strong bias against astrology inherent in this experiment, I refused to participate further. Perhaps other astrologers reached the same conclusion, for there was a sharp decline in participants after the initial stage of the project...”

       In his January 1987 Rebuttal letter to astrologers, he says, “I must also state for the record that Mr. Lewis is half correct on this point. It is true that I was indeed biased. But I wanted astrology to work!”

4.3 Subjects

      Test/control subjects.  Carlson writes in Nature, “We originally planned to be able to distinguish between the two hypotheses at a four standard deviation level…. Thus the total number of subjects was originally 256.”
     The experimenter says, “Many of these original subjects did not complete all phases of the experiment……In the end only 177 subjects (83 test group, 94 control) remained for Part 1 of the experiment”. Carlson neglects to put into the text how many subjects remain for Part 2 and the reader has to go to Table 1 at the bottom of the next page to find the missing information: 56 test group and 50 control group.
     In other words, he got a 69% return rate for Experiment 1 and a 41% return rate for Experiment 2. He places the unfavorable figures in separate parts of the article, so that it is hard for the reader to connect them. He does this more than once.
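     The two return rates follow directly from the subject counts quoted above. As a quick check, the sketch below recomputes them from the original 256 subjects and the reported group sizes; the variable names are illustrative only.

```python
ORIGINAL_SUBJECTS = 256

# (test group, control group) sizes reported for each part of the study
remaining = {"Part 1": (83, 94), "Part 2": (56, 50)}

return_rates = {}
for part, (test_n, control_n) in remaining.items():
    total = test_n + control_n
    return_rates[part] = 100 * total / ORIGINAL_SUBJECTS
    print(f"{part}: {total} of {ORIGINAL_SUBJECTS} subjects "
          f"= {return_rates[part]:.0f}% return rate")
```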
     Carlson reports in his article, “Since much of the soliciting was done on campus, approximately 70 percent of the subjects were college students and about one half of these were graduates”. Again, none of the demographic information that is normally reported is given. Age is not given, nor major of study, nor how many ‘disbelieved’ or ‘strongly disbelieved’, nor gender, nor employed or not, nor their motivation to participate in an experiment on astrology, nor how many were friends of his and fellow skeptics.
     Dean in his Summary says the average age of all subjects was 28. Assuming average ages of 20 for the undergraduates and 25 for the graduates, the average age of the 30% non-students would be about 40.
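     The estimate for the non-students is a weighted-mean calculation, sketched below. The ages of 20 and 25 are this review's assumptions rather than reported data, the 70% and one-half shares are Carlson's approximate figures, and the variable names are illustrative. Under those assumptions the implied figure comes out at a little under 41, close to the 40 quoted above.

```python
# Figures quoted in the text; the 20/25 ages are assumptions, not reported data.
OVERALL_MEAN_AGE = 28.0
STUDENT_SHARE = 0.70               # "approximately 70 percent ... college students"
GRADUATE_SHARE_OF_STUDENTS = 0.5   # "about one half of these were graduates"
UNDERGRAD_AGE, GRADUATE_AGE = 20.0, 25.0

undergrad_share = STUDENT_SHARE * (1 - GRADUATE_SHARE_OF_STUDENTS)
graduate_share = STUDENT_SHARE * GRADUATE_SHARE_OF_STUDENTS
non_student_share = 1 - STUDENT_SHARE

# Solve overall_mean = sum(share_i * age_i) for the non-student mean age.
student_contribution = undergrad_share * UNDERGRAD_AGE + graduate_share * GRADUATE_AGE
non_student_age = (OVERALL_MEAN_AGE - student_contribution) / non_student_share
print(f"Implied mean age of non-students: {non_student_age:.1f}")
```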

  Astrologer subjects.  There is no demographic information given about these unknown astrologers. Although Carlson mentions several times that they are ‘respectable and held in high esteem’, they are unknown to the astrological community and even to each other. On the original questionnaire sent to the astrologers, they were asked for permission to print their names, but this list was never published. Basic demographic information that would be gathered by any social/behavioral scientist is missing. There is no information about their age, gender, years of study, years of practice, educational level, type of astrology practiced (tropical or sidereal), geographical location (Bay Area or nationwide), or what standards were used to determine them as ‘expert astrologers’.
     In Dean’s website summary of the Carlson experiment(s), he divulges information given to him by Carlson that was not in the Nature article, “nearly all claimed to have some formal training in psychology, average 3 years, three were professional psychologists, and most claimed to have some experience with the California Psychological Inventory”.
     These statements are misleading. Psychology is an allied health profession and the requirements are rather more stringent than those of a ‘professional scientist’, which is what Carlson (Rebuttal) proclaimed himself to be at the age of 20 as an undergraduate student. To be a ‘professional psychologist’ in California is to have a license, and to have a license is to: (1) hold a doctorate from an accredited department of psychology; (2) complete two years of post-doctoral supervised experience; (3) pass the national PES examination at the 50th percentile; and (4) pass the oral examination in front of four members of the Board of Examiners.
     Also, it is unclear how Carlson could give Dean this information since he asked no questions about educational level on the Questionnaire he asked astrologers to fill out.

    Unknown sample size (less-than-28 astrologers)

     Carlson says “fewer astrologers than hoped for agreed to participate. Only 224 data envelopes were mailed to 28 astrologers. Some of these astrologers simply refused to participate as promised” (p.421, 2nd column). Out of a list of 90 astrologers given to him, he had originally intended for 40 to participate (according to ‘Experiment Detailed Outline’). Only 28 expressed interest in the concept, and when the thick data envelopes arrived and the actual month-long task was in front of them, “some simply refused”. Carlson reports only the number 28. This is the number of astrologers who had agreed and intended to participate, not the number who actually did.
     In his ‘FAVORING’ letter to the astrologers of October(?) 1981, Carlson (quoted above) indicates he has already admitted that he has insufficient data from the less-than-28 astrologers, so the actual number, then, is much less than 28. How much less than 28? We may never know. The only numbers given to us are that 116 judgments were made by the much-less-than-28 astrologers, who typically agreed to make 4 judgments each. He does not say whether ‘typical’ means the mean or the mode. 116/4 = 29 astrologers, which is more than the 28 who were mailed envelopes. The numbers do not match.
     There is another subtly misleading play on words here. He says the astrologers agreed to make 4 CPI matchings each (typically), but it was never stated how many the average astrologer actually did. All that is known is that out of the 4 surviving astrologers found, Sullivan did 3 (personal correspondence, February 11, 2008), two did 0, and one is presumed to have done all four (Weed, personal correspondence May 1, 2008). This small sample of 16-25%(?) of participating astrologers did an average of 1.75 matchings.
     This reviewer was alarmed. Professor Ertel (personal correspondence April 4, 2008) looked at this problem and said, giving Carlson the benefit of the doubt, that if N=25 (the highest possible number), then 116 judgments can be reached with 25 astrologers doing 4 judgments each and 16 of them doing one extra: 25 × 4 = 100, and 100 + 16 = 116.
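     The arithmetic behind this mismatch, and behind Ertel's benefit-of-the-doubt scenario, can be laid out as a short sketch. The function name and the alternative values of N in the final loop are illustrative only; the totals are those reported in the text.

```python
TOTAL_JUDGMENTS = 116        # judgments reported in Nature
TYPICAL_PER_ASTROLOGER = 4   # matchings each astrologer "typically" agreed to make
MAILED_TO = 28               # astrologers who were sent data envelopes

# If every participant had done exactly the typical number of matchings:
implied_astrologers = TOTAL_JUDGMENTS / TYPICAL_PER_ASTROLOGER


def extra_judgments_needed(n_astrologers: int) -> int:
    """Judgments beyond 4 each that n astrologers must supply to reach 116."""
    return TOTAL_JUDGMENTS - n_astrologers * TYPICAL_PER_ASTROLOGER


print(implied_astrologers)          # 29.0, more than the 28 mailed to
print(extra_judgments_needed(25))   # 16, as in Ertel's scenario

# Mean number of judgments each astrologer would need to make for other values of N.
for n in (28, 25, 20, 15):
    print(n, round(TOTAL_JUDGMENTS / n, 2))
```

     The smaller the true number of participating astrologers, the further the required average rises above the ‘typical’ 4.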
       Skeptical that the return rate would be so high (25/28 = 89%) and skeptical that 16/25 astrologers (64%) would agree to do an extra week’s work in addition to the matchings they had not yet finished, this reviewer raised his doubts with Professor Ertel again. The doubts were that this return rate could not be so high and that maybe Carlson didn’t have enough data and had doubled it to get his numbers. Ertel (personal correspondence April 5, 2008) investigated further and said:
     “I don't think C. doubled the numbers because in that case raw frequencies would all be even numbers. But in Figures 3 and 5, I found 13 even and 14 uneven numbers after converting the graph information into numbers. After combining the numbers of the three choice categories (1st, 2nd, 3rd), I find 6 even numbers and 4 uneven numbers, again not suspicious.”

      However, after reflection, despite the raw frequencies, this reviewer still suspects that the number of astrologers who actually participated is lower than that. Otherwise, Carlson would not have been so desperate in his ‘FAVORING’ letter as to say “yet fail at the end due to insufficient data returned”. N=25 astrologers with 116 judgments is not ‘insufficient data returned’. In addition, the reviewer ran across a comment from Weed (1986) that when the matching profiles arrived, there was a ‘precipitous drop’ in the number of participating astrologers. It comes down to one of those most uncomfortable moments in science when we must choose between believing the statistics and believing our eyes.
     N=15 astrologers, or fewer, at an average matching rate of 1.75, is ‘insufficient data returned’. The figure N=15 is based upon 224 data envelopes mailed versus 116 returned, a return ratio of 52%. Applying this return rate to the number of astrologers mailed to gives 28 × 52% = 14.5 astrologers. At the ‘typical’ number of 4 matchings per astrologer, 14.5 × 4 = 58, which is exactly 50% of the 116 judgments reported. If 14.5 astrologers did the measured average of 1.75 matchings, then 14.5 × 1.75 gives a total of about 25 judgments, which is insufficient data.
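     This estimate can be reproduced step by step. The sketch below rests on this review's assumption that the envelope return ratio carries over to the number of astrologers, and on the matchings reported for the four astrologers located; the variable names are illustrative.

```python
ENVELOPES_MAILED = 224
JUDGMENTS_RETURNED = 116
ASTROLOGERS_MAILED_TO = 28
TYPICAL_MATCHINGS = 4
# Matchings by the four astrologers located: Sullivan 3, two with 0, one presumed 4.
OBSERVED_MATCHINGS = [3, 0, 0, 4]

return_ratio = JUDGMENTS_RETURNED / ENVELOPES_MAILED
# Assumption: the same ratio applies to the astrologers themselves.
est_astrologers = ASTROLOGERS_MAILED_TO * return_ratio
observed_mean = sum(OBSERVED_MATCHINGS) / len(OBSERVED_MATCHINGS)

print(f"return ratio: {return_ratio:.0%}")
print(f"estimated astrologers: {est_astrologers:.1f}")
print(f"judgments at 4 each: {est_astrologers * TYPICAL_MATCHINGS:.0f}")
print(f"judgments at observed mean of {observed_mean}: {est_astrologers * observed_mean:.0f}")
```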
     In conclusion, if there is any misunderstanding about this, it is because of Carlson’s own obvious omission of a very critical value of the experiment, and his refusal, upon request, to divulge the correct, exact number to fellow scientists. He repeated the mythical 28 number himself as late as 1987+. No other number has ever been noted.

4.4  Dependent variables: Judgments

      Three-Choice format

     Why ask the student subjects to discriminate between three astrological interpretations? Why not two? Why ask them and astrologers to discriminate between three CPI profiles? Why not two? Why ask the astrologer subjects to match one horoscope with one of three 18-scale CPI profiles? Why not two?  A two-choice format is what is required, just a binary yes or no. Three choices are not necessary and are more difficult for the subject.
     Ertel gives an excellent review of how using the three-choice format gives the skeptic hypothesis an unfair advantage by diminishing the effect totals. Carlson analyzed the three-choice format in piecemeal fashion and total effects are disregarded. Not only this, but the statistics required to measure results in this way are sophisticated and not well known. Ertel says he should have at least analyzed first and second choices combined, which is what the experimenter said he originally intended.
     The two-choice format is the preferred method because it is easier for the subject to do, the statistics are more easily analyzed, and it is what methodologists say is most sensitive to subtle effects. Subtle effects must be allowed for, since the neo-astrological Gauquelin effect is only .02-.06.

     Confidence judgments

     A confidence judgment is a subjective guess about how well you will do. The finding of the major researchers in this field, Stankov and Crawford (1997), is that confidence and actual performance do not always have much to do with each other: “Correlations between the confidence judgment scores indicate that there is a separate self-confidence trait that is different from ability factors reflecting the speed and accuracy of performance on cognitive test items”.
     Why do it? What is the point? It is a lot of work for the subject who has to make 18 of them on a scale of 1-10. It is a subjective guess for the astrologer who is dealing with complex data. A psychologist giving tests is considerate of the subject and is careful not to cause him unnecessary stress. It takes up limited space in Nature to report. It has nothing to do with the astrological hypothesis and the confidence of the subject has nothing to do with objective fact. All the judgments about the five categories of the CPI were rejected by the experimenter anyway because the subjects ‘were not doing it right’.
     What it seems to do, besides interject a lot of extraneous material making the main theme harder to find, is provide just another task for the astrologers to fail. This is exactly what Carlson proclaimed in the next paragraph in Conclusions when the astrologers’ confidence judgments “was clearly inconsistent with the ‘at least’ .50 level predicted by the astrologers”. He had never said in the experiment that he was going to use confidence judgments to accept or reject the astrological hypothesis.
5. Standards of proof
     This experimenter required an unusually high standard of proof (significance level) compared with what is normal in social science. It is atypical to find 2.5 standard deviations (actually the Z-standardized deviation from chance expectation) as the cut-off point for acceptance or rejection of the null hypothesis. It is atypical to use standard deviation, a measure of variability, as a criterion for acceptance/rejection at all. Usually the error probabilities of .05 (significant, sd = Z = 1.65) and .01 (very significant, sd = Z = 2.33) are acceptable limits. As Ertel points out, Carlson’s demanded significance of Z = 2.5 is p = .006. Using Carlson’s nonconforming standards would result in missing 80% of results that are significant at .02 (sd = Z = 2.05). Carlson’s original intention to use 4.0 standard deviations would have made it so improbable to find any but the most prominent effect that it is incredible he even considered it.
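     The correspondence between Z cut-offs and error probabilities quoted in this section can be checked with the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist` and assumes one-tailed probabilities, which is the convention consistent with pairing .05 with Z = 1.65; the helper names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_tailed_p(z: float) -> float:
    """Probability of a deviation of at least z standard deviations above chance."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


def z_for_p(p: float) -> float:
    """Z cut-off whose one-tailed error probability is p."""
    return STANDARD_NORMAL.inv_cdf(1.0 - p)


for p in (0.05, 0.02, 0.01):
    print(f"p = {p:.2f}  ->  Z = {z_for_p(p):.2f}")
for z in (2.5, 4.0):
    print(f"Z = {z:.1f}  ->  p = {one_tailed_p(z):.6f}")
```

     On this convention Z = 2.5 corresponds to an error probability of roughly .006, and Z = 4.0 to a probability far smaller than any threshold customarily used in the social sciences.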
      Is there any justification for using 2.5?  In the original manuscript Carlson had submitted to Nature, he had given a reason, “Since a positive astrological effect would be controversial, we decided at the outset to require a 2.5 standard-deviation increase over random choice to interpret the results as favoring the astrological hypothesis. (2.5 is a dividing line in physics experiments often used by skeptics before they are willing to accept a new or startling effect.)”. The parts that are underlined were those parts deleted by Nature. It appears the editor of Carlson’s article did not want readers to doubt that this significance level is appropriate for psychological studies rather than only for physics experiments.
     In the ‘Astrology Experiment Detailed Outline’ given to astrologers in 1981, Carlson says, “There is another check. If a strong positive correlation is found … then the results of the second half of the experiment will be compared to the first”. In other words, to ‘win’, the astrologers had to pass both experiments, but to lose they only had to fail one.
     In addition, even if the astrologers had passed all of these tests and conditions, it would not have ‘proven’ that they had ‘won’. Cornelius raised this suspicion in his book (pg. 343) that “had this experiment come up with a plus result for astrology, the critics would have said that its stated interpretative dimension allows ‘intuition’…and intuition or any psi-faculty is somehow not astrology”. Interestingly, Carlson did ask as the first question on the Questionnaire, “What percentage of the average interpretation do you obtain from: a.) the natal chart only; b.) the natal chart and intuition; and c.) intuition only”. He did not indicate the purpose of gathering this information and he did not report the results in his article.
6. Carlson’s results
     Statistical Fluctuation
     When student subjects selected their own astrological interpretation, it was the control group, the ones who did not have their own interpretation, who chose the correct (for them incorrect) interpretation at an ‘almost’ significant level, 2.34 sd (which is significant according to normal psychological standards) as first choice. Since there could be no astrological effect happening, Carlson dismissed this finding as a ‘statistical fluctuation’.
     This, however, is an improbable ‘statistical fluctuation’ at a 1/90 chance of happening. It seemed convenient to sweep this anomaly under the table. There is another, more likely, explanation. In the midst of all this complexity of administration and record keeping and matching and re-matching with different code numbers four times, the data of the control group had been mixed up with the data of the test group.
     Later, this reviewer’s attention was caught by an analysis of the Carlson study by astrologer-scholar Geoffrey Cornelius (2003), who had made many of the same complaints this reviewer was re-discovering in 2007, and who alluded to details of the “hidden and unpublished standard deviations in Carlson’s study”. In Appendix 2 of his book, Cornelius writes:
      But hidden in Carlson’s figures, and conveniently not reproduced for publication, is a 
     devastating figure which can be very simply tallied from his own published table. The 
     control subjects  ranked the ‘right’ interpretation, for their twin, in third place very  
     infrequently indeed, as if they could tell which was a completely unfitting 
     interpretation. They ranked the right interpretation in third place in 18 out of 94 cases 
 (2.9SD). This would be expected by chance less than once in 520 such tests.
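
     The 2.9 SD figure in this passage can be reproduced from the two published counts. The sketch below assumes a simple binomial model with a chance probability of one third per rank, and uses a normal approximation without continuity correction; it is not necessarily the procedure Carlson or Cornelius used, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N_CONTROL = 94        # control subjects in Part 1
THIRD_PLACE = 18      # times the 'right' interpretation was ranked third
P_CHANCE = 1 / 3      # assumed chance probability of any one rank out of three

expected = N_CONTROL * P_CHANCE
sd = sqrt(N_CONTROL * P_CHANCE * (1 - P_CHANCE))
z = (THIRD_PLACE - expected) / sd
# One-tailed probability of a count this low or lower (normal approximation).
p_low = NormalDist().cdf(z)

print(f"expected {expected:.1f}, sd {sd:.2f}, z {z:.2f}, about 1 in {1 / p_low:.0f}")
```

     Under these assumptions the deviation comes out at about 2.9 standard deviations below expectation, with odds of the same order as the ‘once in 520’ quoted.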

     Cornelius’ explanation for this phenomenon is that it is evidence the control group had broken down as a control and that this was an indication of a context-psi phenomenon. This reviewer agrees the control group broke down as a control, but chooses the simpler hypothesis that the control and test groups were simply mixed up. Ertel (personal communication, March 20, 2008) has looked at this problem and has said that Cornelius was right that Carlson did not report the 2.9 figure, but also that he was wrong because he followed Carlson’s method of calculation and Carlson’s method was wrong.
     Ertel’s article addresses this problem extensively. When the figures in the experiment are re-calculated with correct statistical procedures, the astrologers ‘won’ at a .05 level. If the control/test groups were mixed up, then the success rate of the astrologers would increase to the .01 level.

     Reporting of data

     Carlson spends at least one-seventh of the precious space in his Nature article on clerical tasks and confidence judgments irrelevant to the astrological hypothesis, but he neglects to mention salient and significant facts. Already mentioned are: the actual number of participating astrologers, the average number of matchings they made, the significant value of 2.9 SD in Part 1, demographic information about subjects, the results of the question about intuition, the confidence judgments for each section of the 5-category astrological interpretation, where he got the CPI, the fact he was analyzing the data before completion of data collection, and the objections made by the astrologers during the course of the experiment. Also, he did not report the false horoscopes given to the astrologers upon which to write their interpretations and make judgments.
    
     The experimenter says he gave the astrologers natal charts. He also gave them progressed charts. Progressed charts are particularly necessary to answer the question about Category 5, ‘Current Situation’. They are also used to determine how a test subject might answer the CPI according to whatever might be influencing his test-taking mood. The astrologers needed them to do the tasks they were given.
     To control for ‘Regional Bias’, Carlson says, he removed “the location and time of birth from all charts” and in the ‘Astrology Experiment Detailed Outline’ he writes in capital letters, “DO NOT WORK BACKWARDS FROM THE NATAL CHART TO DEDUCT THE NATAL DATA. Please limit yourself only to the information we supply you.” He writes that any subject who does not follow this rule is a ‘devious astrologer’. He writes further, “I realize that these are unusual conditions to require an astrologer to hold to, but we must be strict to ensure the validity of the experimental method…I hope you will agree to abide by them.”
     At least one of the astrologers did not abide by them when she discovered one of the progressed horoscopes did not match the natal chart. It turned out to be a person born in 1900. She would not have discovered the error if she had followed the rules the experimenter put down. Upset, Sullivan wrote a letter to Carlson May 14, 1981. He wrote back to her 12 days later congratulating her on her alertness and saying she was “only the second person who reported this error.” In his ‘Rebuttal’ to astrologers in January 1987, he contradicts himself when he says “one flawed chart did get sent out and was noticed immediately by Sullivan”.  There were at least two he had been informed of and he reports only one and then only after the issue had been raised after publication.
     Not qualified in horoscopic interpretation, he said “such an error would be obvious at once to any astrologer”. He neglects to mention he had already severely restricted them to the data given and to not work backwards to get correct data. He also neglects to mention that the ‘advising expert astrologer’, who supposedly made the chart, did not find it obvious and, in addition, it is unlikely for an expert astrologer to make a mistake like this. He said he spot-checked the charts in his office for further errors, but he did not indicate he alerted the other astrologers to check for errors in the charts they had already been given.
     Not working with the right data is shocking to an astrologer. The experimenter didn’t seem to think much about it and wrote back to Sullivan that she should re-calculate the charts herself “should you need them”. They are very much needed, for reasons already given, and he didn’t seem to notice that she needed the birth data to do the re-calculation. This is the time at which she dropped out of the experiment, after having made three matchings.

7. Carlson’s conclusions

     The experimenter made an estimated 50+ contestable claims and presuppositions in just 7 pages. The reviewer is at a disadvantage in refuting a carelessly made and unsubstantiated claim because it takes so much more space and longer time to refute a claim than to make it in the first place.  By the time just one claim has been refuted, ten more have been made. The dilemma is to let contestable claims go unmentioned and unchallenged or to list those claims without proper documentation. Letting many claims go unchallenged, only the following five made in his Conclusion are mentioned here: 
      Saying astrology had been given every reasonable chance to succeed, Carlson claimed  “It failed. Despite the fact that we worked with some of the best astrologers in the country, recommended … for their expertise in astrology and in their ability to use the CPI….. despite the fact that the astrologers approved the design and predicted 50 percent as the 'minimum' effect they would expect to see, astrology failed to perform at a level better than chance.” (Pg. 425)

  1.  ‘Expertise in astrology’. In his ‘Questionnaire’ given to astrologers in 1981, Carlson asked how much of their interpretation was due to intuition, but he did not ask anything about their publications, presentations, length of study, length of practice, number of sessions given, etc.
  2.  ‘Best in the country’. According to whom, and on what basis? He admits in his original 1983 manuscript (deleted by Nature) that different astrologers (and scientists) have different opinions of each other. There is no mention of where they are from. Since they were ‘nominated’ by members of a local chapter in S.F., the reader would assume they were from the Bay Area.
  3. Astrologers ‘approved the design’. It is supposed to be a scientific design and they are not trained as scientists. Their approval has nothing to do with the validity of the experiment. The experimenter is responsible for the design and cannot shuffle this responsibility off to someone else.
  4. Astrologers ‘predicted’ 50% as the minimum effect. A subjective guess of effect size is not any kind of ‘prediction’. ‘Prediction’ is the wrong word.
  5. Astrology failed to perform at a level better than chance. No, the astrologers performed by matching a psychological test. Astrology was not measured. Also, comparison to chance is not appropriate here. The task is not random. The hit rate should be compared to the inter-rater reliability of qualified psychologists doing the same task. It is unknown what failed, the astrologers or the experiment.

8. Background conditions of Carlson’s project

     Bias in Carlson’s study
     Bias is evident from the start in the materials he gave to the astrologer subjects in 1981. In the ‘Astrology Experiment Detailed Outline’, he refers to anyone who does not follow his rather tight restrictions, which he says himself are ‘unusual’, as a ‘devious astrologer’. It was not missed by one astrologer subject, who wrote to Carlson upon receiving a false horoscope, commenting “one wonders what is devious”.
     Bias is evident when he says in the same ‘Detailed Outline’ that he intended to publish in the ‘Skeptical Inquirer’, which the experimenter says is a “nationally respected scientific journal”.  It is actually an in-house organ of CSICOP and its editorial policy follows CSICOP debunking philosophy. It will hardly ever publish results positive to astrology, except to hold them up to ridicule, and this is further evidence that the experimenter expected to get null results which would ‘prove’ astrology is false.
     In post-publication history, Carlson’s (1986) first response to astrologers’ complaints was an accusation of ‘mudslinging and irrational emotionalism’. Actually, the rudest thing said, which Carlson called ‘libelous’, was “Carlson’s objectivity is very doubtful” and “the media can always be counted on to respond with knee-jerk predictability to sophomoric opinionation by self-styled scientists”. These statements by Jim Lewis (1986) are mild compared to the opinions made of astrologers by Carlson (1988) in the newsletter of the Bay Area Skeptics:
“people crazy… certifiable kooks… New Agers, crystal gazers and time-warped, tie-dyed, mantra-chanting hippie people seeking God on the slopes of Mt. Shasta… Nancy Reagan regularly consults a California occultist about the President's agenda…. Don't laugh too hard. Millions of Americans take astrology very seriously….. Yes, astrology, that ancient inanity that was part of the "dark" of the Dark Ages, is alive and well in the 1980's. … Astrology is big business. 
dopey prognosticators… Astrology is nonsense… astrology is a danger… believers...
I am annoyed… malarkey… seduce a credulous public… awe the uninformed with such torrents of astro-babel thinking that there may be something to this nonsense… no one could be scientifically competent….. Nob Hill soothsayer…
ancient superstition… meandering excursion into rubbish… abandon reason and indulge in astro-fantasy…..trysts with idiocy… occult irrationality.”

     Support by CSICOP  
     CSICOP is well-known for its attacks on astrology, beginning with Chairman-for-life Kurtz’ (1975) ‘Objections to Astrology’, signed by 186 scientists and sent to all 3000 newspapers in America, calling astrologers ‘charlatans’. What CSICOP says it is and what it does are two different things. Although the words ‘scientific investigation’ are part of its title, it adopted the policy in 1981 of not doing any. Its methods have been questioned by Hans Eysenck (1982), Suitbert Ertel (1996), and co-founder Dennis Rawlins (1981). Its original co-chairman and first editor, Marcello Truzzi (1989), called CSICOP skepticism actually ‘debunking’ and wrote, “The major interest of the Committee was not inquiry but to serve as an advocacy body, a public relations group for scientific orthodoxy”. 
     According to public statements of Paul Kurtz (2005), in his 30-year anniversary article, CSICOP ‘encouraged’ Carlson to do this project.  Carlson’s first intention, as indicated in the 1981 ‘Experiment Detailed Outline’, was to publish in the CSICOP journal, Skeptical Inquirer. Carlson’s project was funded and the methods were guided by his advisor, physicist Richard Muller, who is a CSICOP Fellow. The major advisor to the experiment with expertise in astrological research became a CSICOP Fellow in 2003. The editor of the journal, John Maddox, where the article was published without peer-review, was a CSICOP Fellow. Carlson (1988) himself was a Board Member of the Bay Area Skeptics at least as early as 1988 and is currently on the CSMMH debunking alternative medicine.
     He collaborated after publication with Randi the Magician, a CSICOP Fellow. In 1999, he received a $290,000 non-restricted, non-accountable, tax-free award from the MacArthur Foundation as did his mentors, Richard Muller (1982) and Randi the Magician (1986). The Director of the MacArthur Foundation from 1979-2002, Murray Gell-Mann, was a CSICOP Fellow and at least half a dozen more members of CSICOP received this award, the “entire process of which is anonymous and confidential” according to Wikipedia (2007).
     There is a solid CSICOP connection all the way from encouragement to funding to two of his advisors to publication by a scientific journal to the ‘media circus’ that followed. The science writer of the first newspaper article, which came out in San Francisco the same day the study appeared in ‘Nature’, was a CSICOP writer. According to Hansen’s (1992) scholarly sociological study, CSICOP’s “highest priority has been to influence the media”. Carlson was half right when he told the astrologers in his ‘Rebuttal’ that he did nothing to create the media circus that followed publication. It was created for him. This appears to be what it was all about. The years 1981-1982 were years of crisis for CSICOP, and many of its senior members resigned in protest over the Gauquelin affair. It appears that this experiment was conducted in 1981 to counterbalance the positive Gauquelin results in the media. In other words, the purpose of the experiment was not scientific inquiry, but to serve as a publicity stunt.
    Support by NATURE
     Nature is the most prestigious and powerful journal in science. Nature (2008) calls itself “the voice of science”. It has more citations and a higher impact factor than any other journal. In the third quarter of 2007, more than 38 million readers visited its website. There is such a demand to be published in Nature that it has space to publish only one paper out of ten that are submitted. As Nature itself says, “As Nature grew more influential, more pages were added, and the rejection rate rose ever higher… the problem rather was that so much good wheat had to be rejected purely for want of space”. Gratzer (2008) writes in a historical review on Nature’s website:
     “A paper in Nature could change the course of a career, and to say that there were those who would kill to achieve that consummation is not altogether an overstatement, for on at least one occasion Dai Davies received a death threat from an enraged supplicant. During John Maddox's first term, a repeatedly rejected physicist threatened to immolate himself on the steps of the Nature office. (He forbore in the end to do so, having hit on the ruse of publishing his paper as an advertisement.).”
     How did Carlson’s invalid experiment(s) get published in a journal like this? It was so poorly organized, so obviously invalid to any scientist with a grasp of statistics or design, that it is improbable it would be accepted, since 90% of other good papers by senior faculty and experienced PhDs were rejected.
     The article was printed in the ‘Commentary’ section of the journal, which currently has a limit of 1 or 2 pages.  Carlson’s article was 7 pages long. Even normal articles have a strict limit of 5 pages. The ‘Commentary’ section is the ‘opinion’ section of the journal and is the only section which bypasses peer-review. If the commentary has technical information, which the Carlson article does, it is supposed to undergo peer-review. Typically, papers in the Commentary section are placed there at the editor’s unilateral prerogative and are not subject to an editorial committee. The editor in this case was John Maddox, a CSICOP Fellow. Harding (N.D.), an astrologer-academic therapist, writes, “it is unlikely that any pro-astrology research, however impeccably presented would ever have been published in Nature, whose editor, John Maddox, had this to say on the subject”:
      “It is a plain fact that astrology is a pack of lies in the literal sense; those who peddle horoscopes do so on an explicit set of statements about the real world that cannot be correct… That they are told, and believed by countless innocents in flat contradiction of the more objective view of the world accumulated over several centuries, means that each and every horoscope is, by denying the objective view of the planets, an attack on the probity of science... Would other professionals, lawyers or accountants say, be as tolerant of public belief that undermined the integrity of their work -and, potentially, their livelihood?”(Maddox, 1994)
     Urban-Lurain (N.D.) says, “The obvious misuse of the CPI should not have passed the referee stage of publication, particularly in a journal with the stature of Nature”. This paper never went through the referee stage.  Maddox frequently bypassed peer review. A historical review by Gratzer on Nature’s website says, “Maddox’s innumerable contacts…enabled him to solicit papers from leading laboratories. These were sometimes published without the tedious formality of review”.
     Not only was Carlson’s paper published without peer review, which was not mentioned to Nature’s readers, but other irregularities occur. The published article by Nature made remarkably few changes from what had been submitted. One of those few deletions was the reason why Carlson chose a 2.5 sd cutoff point. Nature’s own editorial policy stipulates that the author give the rationale for statistical procedures which are used and yet the editor of Carlson’s article deleted that rationale.
     The other rare deletion by Nature was Carlson’s original statement that “a positive astrological effect would be controversial”. By deleting this statement, the editor then eliminated the accusation that a positive effect would have been hotly contested, but a null result would occur without criticism or complaint. Urban-Lurain (N.D.) writes:

We were unable to find any references to this article in subsequent issues of Nature. A search of the Science Citation Index revealed two references to Carlson's article.  Neither of the references discussed the design aspects of the study, but rather cited it as evidence of the failure of astrology.

We do not know what, if any, responses to Carlson's study were submitted to Nature; none were published.  We do find it interesting that objections to the design and conclusions which are self-evident from reading the article, found outlets only in obscure astrological journals. (Urban-Lurain, N.D.)

     Another irregularity is the delay between submission and publication, April 1983 – October 1985. Even the skeptics comment on this:  “One strange aspect about the study, though, still needs discussion: why was it published only several years after its submission”. Nature’s webpage for authors says, “authors are usually informed within a week if the paper is not being considered”. Urban-Lurain (N.D.) called the editorial office of Nature in Washington, D.C. and talked with the news editor:
“Pool said that the length of delay was unusual, and probably caused by problems with the original article or difficulty obtaining referee acceptance.  He did indicate that articles published as Commentary are not subjected to the same peer review process as the regular articles (and even some letters!); rather they are published at the discretion of the editor-in-chief.”
9.  Background of the experimenter
     At the time of the experiments in 1980-81, the experimenter was an undergraduate in physics. At the time of submission to Nature in 1983, he was finishing his M.A. in physics. He had expertise in neither psychology nor astrology, the two subjects of this study. He had training in the scientific method in the physical sciences, but not in the experimental design used in the social/behavioral sciences, nor in the use of human subjects. He had neither the training nor the qualifications to use the CPI.
     This was the first and last time Carlson did any kind of experiment in astrology. He showed no other interest in the subject. Afterwards, he spent three to five years (reports vary) in the study of satanic rituals and devil worship and collaborated with Randi the Magician. Throughout the years, he has been active in skeptic organizations and his ideas about what is scientific and what is not are strong. In 1988, he wrote, “no one could be scientifically competent, knowledgeable of the evidence and still think that astrologers can perform the service”.
     Carlson (personal correspondence April 9, 2008) writes that “At the time of my study I was a young man with a deep interest in the occult”.  He also reports in newspaper interviews that he worked his way through college as a professional magician. It was not until 2005 that he divulged, in an interview with Johnson of the Boston Globe (2005), that at the age of 16 “he supported himself as a street psychic and player of Three-card Monte”.

     Three-card Monte – game behavior in science?
     Three-card Monte is a con game (Wikipedia, 2008).  It is not considered a gambling game because the mark cannot win. Most magicians know this trick. It is illegal and a misdemeanor in most cities if it is played for money. Since Carlson “supported himself”, he played for money.
      Even professional con artists do not like this scam. One who publicly describes this trick says, “I don't have any problem with the ethics of the 3-card Monte game, in that the victim is always just as morally culpable as the operator of the game… I think that both the victim and the operator are equally in the wrong… I have no sympathy for the victim, who really deserves what he gets”. Hayden (N.D.) said he would allow 3-card Monte simply as a matter of free speech except “there is such a high incidence of violence connected with it”. He says:
          “Three-card Monte is a mean-spirited game. If the game were not illegal, it still is not a fun way to make a living for any length of time, and I don't think it is a healthy thing for the operator spiritually.…Operating a three-card Monte game teaches you to look for the weakness and vice in other people…to be cynical and mean and to feel superior rather than to see the universality of mankind.”
     There are some similarities between the steps of Three-card Monte and the experimental design of this study. The three advising astrologers were shown the design and agreed that the test was fair. The astrologers were invited to participate in a ‘scientific study’ at prestigious U.C. Berkeley to help prove that astrology was right. There were three CPI profiles and three astrological interpretations, just as there are three cards among which to choose. When the data envelopes arrived, the astrologers found that the concept they had agreed to had been switched to an impossible task. The reporting of this experiment in the Nature article was confusing because results from experiment 1 and experiment 2 were mixed in the same paragraph. Results were reported on what the experimenter intended to do (“top card”), but not on what actually happened (“bottom card”). Partial results were reported on one page and a supplemental part of the results was reported two pages later, with a lot of extraneous information about confidence judgments or clerical procedures in between (the characteristic “sideways sweep”). Misdirection is the primary trick of Three-card Monte. It was used when the experimenter would say one thing and do another. Misdirection was also involved when attention was put on hypothetical numbers while crucial values were not reported. The astrologers felt they had been tricked, particularly after being sent a letter saying that preliminary results were favoring astrology and astrology would be proven.
Conclusion of the review
     That Carlson’s Conclusion(s) do not follow from the Results was stated by Eysenck in 1986. This experiment was unfair and invalid. All the mistakes were skewed in favor of the ‘skeptic hypothesis’, in which the experimenter had a strong belief. CSICOP influence was pervasive, from encouragement to funding to publication. The experimenter had previous professional experience in deception and con games. It appears that he continued a con game of the Three-card Monte type, disguised with the appearance of scientific methodology to cloak the real intention of media disinformation.
     Student subjects could not choose the correct astrological interpretation, they could not choose the correct CPI profile, and astrologer subjects could not choose the correct CPI. Two of the four surviving astrologers said “it was impossible” and both submitted no responses. The data about rating sections of each interpretation were rejected. The hypotheses, the design, and the apparatus were flawed, with more than 60 mistakes and errors on the way to Results. The author claims that the astrologers failed. This reviewer thinks the experiment failed.
     The paradox is that if the numbers given to the reader are correct, then using statistical analysis with the correct methods, the astrologers chose the correct CPI at a significance level of at least .05. If the control/test group results were mixed up, the combined value is p=.01.
     Before persons with a pro-astrology orientation begin celebrating this experiment as any kind of proof for astrology, however, they should note that such a conclusion depends upon the validity of the experimental design and the veracity of the numbers. The evidence is that the experiment is not valid and that the numbers of less-than-28 astrologers and ‘typically 4’ matchings are highly suspect and unreliable.
   The dilemma is this: (1) This is a case of scientific misconduct, the experiment is invalid, and the astrologers did not win or lose; or (2) Despite the experimenter’s misconduct, the experiment is valid, and the astrologers won at an acceptable significance level. The choice is between (1) misconduct, or (2) the astrologers won, or (3) both.
     The Carlson study is not the first case of scientific misconduct in history, but it is unique in that, contrary to prevailing scientific belief, it involved more than just a single individual. It involved an organization with a powerful influence that purports to speak for the body politic of science. It involved, for the first time known, the science journal in which it was published. Nature is just as responsible as the author. The publicity impact of this article was based upon the prestige and power of the journal in which it was printed, not upon the merits of the experiment.

Disclaimer: This reviewer, before retirement, was a professor of psychology at McNeese State University, a licensed psychologist, and past-president of the Louisiana Psychological Association. He has ten years’ experience in supervising 80+ M.A. and Ph.D. unlicensed psychologists according to the standards of practice of both APA and NASP. He was managing editor of a refereed scientific journal in psychology for five years. Since then, he has also given 3000 astrology counseling sessions to an international community of some 40 nationalities. He considers himself first a counseling psychologist who also uses astrological methods, as well as others (e.g., NLP). He has some expertise in both systems of psychology and astrology.

REFERENCES

 

Carlson, S. (1985). A double blind test of astrology, Nature, Vol. 318, December 5, 1985.

Carlson, S. (1986). Carlson Responds to AFAN, AFAN Newsletter, February (?) 1986

Carlson, S. (1988). Lunacy on Pennsylvania Avenue, Basis: Newsletter of the Bay Area 
      Skeptics, July 1988, Vol.7 (7).

Consulting Psychologists Press (2008), Get Qualified-Certified. Retrieved January 4,  
      2008 from Web site: http://www.cpp.com/qual/index.asp

Cornelius, G. (1986). The NCGR-Berkeley Double-Blind Test of Astrology, Astrology 
     Quarterly, Vol. 59 (4), Winter 1985/6.

Cornelius, G. (2003). The Moment of Astrology: Origins in Divination, Bournemouth,   
       England: Wessex Astrologer Ltd.
Dean, G. and Nias, D., (1997). Professor H.J.Eysenck: In Memoriam 1916-1997,    
       Correlation 16(1), 48-54.
Dean, G., (N.D.). Summary of Carlson’s Double-Blind Test of Astrology.
       Retrieved November 23, 2007 from Website: www.astrology-and-science.com

Ertel, S. (2008), Critical Appraisal of Shawn Carlson’s renown astrology tests, (subm)

Ertel, S., and Irving, K. (1996). The Tenacious Mars Effect. London: The Urantia
        Trust.

Ethical Principles of Psychologists and Code Of Conduct (2002). Washington, DC:       
      American Psychological Association (2002), Retrieved December 1, 2007 from Web  
      site: http://www.apa.org/ethics.

Eysenck, H.J., and Nias, D. (1982). Astrology: Science or Superstition. NY: St 
     Martins Press.

Eysenck, H.J. (1986). Critique of “A Double-Blind Test of Astrology”, Astro-  
      Psychological Problems, Vol.4 (1), January 1986.

Eysenck, H.J. (1986). A double-blind test of astrology. Letters to the editor. Correlation, 
      June 1986, Vol.6 (1), pp.15-16.
     
Gough, H. (N.D.). An Interpreter’s Syllabus: California Psychological Inventory. 
      Consulting Psychologists Press, Inc. Palo Alto, California (reprint from McReynolds,  
        Paul (Ed.), Advances in Psychological Assessment, Volume 1. Copyright – Science 
        and Behavior Books. Palo Alto, California, 1968).

Gratzer, W. (N.D.), Nature – The Maddox Years, Retrieved February 1, 2008 from 
       Web site: http://www.nature.com/nature/history/full/nature06241.html

Hansen, G.P. (1992). CSICOP and the Skeptics: An Overview, The Journal of the
       American Society for Psychical Research, Vol. 86 (1), January 1992, pp. 19-63.

Harding, M. (N.D.) Prejudice in Astrological Research, Correlation, Vol. 19 (1).  
       Retrieved November 1, 2007 at Website: 
       http://www.astrozero.co.uk/astroscience/harding.htm

Hayden, W., (N.D.). Three-Card Monte from a Scoundrel’s Perspective, Retrieved 
       March 11, 2008 from Website: www.schoolforscoundrels.com

Johnson, C. (2005). His scouts will learn science, not tie knots, Boston Globe,  
      September 12, 2005.

Josephson, B. (2004). Scientists’ Unethical Use of Media for Propaganda Purposes.
       December 13, 2004. Found at http://www.tcm.phy.cam.ac.uk/~bdj10/propaganda/

Kurtz, P, Jerome, L., Bok, B., (1975). Objections to Astrology: A Statement by 186 
      Leading  Scientists, The Humanist, September/October 1975.

Kurtz, P. (2006). Science and the Public: Summing Up Thirty Years of the Skeptical  
       Inquirer, Skeptical Inquirer, September 2006.

Lewis, J., (1986). Some Objections to Shawn Carlson’s Experiment, AFAN 
      Newsletter, January 1986.
Lower, S.K. (2007). Treating astrology’s claims with all due gravity, Correspondence, Nature
     447, 528 (31 May 2007) | doi:10.1038/447528a; Published online 30 May 2007

MacArthur Foundation, In Wikipedia, August 15, 2007

Maddox, J., (1994). Defending Science Against Anti-Science, Nature, 368:185, March 17,  
      1994.

Nature, Retrieved February 1, 2008 from Website: 
      http://www.nature.com/naturejobs/advertisers/index.html;  
      http://www.nature.com/nature/about/

Rawlins, D. (1981), sTARBABY, FATE, No.34, October 1981. Retrieved January 11, 
       2008 from Web site: http://cura.free.fr/xv/14starbb.html

Skeptical Studies in Astrology, (N.D.).  Retrieved December 11, 2007 from Web site:
      http://www.psychicinvestigator.com/demo/AstroSkc.htm
Stankov, L. & Crawford, J. (1997). Self-confidence and performance on cognitive
       tests. Intelligence, 25, 93-109.
Three-Card Monte, Wikipedia, March 7, 2008.

Truzzi, M. (1989). Reflections on the Reception of Unconventional Claims in  
       Science. November 29, 1989 Colloquium. [The following article originally appeared  
       in "Frontier Perspectives" (vol. 1 number 2, Fall/Winter 1990), the newsletter of The 
       Center for Frontier Sciences at Temple University], found at Web site:   
       http://www.fiu.edu/~mizrachs/truzzi.html

Urban-Lurain, M. (1984). Astrology As Science: A Statistical Approach. Tempe,Az:  
      American Federation of Astrologers, 1984.

Urban-Lurain, M., Rowe, W., Pierce, D., Banghart, R. (N.D.) (collection of posts 
      sent to SCI.SKEPTIC website).

Weed, T.H. (1986). Critique of the Carlson Study: Double-Blind Test of Astrology, 
      KOSMOS, Spring 1986, pp. 26-31 (Publisher ISAR, 
       International Society for Astrological Research)

Weed, T.H. (1986). Critique of the Carlson Study, Astro-Psychological Problems
     Vol. 4 (1), January 1986.