John J. Kosinski (jkosins1@swarthmore.edu)
Caroline Sehnaoui (csehnao1@swarthmore.edu)
March 29, 1998
Bio 9: The Social Impact of Science

The Quantitative Misuse/Use of Statistics
"You can prove anything using statistics."
Senator Kit Bond (R-MO) in a speech on the Senate Floor, January, 1994

Statistics play a large role in research, yet it is important to consider the many factors that allow researchers to draw the conclusions they do. Numbers and graphs often impress lay readers, who may lack the tools to analyze them critically. We shall provide you with the tools necessary to critically analyze statistics and demonstrate how statistics may be misleading.

A National Example of Misuse:

In the 1936 United States Presidential election, a widely read and highly respected magazine conducted a poll to predict the winner between Governor Alf Landon of Kansas and Franklin D. Roosevelt, the incumbent. To conduct the survey, the pollsters sent sample ballots to a large number of people who were listed in telephone directories and car registries. Many people in 1936 had neither a telephone nor a car, which automatically eliminated them from the survey. The survey thus discriminated against working-class people, since during the Great Depression only the relatively well-off could afford such luxuries. Among those who did reply, Alf Landon was the overwhelming favorite. If readers of this magazine had known something about statistics, they would have been skeptical of the claim that Alf Landon was the easy favorite to win. As we know today, FDR won the 1936 election easily, and this represents just one example of how bias in surveys can lead to false conclusions.
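The effect of a biased mailing list can be sketched with a small simulation. This is only an illustration: the 30% ownership rate and the voting preferences inside and outside the list are hypothetical numbers chosen for the example, not figures from the 1936 poll, and the function name simulate_poll is made up here.

```python
import random

def simulate_poll(population_size=100_000, sample_size=2_000, seed=1936):
    """Compare a poll drawn from a biased list with one drawn from everyone."""
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        # Hypothetical assumption: 30% of voters own a telephone or car.
        on_list = rng.random() < 0.30
        # Hypothetical assumption: 40% of listed voters favor the incumbent,
        # while 70% of unlisted voters do.
        p_incumbent = 0.40 if on_list else 0.70
        votes_incumbent = rng.random() < p_incumbent
        population.append((on_list, votes_incumbent))

    # Share of the whole population that favors the incumbent.
    true_share = sum(vote for _, vote in population) / population_size

    # Poll 1: sample only from people on the telephone/car list.
    listed = [person for person in population if person[0]]
    listed_sample = rng.sample(listed, sample_size)
    listed_share = sum(vote for _, vote in listed_sample) / sample_size

    # Poll 2: simple random sample from the whole population.
    random_sample = rng.sample(population, sample_size)
    random_share = sum(vote for _, vote in random_sample) / sample_size

    return true_share, listed_share, random_share

true_share, listed_share, random_share = simulate_poll()
print(f"true support for incumbent:  {true_share:.1%}")
print(f"poll of listed voters only:  {listed_share:.1%}")
print(f"poll of a random sample:     {random_share:.1%}")
```

Under these assumptions the list-based poll points to the wrong winner even with 2,000 replies, while a random sample of the same size lands close to the true figure. A large sample does not repair a biased list.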

The Difficulties of a Census

What prevents researchers from conducting a census when they want information about an entire population? In an ideal world, researchers would be able to contact every member of a given population and obtain accurate results from each of them. But a number of factors force researchers to conduct small surveys rather than a broader, larger census. Even the federal government conducts a census only once every ten years. The following is a list of problems with conducting a census:

--populations are often too large, making the process too expensive and time consuming
--a relatively small sample may reveal more accurate information about a population than a census would (this is especially true in developing nations, where there may be too few trained people to conduct a full census)
--no census is foolproof: even governments miss small fractions of the population, and the people missed may be concentrated in a single group (e.g., the 1980 U.S. census missed an estimated 1.4% of the American population overall, but an estimated 5.9% of the Black population)

Types of Sampling

Sampling procedures, as the anecdote above shows, can easily be manipulated to obtain certain results or to influence certain sectors of the population. This is one area where people need to be most critical in understanding statistical findings, and most careful in conducting their own research.

--convenience sampling: in this method researchers choose participants who are easily accessible; such samples are often biased and lead to misleading conclusions about the population; researchers sometimes use this tactic deliberately to support their hypothesis

--simple random sampling: this method is more reliable since it eliminates human choice by allowing random, impersonal chance to decide a survey's participants; all units are given an equal chance to be included in the sample, so the selection itself introduces no bias

A good sample is considered to have low bias and high precision.
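Bias and precision can be illustrated with a short sketch. It assumes a hypothetical population of 10,000 units in which 55% answer "yes" (coded 1), stored in a list that happens to begin with the "yes" answers; the helper name srs_estimates is invented for this example. Python's random.sample draws a simple random sample without replacement.

```python
import random
import statistics

def srs_estimates(population, sample_size, repeats, seed=0):
    """Draw many simple random samples and return each sample's proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(repeats)
    ]

# Hypothetical population: 5,500 "yes" answers (1) followed by 4,500 "no" (0).
population = [1] * 5_500 + [0] * 4_500
true_proportion = sum(population) / len(population)

small = srs_estimates(population, sample_size=50, repeats=500)
large = srs_estimates(population, sample_size=1_000, repeats=500)

# Convenience sample: the first 50 units on the list, all of them "yes".
convenience_estimate = sum(population[:50]) / 50

print("true proportion:          ", true_proportion)
print("average of small samples: ", round(statistics.mean(small), 3))
print("average of large samples: ", round(statistics.mean(large), 3))
print("spread of small samples:  ", round(statistics.stdev(small), 3))
print("spread of large samples:  ", round(statistics.stdev(large), 3))
print("convenience estimate:     ", convenience_estimate)
```

Both sets of random samples center on the true proportion (low bias), and the larger samples vary less from one draw to the next (high precision). The convenience sample misses badly no matter how carefully its 50 answers are counted.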

Problems of Sampling

There are two types of errors in sampling which could have misleading effects on research.

--sampling errors: errors caused by the act of taking a sample, which make sample results differ from census results; the more flawed the sampling procedure, the more the results are skewed

--nonsampling errors: errors rooted not in the act of sampling but in human behavior and data handling; these would appear even in a census

Four Types of Nonsampling Errors

--missing data- subjects refuse to respond to the survey or cannot be contacted; if the responses of these missing people would differ from those of the rest of the population, bias will result
--response errors- subjects may lie about their age or income (e.g., respondents may understate the amount of alcohol they consume each week or the amount of money they spend on illegal substances); respondents who do not understand a question may invent an answer rather than admit that they do not understand
--processing errors- errors may be made by the individuals entering the information into a database, or through common mathematical mistakes
--effect of data collection- the method used to collect the data can influence the results (e.g., if a visibly overweight interviewer asks random people whether more money should be spent to cure obesity); the timing, wording, and means of a survey, described in the next three items, all fall under this heading
--timing- surveys about economic policy conducted during a depression might give different results than surveys conducted during periods of economic stability
--exact wording of questions- the way a question is phrased can greatly influence the way an individual responds, especially if it is a 'loaded question' (e.g., "Would you support federal funds for abortion so that millions of unwanted children are not born every year?")
--means of conducting a survey- the means of conducting a survey can influence who responds and how (e.g., mail surveys, the least expensive method, are practical, but they tend to draw responses from individuals who vehemently oppose or support an issue; telephone surveys are fast and economical, yet they exclude families and individuals who do not have telephones; personal interviews, the least biased of the methods, offer direct contact but are expensive, and not every respondent is asked the question in the same way)

Tables and Graphs: Visual Misrepresentation of Data

Rather than pages of numbers, statisticians use a much simpler means to communicate results: tables, charts and graphs. When dealing with any type of table, chart or graph, it is very important to understand the source of the information and the definition of the terms being used (e.g., if we were reading a table of the largest countries in the world and their economic output, we would need to know whether size is defined by geography or by population). Individuals need to beware of the mechanisms by which visual representations of data can be adjusted and manipulated so as to suggest a false conclusion. The following is a list of graphing and tabling procedures and the ways that they can be manipulated:

frequency tables-- These tables report how many participants gave each response, often as a share of the total number of participants. For instance, if 1500 people were interviewed concerning whether they approved of the President, and 900 said that they approved of the President, 500 stated that they did not approve of the President and 100 had no comment, then we could say that the President's approval rating was 60%. However, some statisticians might ignore the 100 who declined to comment on the President, thus lifting his approval rating to over 64% (900 out of 1400). Another large problem with such tables is that statisticians or those conducting the research often round numbers off, and for the sake of simplicity they often round to the nearest hundred or even thousand, thus creating problems of interpretation.
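The arithmetic behind the two approval ratings can be checked with a short sketch. The function name approval_ratings is invented for this illustration; the counts are the ones from the example above.

```python
def approval_ratings(total_interviewed, approve, no_comment):
    """Return the approval rating computed two ways.

    The first divides by everyone interviewed; the second drops the
    people who gave no comment from the denominator.
    """
    rating_all = approve / total_interviewed
    rating_excluding = approve / (total_interviewed - no_comment)
    return rating_all, rating_excluding

rating_all, rating_excluding = approval_ratings(
    total_interviewed=1500, approve=900, no_comment=100
)
print(f"counting everyone interviewed: {rating_all:.1%}")
print(f"ignoring the 'no comment' group: {rating_excluding:.1%}")
```

The same 900 approving answers yield a higher rating simply because the denominator shrank, so it is worth asking which denominator a published percentage uses.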

pie charts-- These charts use a visual 'pie' to represent how a whole is divided into parts. Each part's proportion is multiplied by 360 to give the number of degrees its slice should occupy (e.g., if, hypothetically, 28% of a population used crack, we would change 28% into a decimal and multiply it by 360 (the degrees in a circle) to find how many degrees of the 'pie' this group should occupy: (.28)(360)=100.8; thus the slice for crack users should occupy just over 100 degrees). However, humans judge length much better than they judge angles, and therefore pie charts can often be misleading or allow false conclusions to be drawn. Furthermore, pie charts are often convoluted and cluttered, which makes it hard to gain a clear visual impression of the data.
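The conversion from percentages to degrees can be sketched as follows. The function name pie_angles and the two category labels are made up for this illustration; the 28% figure is the one used in the example above.

```python
def pie_angles(shares):
    """Convert each category's share into the degrees of its pie slice."""
    total = sum(shares.values())
    return {label: 360 * value / total for label, value in shares.items()}

# Two illustrative categories that together make up 100%.
angles = pie_angles({"users": 28, "non-users": 72})
for label, degrees in angles.items():
    print(f"{label}: {degrees:.1f} degrees")
```

Dividing by the total means the slices always add up to 360 degrees, even if the input shares are raw counts rather than percentages.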

dot charts-- These charts present the information using dotted lines with dark endpoints. They are often clear, concise and to the point. They provide the simplest means by which to present information and allow for a clear visual representation. The main risk of manipulation is a chart maker who uses uneven intervals, making some lines appear longer or shorter than they should be.

line graphs-- These graphs use lines to connect the data points. They are perhaps the most widely used form of graph, yet they can easily be manipulated. Be sure to check the scales of both the horizontal and vertical axes to ensure that the intervals are evenly divided. Beware of breaks in the information provided. Finally, check that the data are not presented in a way that visually suggests the opposite of what the information reveals (e.g., make sure that the information on the x-axis does not belong on the y-axis, or vice versa). (See Moore, p. 183, for an in-depth explanation of this phenomenon.)
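Checking that an axis is evenly divided is a simple calculation. The sketch below assumes you have read the tick labels off a graph into a list; the function name is_evenly_spaced and the sample tick values are hypothetical.

```python
import math

def is_evenly_spaced(ticks, rel_tol=1e-9):
    """Return True if consecutive tick values are all the same distance apart."""
    if len(ticks) < 3:
        return True  # zero or one interval cannot be uneven
    steps = [later - earlier for earlier, later in zip(ticks, ticks[1:])]
    return all(math.isclose(step, steps[0], rel_tol=rel_tol) for step in steps)

print(is_evenly_spaced([1980, 1985, 1990, 1995]))  # five-year steps throughout
print(is_evenly_spaced([1980, 1985, 1990, 1991]))  # last step is only one year
```

If the tick labels fail this check but are drawn at equal distances on the page, the slope of the line is being stretched or compressed over part of the graph.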

bar graphs-- These graphs use bars to visually compare two or more quantities. Usually these graphs compare two groups across a number of different categories. (In his book, Moore uses a bar graph of the number of men and women receiving bachelor's, master's, and doctoral degrees.) There are two ways these graphs can misrepresent results. First, the reader needs to ensure that the bars are the same width, since our eyes respond to area more than they do to height. Second, the bars of bar graphs are often drawn as pictures of objects, which can easily exaggerate any differences between the results.
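The exaggeration caused by picture-shaped bars follows from simple geometry: when a picture is enlarged in both height and width to match a value, its area grows with the square of the value. A minimal sketch, with an invented function name and invented values:

```python
def apparent_area_ratio(value_a, value_b):
    """Ratio of picture areas when both height and width scale with the value."""
    true_ratio = value_b / value_a
    # Height and width each grow by true_ratio, so area grows by its square.
    return true_ratio ** 2

true_ratio = 200 / 100
print("true ratio of the values:   ", true_ratio)
print("ratio of the picture areas: ", apparent_area_ratio(100, 200))
```

A quantity that merely doubled is drawn with a picture four times as large, which is why plain bars of equal width are the safer choice.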

Conclusion

As noted, there are a variety of ways that researchers can inadvertently (or purposefully) manipulate surveys so that public opinion or their research findings appear to support the idea that they want. It is extremely important that students and the general public understand these mechanisms to ensure that the polls distributed to the public are critically analyzed. Not doing so could lead to misinformation and wrong conclusions.

Bibliography

Moore, David S., Statistics: Concepts and Controversies. (third edition). W.H. Freeman and Company, New York. 1991.

On-Campus Sources:

For more information about statistics and the misrepresentation of quantitative information contact Professor Gudmund Iversen or Professor Philip Everson in the mathematics and statistics department, or check out the books and syllabi used for the following courses:
Stat. 1: Statistical Thinking
Stat. 2/Soc. Anth. 22: Statistical Methods
Stat. 2C/Soc. Anth. 28: Statistics