# Descriptive Statistics Essay

## Descriptive Statistics

In research, producing accurate results is essential. Team C’s hypothesis about what generates champion teams requires several research tools to reach a reliable answer. Team C has simplified the meaning of champion team to any team whose team dynamics cause it to have a winning season. This paper therefore focuses on the research tools needed, and the results those tools provide, to determine which stats matter for Major League Baseball (MLB) teams to win games and eventually become champions.

## Measures of Central Tendency

Even when dealing with enormous sets of data, it is important to get an overall picture by looking at the measures of central tendency. The three examined here are mean, median, and mode. Mean is “a measure of central tendency that offers a general picture of data without inundating one with each of the observations in a data set” (Sekaran, 2003, p. 396, para 3). A more common term for mean is average. The median is “the central item in a group of observations when they are arrayed in ascending or descending order” (Sekaran, 2003, p. 396, para 5). Mode is the “most frequently occurring phenomenon” (Sekaran, 2003, p. 396, para 6). The following table shows the mean, median, and mode for the four sets of data that Team C will be researching: Wins, Salary, Total Season Attendance, and Team Earned Run Average.
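As a rough illustration of how these three measures can be computed, the sketch below uses Python's standard `statistics` module. The function name `central_tendency` and the list `sample_wins` are made up for this example; the values are hypothetical and are not the figures from Team C's table.

```python
from statistics import mean, median, multimode


def central_tendency(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        # multimode returns every value tied for most frequent
        "modes": multimode(values),
    }


# Hypothetical win totals for seven teams (illustrative only, not Team C's data).
sample_wins = [65, 70, 78, 81, 83, 95, 95]

summary = central_tendency(sample_wins)
print(summary)
```

In this made-up sample the mean and median are both 81 while the mode is 95, which shows that the most frequent value can sit well away from the center of the data.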

Although the table gives detailed information, measures of dispersion are needed for a more precise picture of the data.

## Dispersion

Dispersion is a critical part of statistics because it shows how widely the values spread around the center, and therefore how well a single average represents the data. Team C’s hypothesis concerns the stats that generate wins for a Major League Baseball team. Four measures of dispersion can help develop a more accurate picture for that hypothesis: range, average deviation, variance, and standard deviation.
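The four measures named above can be sketched in a few lines of Python. This is a minimal example under two assumptions: the values are hypothetical rather than Team C's data, and the population formulas (`pvariance`, `pstdev`) are used on the guess that all 30 teams are treated as the whole population. The essay does not say which formulas the spreadsheet used, and sample formulas would give slightly larger results.

```python
from statistics import mean, pstdev, pvariance


def dispersion(values):
    """Return range, average deviation, variance, and standard deviation."""
    center = mean(values)
    return {
        # Distance between the largest and smallest observation
        "range": max(values) - min(values),
        # Mean of the absolute distances from the mean
        "average_deviation": sum(abs(x - center) for x in values) / len(values),
        # Population variance: mean of the squared distances from the mean
        "variance": pvariance(values),
        # Population standard deviation: square root of the variance
        "std_dev": pstdev(values),
    }


# Hypothetical observations (illustrative only, not Team C's data).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

spread = dispersion(sample)
print(spread)
```

For these values the range is 7, the average deviation is 1.5, the variance is 4, and the standard deviation is 2.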

The four tools of dispersion help to paint a clear picture of how the four identified stats help develop winning teams. Measuring skewness then shows whether the collected data are distributed symmetrically.

## Measure of Skew

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point (NIST/SEMATECH, 2010). The skewness for a normal distribution is zero, and any symmetric data should have skewness near zero. Negative values for the skewness indicate data skewed left, and positive values indicate data skewed right. Skewed left means that the left tail of the histogram is long in comparison to the right tail. Skewed right means that the right tail of the histogram is long in comparison to the left tail.

### Wins
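For a column such as wins, the sign convention for skewness can be checked numerically. The sketch below is not Team C's spreadsheet calculation: it implements the simple moment-based (Fisher-Pearson) coefficient, the function name `skewness` and the three lists are hypothetical, and spreadsheet functions may apply a small-sample adjustment, so their values can differ slightly.

```python
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    center = mean(values)
    m2 = sum((x - center) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - center) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Hypothetical win totals (illustrative only, not Team C's data).
symmetric = [71, 76, 81, 86, 91]     # evenly spread around 81
left_tailed = [50, 79, 82, 84, 85]   # one very low total stretches the left tail
right_tailed = [77, 78, 80, 83, 112]  # one very high total stretches the right tail

for name, data in [("symmetric", symmetric),
                   ("left_tailed", left_tailed),
                   ("right_tailed", right_tailed)]:
    print(name, round(skewness(data), 3))
```

The symmetric list gives a skewness of zero, the list with a long left tail gives a negative value, and the list with a long right tail gives a positive value. In the two skewed lists the mean is also pulled toward the long tail, away from the median.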

Team C’s research indicates that a champion Major League Baseball team’s success is tied to the number of wins, player salaries, season attendance, and the team’s earned run average. The mean number of wins for the 30 teams is 81, and the median is also 81. The mode, or most frequently occurring number of wins, is 95. The skewness of the wins data is negative, indicating data skewed to the left. In this case the skew is so slight that the histogram for wins would look symmetrical rather than negatively skewed.

### Salary

The salary of a Major League player can be closely tied to the quality and quantity of the player’s ability and results. The team salary mean is $73,063,563 and the median is $66,191,417. The skewness for this data is 2.17, a positive value indicating a right skew, which is consistent with the mean exceeding the median. The gap arises because a few team salaries lie far above the mean: there are extremes in excess of $200 million, and they pull the mean above the median.

### Attendance

Attendance at Major League games directly affects a team’s budget and its ability to pay higher salaries for better players. The data show a mean of 2.4 million and a median of 2.5 million. The skewness is positive, meaning the data are skewed to the right. The skew is very slight, so the histogram is close to symmetrical. The slight pull to the right is a result of high attendance of 3.5 to 4 million at a handful of stadiums.

### Team ERA

## Measurement of Central Tendency and Dispersion of Data

Mean, median, and mode measure central tendency, and comparing them gives a first indication of how the data are distributed. “In general, the mean is the descriptive statistic most often used to describe the central tendency of a group of measurements.” (Science Buddies, 2010) However, the mean is not always the best measure of central tendency when extreme values are present in the data. “Of the three measures, it is the most sensitive measurement, because its value always reflects the contributions of each of the data values in the group. The median and the mode are less sensitive to “outliers”—data values at the extremes of a group.” (Science Buddies, 2010) The mode identifies the most frequently recorded value, and it helps to determine where most of the data lies. The mode is very useful when the data is heavily skewed. The median is the basis for the quartiles and, compared with the mean, indicates the direction of skew. The median is not affected much by a small proportion of very high or very low values. The median is therefore a good measure of central tendency when considering what makes a Major League Baseball team successful. After reviewing all data collected, Team C has concluded that the combination of these stats answers the hypothesis posed.

## Solution

After extensive research, Team C has found that the factors the team focused on do have an effect on the wins for a Major League Baseball team. In the case of attendance, a successful team needs a minimum of 2.4 million fans to be able to pay quality players. In addition, this large fan base can help generate the $73 million needed to pay quality players and operate the team. These quality players need to hold the team ERA to 4.28 or lower, since a lower ERA is better. Although this stat is based on the pitcher, the team as a whole has to be good enough to aid the pitcher in this goal. If teams can achieve these goals, their wins would be expected to exceed the 81-win average for the season. This is a winning season, and eventually, as numerous teams that have fallen into these categories have shown, the championship could be the reward.

## Conclusion

A team that plays smartly and efficiently will win games and championships. The number of wins, salaries, attendance, and earned run average (ERA) contribute to this success. ERA is the average number of earned runs a pitcher allows per nine innings; the lower the number, the better. The wins data show that the most frequently occurring win total is 95. The overall team salaries suggest that player salaries reflect player quality, ability, and results. Fan attendance plays a major role in the success of the team, because the money generated from attendance makes it possible for owners and management to hire quality talent. Owners and management must be consistent when hiring and managing the players. Team C has concluded through its research that these are the major factors for winning games and championships.

## References

NIST/SEMATECH. (2010). e-Handbook of Statistical Methods. Retrieved from http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm

Science Buddies. (2010). Summarizing Your Data. Retrieved from http://www.sciencebuddies.org/science-fair-projects/project_data_analysis_summarizing_data.shtml

Sekaran, U. (2003). Research Methods for Business: A Skill Building Approach (4th ed.). New York, NY: John Wiley & Sons, Inc.

*Histogram and other charts located on attached Excel Spreadsheet*