Thursday, March 5, 2020

Predicting MLB Attendance: Exploration

Attendance at Major League Baseball games has decreased for 12 straight years. Theories for the decline are as numerous as they are varied: longer games, an aging fan base, less competition among teams, higher costs of attending games, changes in game play, more entertainment options, and even the weather. Building a model to predict attendance can give insight into which of these factors actually matter.

Game-day attendance figures from the past ten MLB seasons (2010-2019) provide a wealth of data for examining the importance of individual factors in a given game's attendance. Theoretically, there should be 24,300 individual regular season games (81 home games, 30 teams, 10 years) to work with over that time span, plus a few one-game playoffs. However, for predictive purposes, rescheduled games, one-game playoffs, games moved due to one-off circumstances (such as hurricanes, G20 Summits, riots, etc.), and games held in special venues (College World Series, Little League World Series, Mexico, Puerto Rico, Japan, etc.) will be excluded, reducing the number of games to examine to 23,814.
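As a quick check on the arithmetic above, the game counts can be worked out in a few lines of Python. This is only a sketch: the constant names are made up for illustration, and the excluded share is measured against the theoretical schedule, not the number of games actually played.

```python
# Theoretical schedule: 81 home games per team, 30 teams, 10 seasons.
HOME_GAMES_PER_TEAM = 81
TEAMS = 30
SEASONS = 10

scheduled = HOME_GAMES_PER_TEAM * TEAMS * SEASONS

# Games kept after the exclusions described in the text.
kept = 23_814

excluded = scheduled - kept
excluded_share = excluded / scheduled

print(f"scheduled: {scheduled:,}")
print(f"excluded:  {excluded:,} ({excluded_share:.1%} of the theoretical schedule)")
```

So the exclusions remove 486 games, or about 2% of the theoretical schedule.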

To get a general idea of how attendance is influenced by different variables, exploration will be limited to the team with the largest ballpark seating capacity: the Los Angeles Dodgers. Dodger Stadium has a maximum seating capacity of 56,000. Average attendance there has been relatively consistent since 2013, the year after the team's notoriously bad owner, Frank McCourt, sold the team. Using the seasons from 2013-2019 should limit the confounding variables that might be present for other teams, as the Dodgers won the division in each of those seasons and did not have any major changes to their payroll, stadium, or ownership situation over that period.

It is not surprising to see that the Dodgers have a higher average attendance on weekends than they do on weekdays. With an average attendance of 50,028, Saturday is by far the most popular day of the week to attend games, drawing 3,214 more fans than the average game. Monday and Wednesday games see the largest drops in attendance, with decreases of 2,963 and 2,036 relative to average, respectively. However, it does seem surprising that Tuesdays do not see a more substantial decrease in attendance.
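The day-of-week comparison above boils down to grouping games by weekday and subtracting the overall average. A minimal sketch of that calculation follows, assuming each game is a `(date, attendance)` pair. The function name and the toy figures are invented for illustration, and this is not necessarily how the linked code does it.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_deviation(games):
    """Return {day name: mean attendance on that day minus the overall mean}."""
    overall = mean(attendance for _, attendance in games)
    by_day = defaultdict(list)
    for game_date, attendance in games:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        by_day[DAY_NAMES[game_date.weekday()]].append(attendance)
    return {day: mean(values) - overall for day, values in by_day.items()}


# Toy data only: these are NOT real attendance figures.
toy_games = [
    (date(2019, 6, 1), 52_000),   # Saturday
    (date(2019, 6, 3), 44_000),   # Monday
    (date(2019, 6, 8), 50_000),   # Saturday
    (date(2019, 6, 10), 46_000),  # Monday
]

deviation = weekday_deviation(toy_games)
for day, diff in deviation.items():
    print(f"{day}: {diff:+,}")
```

With the toy data, Saturdays come out 3,000 above the overall average and Mondays 3,000 below it.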



Attendance appears to be greatest when the temperature is just below 80 °F, and it tends to decrease when the temperature is in the 60s or in the high 80s and above. Perhaps temperature is a proxy for time of year: April through June are usually the cooler months of the baseball season in Los Angeles, and people may be less likely to go to games early on, when the season is still young and families still have children in school.
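One simple way to look at the temperature pattern is to bucket games into temperature bands and average attendance within each band. The sketch below assumes `(temperature in °F, attendance)` pairs and a 10-degree band width. Both the helper name and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_attendance_by_temp_band(games, band_width=10):
    """Return {lower edge of band in °F: mean attendance}, ordered by band."""
    bands = defaultdict(list)
    for temp_f, attendance in games:
        # Floor the temperature to the lower edge of its band, e.g. 77 -> 70.
        lower_edge = int(temp_f // band_width) * band_width
        bands[lower_edge].append(attendance)
    return {edge: mean(values) for edge, values in sorted(bands.items())}


# Toy data only: these are NOT real game records.
toy_games = [
    (64, 44_000),
    (68, 45_000),
    (77, 50_000),
    (79, 51_000),
    (91, 46_000),
]

by_band = mean_attendance_by_temp_band(toy_games)
for edge, avg in by_band.items():
    print(f"{edge}-{edge + 9} °F: {avg:,.0f}")
```

Banding like this hides the time-of-year effect discussed next, so it is best read alongside a view by date.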



There appears to be high interest in games at the very beginning of the season, followed by a quick drop. This may be caused by interest in opening day, as the first home game of the season has an average attendance of 53,459, while all other games have an average attendance of 46,775. Attendance increases in early and mid-June, which corresponds with the end of the school year. There appears to be a modest drop in attendance beginning in the middle of August, which is when the school year begins again; however, this drop-off is not as substantial as the increase in June, so perhaps weather and time of year are both factors in attendance.

It does appear that both temperature and time of year impact attendance. The increase in attendance in June occurs before the increase in temperature. Likewise, the attendance decreases once the school year begins while temperature stays relatively high; however, the attendance is still higher than it is at the beginning of the season.


It goes without saying that certain teams have larger fan bases and more of a nationwide following. It is not surprising that the New York Yankees produce the largest change in attendance when they are the away team, but it is surprising how much more attendance increases for them than for the second most popular team, the Boston Red Sox. On average, attendance increased by 21% when the Yankees were the away team, while the Red Sox only saw an increase of 11%. It is surprising that only a third of teams (ten) saw an increase in attendance as the away team, and it would be fair to say that popular teams are more of a draw than less popular teams are a deterrent. Every team except the Angels that saw an increase in attendance as the road team has played in the World Series since 2009. That said, it does not necessarily mean that team success is indicative of higher attendance for away games, as teams with larger followings naturally have more resources.
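The post does not spell out how the away-team percentages were computed. One plausible approach, sketched here as an assumption, is to compare each game's attendance with the home team's own average and then average those percentage changes for each visiting team. The function name, team codes, and figures below are all placeholders.

```python
from collections import defaultdict
from statistics import mean


def away_team_lift(games):
    """Return {away team: mean % change in attendance vs. the home team's average}.

    `games` is an iterable of (home_team, away_team, attendance) tuples.
    """
    games = list(games)

    # Baseline: each home team's average attendance across all of its games.
    home_attendance = defaultdict(list)
    for home, _, attendance in games:
        home_attendance[home].append(attendance)
    home_avg = {home: mean(values) for home, values in home_attendance.items()}

    # Percentage change of each game relative to that baseline, grouped by visitor.
    lifts = defaultdict(list)
    for home, away, attendance in games:
        baseline = home_avg[home]
        lifts[away].append((attendance - baseline) / baseline * 100)
    return {away: mean(values) for away, values in lifts.items()}


# Toy data only: placeholder team codes and invented attendance figures.
toy_games = [
    ("AAA", "XXX", 30_000), ("AAA", "YYY", 20_000), ("AAA", "ZZZ", 25_000),
    ("BBB", "XXX", 44_000), ("BBB", "YYY", 36_000), ("BBB", "ZZZ", 40_000),
]

lift = away_team_lift(toy_games)
for team, pct in sorted(lift.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {pct:+.1f}%")
```

Normalizing by the home team's average keeps large ballparks from dominating the comparison, though it does not yet control for day of the week or time of year.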

The Atlanta Braves rank thirtieth out of the thirty teams in change in attendance as the road team, with an average change of -1.5%. They started the decade as a good team and made the playoffs in 2010, 2012, and 2013. In the middle of the decade they "tanked" and posted losing records in four straight seasons, and to close out the decade they won their division in 2018 and 2019. Because of their varying success this decade, they are worth examining to get an idea of how team quality affects road attendance. As the road team, they had a positive effect on attendance through 2014, even though they posted a losing record that year. For the remaining five years of the decade they had a negative effect on attendance. There may be a lagging effect of team record on attendance: in 2018 they posted a .556 winning percentage, yet had their largest negative effect on attendance, at -12.7%. Overall, it does appear that recent success has some impact on attendance as the road team.

While this exploration is not conclusive as to the exact factors of attendance at Major League Baseball games, it does provide a good starting point before creating any models to predict attendance. It appears that the day of the week, temperature, time of year, and the away team all have an impact on the attendance of any given game. 


The code used for this analysis can be accessed here: https://github.com/pfmccull/MLB-Attendance/tree/master/Exploration
