Prediction of a movie’s box office using pre-release data





Tartu Ülikool


It is difficult to overestimate the impact of the film industry on our lives: it entertains us and expands our knowledge of the world and its cultures. Going to the cinema has become an important leisure activity. Moreover, the total worldwide box office in 2018 reached $41 billion. This is not surprising, as in 2018 alone 11,911 feature-length films were released worldwide. The box office generated from cinema ticket sales is the main source of profit for widely released movies. However, not all movies are profitable when their production cost is compared with their total box office: 78% of movies released worldwide are not profitable, and 35% of profitable movies earn 80% of the total profit. Given the importance of theatrical releases and the tough competition for profit, we aim to predict how successful a movie is going to be and whether it is worth the investment risk. Only data available before release is used, so that predictions can be made at the earliest stages. Following the stages typical of data mining and machine learning, we obtained what is possibly the largest and most feature-rich dataset used in box office gross prediction. We use neural networks and gradient boosting machines to predict the absolute box office gross, the range within which it is likely to fall, and whether a movie will be profitable. The results obtained are very competitive in the domain.
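The abstract frames box office prediction as three related tasks: regression on the absolute gross, classification into a gross range, and binary profitability classification. A minimal sketch of how these three targets could be derived from a movie's gross and budget is shown below. It assumes a movie is profitable when its gross exceeds its production budget, and the range boundaries are hypothetical; `RANGE_EDGES`, `Targets` and `make_targets` are illustrative names, not the thesis's code.

```python
import bisect
from dataclasses import dataclass

# Hypothetical range boundaries in US dollars (assumption: the thesis's
# actual bins are not stated in the abstract).
RANGE_EDGES = (1e6, 10e6, 50e6, 100e6, 200e6)


@dataclass
class Targets:
    gross: float       # regression target: absolute box office gross
    range_class: int   # multi-class target: index of the gross range
    profitable: bool   # binary target: gross exceeds production budget


def make_targets(gross: float, budget: float, edges=RANGE_EDGES) -> Targets:
    """Derive the three prediction targets from a movie's gross and budget.

    Assumption: a movie counts as profitable when its gross exceeds its
    production budget (marketing and distribution costs are ignored).
    """
    # bisect_right counts the edges <= gross, which is the bin index;
    # a gross equal to an edge falls into the higher bin.
    range_class = bisect.bisect_right(edges, gross)
    return Targets(gross=gross, range_class=range_class, profitable=gross > budget)
```

For example, `make_targets(75e6, 40e6)` yields range class 3 and a profitable movie under these assumed boundaries. Deriving all targets from the same two numbers keeps the regression, range and profitability tasks consistent with one another.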



Regression, Classification, Motion pictures, Box office, Neural networks, LightGBM