A General Introduction to the Use of Data Mining in the Game Industry with a Focus on Game Result Forecasting

Data mining in games is mainly used to build user churn analysis models that give early warning of likely future player loss. The same analysis process can also be applied to the games themselves, and more specifically to the prediction of game results. The process of game data mining is mainly divided into data preparation, data mining, and modeling. Game companies can also use the resulting model to guide daily operation and to release relevant data analysis reports for players. This literature review covers the main academic articles on data mining in the game industry, with a focus on game result forecasting.

Game Data Understanding and Data Preparation

Data mining relies heavily on data. Valuable data should include all the data generated by players' various behaviors, not just dynamic in-game data. To achieve the goal of data mining, the collected data must be sufficient in quantity and as high in quality as possible. Technological advances have made it easy to collect an immense mass of data on everything (Drachen, 2013). For details on data understanding and preparation, chapter 4 of Handbook of Statistical Analysis & Data Mining Applications is useful. Once the data mining process has been determined, the next step is to access, extract, integrate, and prepare the appropriate data set for data mining (Nisbet, Elder, & Miner, 2018). After the original data is collected, it needs to be described and processed. This includes, for example, exploring whether there is a relationship between each selected variable and the target variable, and carrying out a basic exploration of each variable, such as its number of null values, its number of unique values, and its minimum and maximum values. However, the range and definition of the data vary from company to company and from one research purpose to another. For example, game companies previously paid most attention to the peak number of concurrent online users (PCU), but now focus more on the number of active users, which is itself defined differently depending on how the game operates.
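The basic exploration of each variable described above can be illustrated with a minimal sketch. It assumes the collected data is held as a list of per-player records; the field names (sessions, minutes_played) and the helper names profile_column and profile_table are hypothetical and do not come from any of the cited works.

```python
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[float]]) -> Dict[str, Any]:
    """Summarise one variable: null count, unique count, minimum and maximum."""
    present = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(present),
        "unique": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def profile_table(rows: List[Dict[str, Optional[float]]]) -> Dict[str, Dict[str, Any]]:
    """Profile every column that appears in a list of row dictionaries."""
    columns = sorted({name for row in rows for name in row})
    return {name: profile_column([row.get(name) for row in rows]) for name in columns}


# Hypothetical player records; the field names are illustrative only.
players = [
    {"sessions": 12, "minutes_played": 340.0},
    {"sessions": 3, "minutes_played": None},
    {"sessions": 12, "minutes_played": 95.5},
]
summary = profile_table(players)
```

A summary of this kind is only a first pass: it shows which variables have missing values or little variation before any relationship with the target variable is examined.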


Krzysztof Trawinski (2000) took a preliminary approach that uses a fuzzy model to predict basketball game results. Modeling for game prediction usually goes through three stages: building the model, testing and adjusting the model, and applying the model. The literature shows that choosing an appropriate modeling technique is key to building a model. The data, the operations, and the methods differ at each stage of game operation, which makes it possible to combine models and adopt a different model for each stage. A study by Pınar Tüfekci shows that using a reduced dataset gives better results than using the original one (Tüfekci, 2016).
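The three modeling stages can be sketched with a deliberately simple stand-in model. This is not the fuzzy model of Trawinski or the method of any cited study: it assumes a single hypothetical feature (the home team's win rate minus the away team's win rate) and a threshold rule, and all data values are invented for illustration.

```python
from typing import List, Tuple

# (home win rate minus away win rate, 1 if the home team won else 0)
Game = Tuple[float, int]


def predict(threshold: float, diff: float) -> int:
    """Predict a home win (1) when the win-rate difference exceeds the threshold."""
    return 1 if diff > threshold else 0


def accuracy(threshold: float, games: List[Game]) -> float:
    """Share of games whose outcome the threshold rule predicts correctly."""
    correct = sum(1 for diff, won in games if predict(threshold, diff) == won)
    return correct / len(games)


def build(train: List[Game]) -> float:
    """Stage 1: choose the candidate threshold with the best training accuracy."""
    candidates = sorted({diff for diff, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))


# Invented training and held-out games, for illustration only.
train = [(-0.30, 0), (-0.10, 0), (0.05, 1), (0.20, 1), (-0.05, 0), (0.15, 1)]
held_out = [(-0.20, 0), (0.10, 1), (0.02, 0), (0.25, 1)]

threshold = build(train)                          # Stage 1: build the model
held_out_accuracy = accuracy(threshold, held_out)  # Stage 2: test before adjusting
upcoming_prediction = predict(threshold, 0.12)     # Stage 3: apply to a new game
```

If the held-out accuracy falls short of what the application requires, the adjusting step would revisit the features or the modeling technique before the model is applied.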

Model Evaluation

Once evaluation shows that the model meets the business objectives, it enters the model release phase. Applying the model usually requires a longer period of testing to assess accurately whether it meets commercial standards. The dataset should be modified so that it best supports the modeling methods with less bias (Berthold, 2010). In addition, model design involves not only assessing the accuracy and versatility of the model, but also comparing the generated results with the criteria set for the model under current conditions, so that the data warehouse variables can be corrected to meet the needs of daily data analysis. It has also been found that a combination of data mining and game theory fails to provide a reasonable solution for the representation and structuring of the knowledge in a game (Wang, 2007).