Modelling late invoice payment times using survival analysis and random forests techniques






The aim of this thesis is to explore the possibilities of modelling the late payment times of invoices in the business-to-business sales process, using real sales ledger data. Survival analysis and Random Survival Forests, a novel ensemble method, are applied to right-censored data on late invoices. A theoretical overview of Random Survival Forests is given, and the concordance index is explained as a performance measure for survival models. A comprehensive overview of data preprocessing and of deriving payment times from sales ledgers is presented. We propose two separate models, one for first-time debtors and one for repeated debtors, and explore the effect of different predictors in each model. Random Survival Forests prove to have advantages over the Cox Proportional Hazards model, as they do not rely on underlying model assumptions, such as proportional hazards, that would need to be checked. Overall, it is concluded that the Random Survival Forests model that additionally uses the historical payment behaviour of debtors performs best in ranking the payment times of late invoices.
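Two ideas in the abstract, right-censored payment times and the concordance index, can be illustrated with a small sketch. It assumes a hypothetical ledger in which each late invoice has a due date, an optional payment date and a common observation cutoff. The clock starts at the due date, and an invoice still unpaid at the cutoff is treated as right-censored there. The function names, the toy invoices and the risk scores are illustrative assumptions, not the thesis's actual preprocessing or model output. The concordance index shown is a simplified version of Harrell's C.

```python
from datetime import date
from typing import List, Optional, Tuple


def late_payment_duration(due: date, paid: Optional[date], cutoff: date) -> Tuple[int, bool]:
    """Return (days overdue, payment observed) for one late invoice.

    Hypothetical rule: if the invoice was paid on or before the cutoff,
    the duration is observed; otherwise it is right-censored at the cutoff.
    """
    if paid is not None and paid <= cutoff:
        return (paid - due).days, True
    return (cutoff - due).days, False


def harrell_c_index(times: List[int], events: List[bool], scores: List[float]) -> float:
    """Simplified Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when invoice i has an observed payment and
    a strictly shorter time than invoice j. It is concordant when i also has
    the higher score (the model expects it to be paid sooner). Score ties
    count as 0.5; pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored invoice cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy ledger (hypothetical): (invoice id, due date, payment date or None)
cutoff = date(2020, 3, 31)
ledger = [
    ("A", date(2020, 1, 10), date(2020, 1, 20)),
    ("B", date(2020, 1, 15), date(2020, 2, 14)),
    ("C", date(2020, 2, 1), None),                 # still unpaid at cutoff
    ("D", date(2020, 3, 1), date(2020, 4, 15)),    # paid after cutoff
]

durations = [late_payment_duration(due, paid, cutoff) for _, due, paid in ledger]
times = [t for t, _ in durations]
events = [e for _, e in durations]

# Hypothetical model scores: higher means "expected to be paid sooner"
scores = [0.9, 0.5, 0.2, 0.95]
c_index = harrell_c_index(times, events, scores)

print(times)    # [10, 30, 59, 30]
print(events)   # [True, True, False, False]
print(c_index)  # 0.75
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Here four pairs are comparable, and the single discordant pair (invoice A paid before D, yet D received the higher score) lowers the value to 0.75.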



survival analysis, machine learning, random survival forests, late invoices, sales ledger, censoring