Predicting company innovativeness by analysing the website data of firms: a comparison across different types of innovation


This paper investigates which of the core types of innovation can be best predicted from the website data of firms. We focus on the four standard types of innovation: product, process, organisational, and marketing innovation. Web-mining of textual data from the websites of Estonian firms, combined with artificial intelligence (AI) methods, proves to be a suitable approach for predicting firm-level innovation indicators. The key novel contribution to the existing literature is the finding that web-mining is better suited to predicting marketing innovation than the other three core types of innovation. As AI-based models are often black boxes, we use an explainable AI approach (SHAP, SHapley Additive exPlanations) for transparency and examine the words that matter most in predicting each type of innovation. Our models confirm that the marketing innovation indicator from survey data is clearly related to marketing-related terms on the firms' websites. In contrast, the words found to be relevant for the other innovation indicators are much harder to interpret. We conclude that the effectiveness of web-scraping and web-text-based AI approaches in producing cost-effective, granular and timely firm-level innovation indicators varies with the type of innovation considered.



innovation, marketing innovation, Community Innovation Survey (CIS), machine learning, neural network, explainable AI, SHAP
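
To make the word-level attribution idea mentioned in the abstract concrete, the following is a minimal sketch of exact Shapley values for a toy linear word-presence scorer. It is not the model used in the paper: the words, the weights, the bias, and the background rates are all hypothetical and chosen only for illustration, and the sketch enumerates every word subset directly rather than calling the SHAP library.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy weights of a linear word-presence model (not from the paper).
WEIGHTS = {"campaign": 1.5, "brand": 1.0, "factory": -0.5}
BIAS = -0.25
# Hypothetical background rates: share of reference websites containing each word.
BACKGROUND = {"campaign": 0.2, "brand": 0.4, "factory": 0.5}


def score(present, known):
    """Model score when only the words in `known` are observed.

    Observed words contribute 1.0 if present on the site and 0.0 otherwise;
    unobserved words are replaced by their background rate.
    """
    total = BIAS
    for word, weight in WEIGHTS.items():
        if word in known:
            value = 1.0 if word in present else 0.0
        else:
            value = BACKGROUND[word]
        total += weight * value
    return total


def shapley_values(present):
    """Exact Shapley value of each word, by enumerating all subsets of the other words."""
    words = list(WEIGHTS)
    n = len(words)
    values = {}
    for word in words:
        others = [w for w in words if w != word]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            coeff = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                gain = score(present, known | {word}) - score(present, known)
                contribution += coeff * gain
        values[word] = contribution
    return values


# A hypothetical website that mentions "campaign" and "brand" but not "factory".
site_words = {"campaign", "brand"}
attributions = shapley_values(site_words)
for word, value in sorted(attributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{word}: {value:+.2f}")
```

The attributions sum to the difference between the score for this website and the score when no word is observed, which is the additivity property that makes a ranking of words by absolute attribution interpretable. Exhaustive enumeration is only feasible for a handful of words; a realistic vocabulary would require the approximations implemented in dedicated libraries.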