*** Dataset_Integrated_Usability_Framework_EU_GCC ***
Authors: Fillip Molodtsov (1), Anastasija Nikiforova (1)
	(1) University of Tartu

Corresponding authors: Fillip Molodtsov, Anastasija Nikiforova
Contact Information: fillip.molodtsov@ut.ee, nikiforova.anastasija@ut.ee

***General Introduction***
This dataset contains data collected during a study ("An Integrated Usability Framework for Evaluating Open Government Data Portals: 
Comparative Analysis of EU and GCC Countries") conducted by Fillip Molodtsov (University of Tartu) and Anastasija Nikiforova (University of Tartu).

This paper develops an integrated framework for evaluating OGD portal effectiveness that accommodates user diversity (regardless of their data 
literacy and language), evaluates collaboration and participation, and the ability of users to explore and understand the data provided through them. 
The framework is validated by applying it to 33 national portals across European Union (EU) and Gulf Cooperation Council (GCC) countries, 
as a result of which we rank OGD portals, identify some good practices that lower-performing portals can learn from, and common shortcomings.

It is being made public both to serve as supplementary data for the paper and to allow other researchers to use these data in their own work, 
potentially contributing to the improvement of current data ecosystems and the development of user-friendly, collaborative, robust, and sustainable open data 
portals.

***Methodology***
To understand which frameworks have been used to evaluate OGD portals, we conducted a systematic literature review (SLR).
To this end, we searched the digital libraries covered by Scopus and Web of Science (WoS).

These databases were queried for the keywords (("open data" OR "open government data") AND portal) AND 
(usability OR evaluation OR assessment OR "user-cent*" OR analy* OR quality), while the search scope was title, abstract, and keywords.
As the field is dynamic, only publications published within the last six years were chosen. Publications in languages other than English
were excluded. Finally, the searches were limited to articles, conference papers, and book chapters.

The results from both libraries were combined, and duplicates and publications without accessible full texts were removed. The titles and abstracts 
were then scanned to determine the relevance of each study. Relevance was rated on a scale of 1 to 4, where 1 meant that the publication
was very relevant and 4 meant that it was not relevant at all. Publications that received a score of 3 or 4 were excluded from the study.
After full-text screening, three articles were excluded due to unclear portal evaluation criteria. Finally, 82 studies remained for further analysis.
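
The screening steps described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the Record fields, the title-based duplicate check, and the sample records are hypothetical and are not taken from the SLR file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    source: str          # "Scopus" or "WoS"
    has_full_text: bool
    relevance: int       # 1 = very relevant ... 4 = not relevant at all


def normalise(title: str) -> str:
    """Lower-case and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())


def screen(records: list[Record]) -> list[Record]:
    """Drop duplicates and inaccessible items, then keep relevance 1 or 2."""
    seen: set[str] = set()
    kept: list[Record] = []
    for rec in records:
        key = normalise(rec.title)
        if key in seen:              # duplicate found in the other library
            continue
        seen.add(key)
        if not rec.has_full_text:    # no access to the full text
            continue
        if rec.relevance >= 3:       # scores 3 and 4 are excluded
            continue
        kept.append(rec)
    return kept


# Hypothetical sample records, for illustration only.
records = [
    Record("Usability of OGD portals", "Scopus", True, 1),
    Record("usability of  OGD portals", "WoS", True, 1),     # duplicate
    Record("Portal quality assessment", "WoS", False, 2),    # no full text
    Record("Open data in agriculture", "Scopus", True, 4),   # not relevant
    Record("Evaluating open data portals", "WoS", True, 2),
]
selected = screen(records)
print([r.title for r in selected])
```

In practice a DOI would be a more reliable duplicate key than a normalised title; the title is used here only to keep the sketch short.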

To attain the objective of our study, we developed a protocol in which the information on each selected study was collected in four categories:
(1) descriptive information, (2) information related to study approach and research design, (3) information related to its quality and relevance, 
and (4) OGD portal assessment-related information.

Based on the information collected, along with selected articles by experts in portal design and notes from the exploratory assessment of the French,
Irish, Estonian and Spanish portals, we developed the Integrated Usability Framework for Evaluating Open Government Data Portals. A boolean assessment
was predominantly used to evaluate the portals; the exceptions are described in the framework. Each sub-dimension was 
supplied with a description, scoring criteria and a weight to ensure common understanding. Additional non-mandatory notes were added to provide more context 
when the boolean assessment was ambiguous. This formed a protocol to be completed for each of the 34 national portals of the EU and GCC countries.

Each portal was evaluated by an expert. A person is considered an expert if they have expertise in computer science and information 
systems, work with open (government) data and data portals daily, and meet the expert profile according to the derivation of the International 
Certification of Digital Literacy (ICDL) proposed in Lněnička et al. (2021).

Once all individual protocols are collected, the total score for each portal is calculated using the weighting system. The average scores are then
calculated for the EU and GCC, the portals are ranked, and the top portals (best performers) are determined for each dimension.
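
A minimal sketch of this scoring step follows. It assumes that the total score is a weighted sum of sub-dimension scores, which is one plausible reading of the weighting system. The sub-dimension names, weights, and portal scores are hypothetical; the actual values are in the Framework and Assessment files.

```python
from statistics import mean

# Hypothetical weights per sub-dimension (assumption, not the real framework).
weights = {"multilingual": 1.0, "search": 2.0, "feedback": 1.5}

# Hypothetical raw scores: 1/0 for boolean checks.
# Group codes follow the score matrix convention: GCC (0), EU (1).
portals = {
    "Portal A": {"group": 1, "scores": {"multilingual": 1, "search": 1, "feedback": 0}},
    "Portal B": {"group": 1, "scores": {"multilingual": 0, "search": 1, "feedback": 1}},
    "Portal C": {"group": 0, "scores": {"multilingual": 1, "search": 0, "feedback": 1}},
}


def total_score(scores, weights):
    """Weighted sum over all sub-dimensions of one portal."""
    return sum(weights[name] * value for name, value in scores.items())


# Total score per portal.
totals = {name: total_score(p["scores"], weights) for name, p in portals.items()}

# Average total score per group of countries.
group_avg = {
    label: mean(totals[n] for n, p in portals.items() if p["group"] == code)
    for label, code in (("GCC", 0), ("EU", 1))
}

# Ranking from the highest to the lowest total score.
ranking = sorted(totals, key=totals.get, reverse=True)

print(totals)
print(group_avg)
print(ranking)
```

The same three steps (totals, group averages, ranking) can be repeated per dimension by restricting the weights to the sub-dimensions of that dimension.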

Based on the score matrix, two types of clustering analysis (K-means clustering and hierarchical clustering) are carried out, which group similar
portals together based on their sub-dimension performance.  Employing clustering analysis on a portal score matrix enables a deeper understanding 
of the relationships and patterns among different portals based on their performance metrics. By clustering portals with similar scores, groups 
exhibiting similar behavior or functionality can be identified, thus providing insights into the overall landscape of OGD portals.
By calculating the average dimensional scores of portals from both types of clusters, their performance across multiple dimensions is evaluated.
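
The two clustering approaches can be illustrated with a small library-free sketch. It is not the code used in the study (that is provided in clustering-code-listing). The score matrix, the deterministic seeding of K-means, the choice of average linkage, and the use of 2 clusters instead of 4 are all assumptions made to keep the example short.

```python
from math import dist

# Hypothetical score matrix: one row of sub-dimension scores per portal.
scores = {
    "Portal A": [1.0, 1.0, 0.0],
    "Portal B": [1.0, 0.9, 0.1],
    "Portal C": [0.0, 0.1, 1.0],
    "Portal D": [0.1, 0.0, 0.9],
}
points = list(scores.values())


def centroid(rows):
    """Column-wise mean of a list of score rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with the first k points (deterministic)."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = centroid(members)
    return labels


def agglomerative(points, k):
    """Average-linkage agglomerative clustering, merged until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]))
        merged = clusters.pop(y)      # y > x, so index x stays valid
        clusters[x].extend(merged)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


kmeans_labels = kmeans(points, k=2)
hier_labels = agglomerative(points, k=2)
print(dict(zip(scores, kmeans_labels)))
print(dict(zip(scores, hier_labels)))
```

On real data the two methods will not necessarily agree, which is why the clusters from both are compared afterwards.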

***Test procedure***
(1) perform an assessment of each dimension using sub-dimensions, mapping out the achievement of each indicator, and make qualitative notes
(2) calculate the total score for each portal
(3) calculate the average scores for the EU and GCC
(4) rank the portals
(5) perform clustering analysis
(6) evaluate the performance of the clusters by calculating the average dimensional scores of portals from both types of clusters

***Description of the data in this data set*** 
clustering-code-listing provides the code listing for cluster generation.

SLR provides:

    Sheet#1 "Structure of SLR protocol"
        Column A: Category
        Column B: Metadata (Sub-category)
        Column C: Description

    Sheet#2 "SLR data extraction"
        Column A-P: Sub-categories presented in Sheet#1

Framework provides:

    Sheet#1 "Framework"
        Column A: Dimension
        Column B: Sub-dimension
        Column C: Description
        Column D: Scoring criteria
        Column E: Weight

Assessment provides:

    Sheet#1 "Accessibility testing"
        Column A: Country
        Column B: Portal URL
        Column C: Accessibility testing results (score)
        Column D: List of displayed critical issues
    
    Sheet#2 "Dataset sample assessment"
        Column A: Country
        Column B: Portal URL
        Column C-J: Data understandability (dimension of the framework, Sheet#1 in Framework) sub-dimensions that have sample-based assessment
        Column K-O: Data quality (dimension of the framework, Sheet#1 in Framework) sub-dimensions that have sample-based assessment
        Column P-Q: Data findability (dimension of the framework, Sheet#1 in Framework) sub-dimensions that have sample-based assessment

    Sheet#3 "Score matrix"
        Column A: Country
        Column B: Portal URL
        Column C: Category of the row: GCC(0), EU (1), Calculated (2)
        Column D-BW: Score for each sub-dimension (see Sheet#1 in Framework)
        Column BY-CG: Sum score for each dimension (see Sheet#1 in Framework)
        Column CH: Total score for the row

    Sheet#4 "Charts (total, by dimension)"
        Provides 10 rankings (tables) of the portals based on the total score and by dimension
        Column 1: Country
        Column 2: Category (see Column C in Sheet#3)
        Column 3: Score

    Sheet#5 "Clusters"
        Provides 2 tables of the clusters (4 clusters) based on K-means clustering and hierarchical clustering
        Column 1: Country
        Column 2: Cluster number

    Sheet#6 "Clusters' comparison"
        Provides tables with the sum scores for each dimension for each portal in each cluster. Average cluster scores are calculated.
        Clusters from the two methods (K-means and hierarchical) that share the greatest number of portals are tagged with the same color.
        The average scores by dimension of the clusters with the same color-tag are calculated and written in the top-most table.
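
The color-tagging of clusters by common portals can be sketched as follows. This is a hedged illustration only: the greedy pairing strategy, the function name match_clusters, and the cluster contents are assumptions, and the spreadsheet may have been tagged manually.

```python
# Hypothetical cluster memberships from the two methods.
kmeans_clusters = {1: {"A", "B", "C"}, 2: {"D", "E"}}
hier_clusters = {1: {"C", "D", "E"}, 2: {"A", "B"}}


def match_clusters(first, second):
    """Greedy pairing: repeatedly pair the two clusters sharing the most portals."""
    pairs = sorted(
        ((len(a & b), i, j) for i, a in first.items() for j, b in second.items()),
        key=lambda t: -t[0],          # largest overlap first
    )
    used_first, used_second, matches = set(), set(), {}
    for overlap, i, j in pairs:
        if i not in used_first and j not in used_second:
            matches[i] = j            # cluster i (first) gets the tag of cluster j (second)
            used_first.add(i)
            used_second.add(j)
    return matches


matches = match_clusters(kmeans_clusters, hier_clusters)
print(matches)
```

Here K-means cluster 1 shares two portals with hierarchical cluster 2, and K-means cluster 2 shares two portals with hierarchical cluster 1, so those pairs would receive the same tag.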

For more details on the framework and results see the paper "An Integrated Usability Framework for Evaluating Open Government Data Portals: 
Comparative Analysis of EU and GCC Countries".

***Format of the file***
.xls, .docx, .ipynb

***Licenses or restrictions***
CC-BY