Mining social well-being using mobile data
Date
2023-06-08
Abstract
Mobile data, such as call data records (CDR) and digital data, amount to a large volume of data that carries valuable information about people's behavior. In this thesis, we focus on three facets of societal well-being. First, we propose two versions of a mobility-based SIR model, (i) fully mixed and (ii) for complex networks, both of which take into account real-life interactions from CDR. This work is inspired by the assumption that the fundamental reason some epidemics become pandemics is global connectedness, which makes it easier for a disease to affect a larger geographical area, often the whole globe. Furthermore, population distribution, people's mobility, and social coherence are not uniform across the globe, and this non-uniformity plays a critical role. We used our model to forecast COVID-19 cases for Estonia and the Rhône-Alpes region of France.
Second, we study societal segregation in Estonia using CDR data. Our findings suggest that (i) gender segregation exists in Estonia, and its traces are visible in individuals' calling hours, connectivity among age groups, preferred language of communication, and county; (ii) prime working-age individuals (the 25-54 age group) and the elderly (the 64-100 age group) are more segregated; and (iii) Estonian-speaking and Russian-speaking individuals are segregated by language.
Third, we investigate digital traces from mobile apps (such as Twitter and Facebook) to predict socio-economic conditions (SEC). These conditions include education, gender, poverty, employment, and other factors, and reliable and accurate information about them is critical for social research and government policymaking. Using app usage patterns, our best model is able to estimate economic, educational, and demographic indicators, attaining an R-squared score of up to 0.66. Furthermore, we report on the explainability of these models in order to identify the features that matter most for prediction. We find that mobile app usage patterns can reveal socio-economic disparities.
Keywords
segregation, social conditions, economic conditions, social welfare, data mining, mobile applications, mobile communication