Estimation of domains under restrictions built upon generalized regression and synthetic estimators
Date
2011-07-22
Abstract
The demand for reliable statistics has grown considerably. Indicators are needed at an increasingly detailed level, that of various domains (subpopulations). The required indicators are obtained both from sample surveys and from various registers. A register may lack identifiers for the domains of interest, in which case the domain parameters cannot be computed from it. On the other hand, the study variable may have been collected in a sample survey together with domain identifiers. Obtaining the domain estimates from a sample survey creates a consistency problem: the survey estimates do not sum to the totals for the population, or for the corresponding larger domains, that are taken from the register.
The main topic of this dissertation is the development of domain estimators that are consistent and more accurate than the initial estimators. The restriction estimator studied so far (General Restriction estimator, GR for short) is based on unbiased initial estimators and satisfies linear restrictions. However, estimators that may be biased are often used for domain estimation.
In this work we allow both unbiased and biased initial estimators. The generalized regression (GREG) and synthetic (SYN) estimators are chosen as the initial estimators. To construct both estimators in domains we use two models: the population-level model, or P-model, and the domain-level model, or D-model.
The work proposes three new GR estimators and shows that their mean square errors are smaller than that of the initial estimator. The best estimator among the GR estimators is also found. The properties of the GREG and SYN estimators chosen as initial estimators are studied as well.
The theoretical results are illustrated in a simulation study based on real data, and the applicability of the results is confirmed.
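As an illustration of the two kinds of initial estimators named in the abstract, the following is a minimal sketch of textbook SYN and GREG estimators of a domain total. It is not taken from the thesis. It assumes a single auxiliary variable x, a ratio working model y = B*x fitted on the whole sample (a population-level model), and a known domain total of x. The function name, argument names and data are hypothetical.

```python
def domain_estimates(sample, domain, x_total_domain):
    """Return (SYN, GREG) estimates of the y-total in one domain.

    sample         -- list of (domain_id, weight, x, y) for the sampled units,
                      where weight is the design weight 1/pi_k
    domain         -- identifier of the domain of interest
    x_total_domain -- known total of x in that domain (e.g. from a register)
    """
    # Slope of the assumed ratio model, fitted on the whole sample.
    sum_wx = sum(w * x for _, w, x, _ in sample)
    sum_wy = sum(w * y for _, w, _, y in sample)
    slope = sum_wy / sum_wx

    # SYN: model predictions summed over the domain, with no sample correction.
    syn = slope * x_total_domain

    # GREG: SYN plus the weighted residuals of the sampled units in the domain.
    residuals = sum(w * (y - slope * x) for d, w, x, y in sample if d == domain)
    greg = syn + residuals
    return syn, greg


# Hypothetical data: four sampled units in two domains, equal weights.
sample = [
    ("A", 10.0, 2.0, 5.0),
    ("A", 10.0, 3.0, 8.0),
    ("B", 10.0, 4.0, 7.0),
    ("B", 10.0, 1.0, 2.0),
]
syn_a, greg_a = domain_estimates(sample, "A", x_total_domain=52.0)
```

The SYN estimate relies entirely on the model and can be biased when the domain differs from the population. The GREG estimate adds a residual correction from the domain sample, which removes most of that bias at the cost of higher variance in small domains.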
The demand for accurate statistics on population subgroups, or domains, is increasing. Such statistics can be obtained from surveys or, sometimes, aggregated from registers. Even if a register contains the variables of interest, it may not contain identifiers for the domains of particular interest. In that case the domain totals cannot be produced from the register and have to be estimated from a survey that collects the same study variable together with domain identifiers. This creates a consistency problem: the domain estimates from the survey do not sum to the totals available from the registers. This thesis concentrates on estimating domain and population totals under a summation restriction. We allow biased as well as unbiased initial estimators for domains. Based on them, we construct three new estimators (general restriction, or GR, estimators) that satisfy the summation restriction. The initial estimators for the GR estimator are taken from the generalized regression (GREG) family and the family of synthetic (SYN) estimators. Both the GREG and the SYN estimators for domains are constructed under different model specifications. The properties of the proposed GR estimators (the bias and the mean square error matrix) are derived, and the best GR estimator is identified. The new GR estimators are shown to be superior to the initial estimators. The properties of the GREG and SYN domain estimators, which are the building blocks of the GR estimators, are also studied. All estimators are developed at a general level, valid for all sampling designs. The theoretical results are illustrated and confirmed in a simulation study on real data.
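The consistency problem and the summation restriction can be made concrete with a small sketch. The abstract does not give the formulas of the three new GR estimators, so the code below is not one of them. It uses the commonly cited linear-restriction adjustment, in which the initial estimates are shifted by V R'(R V R')^{-1}(c - R theta_hat), specialised to a single restriction "domain estimates sum to the register total". The covariance matrix V is assumed known, and the function name and numbers are hypothetical.

```python
def restrict_to_total(estimates, cov, known_total):
    """Adjust initial domain estimates so that they sum to known_total.

    estimates   -- list of initial domain total estimates
    cov         -- assumed covariance matrix of the estimates, as a list of rows
    known_total -- population total taken from the register
    """
    row_sums = [sum(row) for row in cov]   # V * 1
    total_var = sum(row_sums)              # 1' * V * 1, variance of the sum
    if total_var <= 0:
        raise ValueError("variance of the summed estimates must be positive")
    gap = known_total - sum(estimates)     # amount by which the restriction is missed
    # Each domain absorbs a share of the gap proportional to its row sum of V.
    return [est + rs / total_var * gap for est, rs in zip(estimates, row_sums)]


# Hypothetical initial estimates for three domains; they sum to 245, not 250.
initial = [120.0, 80.0, 45.0]
cov = [
    [16.0, 2.0, 0.0],
    [2.0, 9.0, 1.0],
    [0.0, 1.0, 4.0],
]
adjusted = restrict_to_total(initial, cov, known_total=250.0)
```

With these numbers the gap of 5 is split in the ratio 18 : 12 : 5, so the least precise domain estimate receives the largest correction and the adjusted estimates sum exactly to the register total.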
Keywords
mathematical statistics, statistical samples, subsamples, estimators