Memory-efficient fast approximate shortest path search in large graphs
Date
2013
Publisher
Tartu Ülikool
Abstract
Shortest path search is one of the most important graph algorithms. For large graphs, however, approximate methods are often needed, because exact algorithms are unbearably slow. One popular, simple and well-scaling family of approximate shortest path methods is based on the idea of landmarks. If the distances from every vertex x to a single selected landmark vertex u are precomputed, the distance between any vertices s and t can be approximated using the triangle inequality:
$d(s,t) \le d(s,u) + d(u,t)$.
The accuracy of the result can be improved by increasing the number of landmarks. In that case, k different landmarks must be selected and the distances from every vertex to every landmark must be precomputed.
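A minimal sketch of the basic landmark scheme with k landmarks, assuming an undirected, unweighted graph stored as a Python adjacency dict. Because the graph is undirected, one BFS from each landmark u yields both d(s,u) and d(u,t). Landmark selection is left to the caller, and the names `bfs_distances`, `precompute` and `estimate` are hypothetical, not taken from the thesis.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in graph[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def precompute(graph, landmarks):
    """One BFS per landmark; stores up to k * n distances in total."""
    return {u: bfs_distances(graph, u) for u in landmarks}

def estimate(tables, s, t):
    """Upper bound on d(s,t): minimum of d(s,u) + d(u,t) over all landmarks u."""
    best = float("inf")
    for dist in tables.values():
        if s in dist and t in dist:  # both nodes reachable from this landmark
            # Undirected graph assumed, so dist[t] = d(t,u) = d(u,t).
            best = min(best, dist[s] + dist[t])
    return best

# Path graph 0-1-2-3-4 as an adjacency dict.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(estimate(precompute(graph, [2]), 0, 4))     # 4 (exact)
print(estimate(precompute(graph, [2]), 0, 1))     # 3 (true distance is 1)
print(estimate(precompute(graph, [2, 0]), 0, 1))  # 1
```

The last two calls show why more landmarks help: a single badly placed landmark overestimates d(0,1), while adding a second landmark recovers the exact value, at the cost of another full distance table.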
In this work we introduce a simple but powerful modification to this approach, which we call pruned landmark trees. The modification is based on the observation that in most cases it is not necessary to store the distances from a node to all landmarks; it suffices to store them only for the r closest landmarks (where r can be considerably smaller than k).
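The storage idea can be sketched as follows, again assuming an undirected, unweighted adjacency-dict graph. For clarity this hypothetical `pruned_labels` still runs a full BFS from every landmark and prunes afterwards, so only the stored table shrinks from about k·n to at most r·n entries; a construction that also saves memory during preprocessing would have to prune inside the search, which is not shown. When s and t share none of their stored landmarks, this sketch simply returns infinity; how the thesis handles that case is not stated here.

```python
import heapq
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in graph[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def pruned_labels(graph, landmarks, r):
    """For every node keep only its r closest landmarks: {node: {landmark: dist}}."""
    candidates = {}
    for u in landmarks:
        for x, d in bfs_distances(graph, u).items():
            candidates.setdefault(x, []).append((d, u))
    # nsmallest with a key behaves like sorted(...)[:r]; ties keep landmark order.
    return {
        x: {u: d for d, u in heapq.nsmallest(r, pairs, key=lambda p: p[0])}
        for x, pairs in candidates.items()
    }

def estimate_pruned(labels, s, t):
    """Upper bound on d(s,t) using only landmarks stored for both s and t."""
    ls, lt = labels.get(s, {}), labels.get(t, {})
    common = ls.keys() & lt.keys()
    if not common:
        return float("inf")  # no shared landmark: fallback not covered by this sketch
    return min(ls[u] + lt[u] for u in common)

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = pruned_labels(graph, [0, 2, 4], r=2)
print(sum(len(v) for v in labels.values()))  # 10 stored distances instead of 15
print(estimate_pruned(labels, 0, 4))         # 4
print(estimate_pruned(pruned_labels(graph, [0, 2, 4], r=1), 0, 4))  # inf
```

With r = 2 the estimate for (0, 4) is still exact, because both endpoints keep the central landmark 2; with r = 1 they share no landmark, which is exactly the case a full method must handle.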
We tested the approximation accuracy and speed of the proposed methods on large social network graphs: DBLP, Orkut, Twitter and Skype. The results were compared with those of traditional landmark-based algorithms. The comparison showed that the proposed solution indeed reduces the memory used by the algorithms considerably, while leaving accuracy and query execution time largely unchanged.
Shortest path computation is one of the most critical primitives in graph algorithms. In large graphs, approximate methods are often needed because simple exact algorithms take unacceptably long. One family of techniques that is both simple and scalable is based on upper-bound distance approximation using a fixed set of selected nodes called landmarks. If we know the distances from all nodes in the graph to a landmark u, the distance between a pair of nodes s and t can be approximated using the triangle inequality: $d(s,t) \le d(s,u) + d(u,t)$. In a similar way, approximate shortest paths can also be computed. The accuracy can be increased by using a set of k landmarks, but this leads to a linear increase in memory usage and preprocessing time with only a sublinear reduction in approximation error. In this work we introduce a simple but powerful modification to this approach that we call pruned landmark trees. The idea behind this improvement is that in the majority of cases it is sufficient to keep the distances only to the r closest landmarks rather than to the whole set of landmarks, where r is much smaller than k. We describe three shortest path approximation algorithms that use the proposed modification. The performance of the presented methods was tested on large real-world social network graphs including DBLP, Orkut, Twitter and Skype. The results were compared against those of regular landmark-based techniques. The comparison showed that pruned landmark tree-based algorithms can significantly reduce memory consumption while achieving comparable accuracy and query execution time.
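The abstract notes that approximate paths, not only distances, can be obtained in a similar way. One minimal way to sketch this, assuming an undirected, unweighted adjacency-dict graph and one BFS tree per landmark stored as parent pointers, is to walk from s up to the landmark and then down to t. The loop-removal step and all function names here are hypothetical illustrations, not necessarily any of the three algorithms described in the work.

```python
from collections import deque

def bfs_tree(graph, root):
    """Shortest path tree of an unweighted graph as parent pointers (root -> None)."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in graph[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    return parent

def path_to_root(parent, x):
    """Follow parent pointers from x up to the landmark (the tree root)."""
    path = [x]
    while parent[x] is not None:
        x = parent[x]
        path.append(x)
    return path

def remove_cycles(path):
    """Cut out any loop that returns to an already visited node."""
    result, pos = [], {}
    for v in path:
        if v in pos:
            # v was seen before: drop everything visited after it.
            for w in result[pos[v] + 1:]:
                del pos[w]
            del result[pos[v] + 1:]
        else:
            pos[v] = len(result)
            result.append(v)
    return result

def approx_path(parent, s, t):
    """Path s -> landmark -> t through one landmark's tree, with loops removed."""
    if s not in parent or t not in parent:
        return None  # s or t not reachable from this landmark
    up = path_to_root(parent, s)          # s, ..., u
    down = path_to_root(parent, t)[::-1]  # u, ..., t
    return remove_cycles(up + down[1:])   # down[1:] skips the duplicated landmark u

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
tree = bfs_tree(graph, 2)
print(approx_path(tree, 0, 4))  # [0, 1, 2, 3, 4]
print(approx_path(tree, 0, 1))  # [0, 1]
```

In the second call the raw walk 0, 1, 2, 1 has length 3, matching the plain triangle-inequality bound through landmark 2, while removing the loop yields the exact path of length 1.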