Available tables |
Anonymization method:
Method parameter:
(% of attribute value) |
This view allows you to choose one or more data tables to work with and an anonymization method. To complete this step, do the following:
Preview
of selected tables. This will give you an idea of what each table looks like.
MultiRelational
k-anonymity is available. This limitation exists because the other anonymization
methods would either offer insufficient privacy protection or overgeneralize the
data, making it useless for data miners.
Next button.

In this dialog you will be asked to choose an anonymization method. Each method works a little differently and has a different purpose. This section gives a brief overview of these methods.
The purpose of k-Anonymity is to protect individuals against record linkage attacks by anonymizing the attributes
in the quasi-identifier (QID). k-Anonymity generalizes the data in the QID columns so that, for each row,
there are at least k-1 other rows with the same QID values. k-Anonymity works in situations where each person
is present in only one row. For example, if QID = {Age, Gender, City}, then the table below is
2-Anonymous because the smallest QID group has a size of 2.
| Age | Gender | City | Disability |
|---|---|---|---|
| [35-40) | Male | Elva | No |
| [35-40) | Male | Elva | Yes |
| [35-40) | Male | Elva | No |
| [40-45) | Female | Tartu | Yes |
| [40-45) | Female | Tartu | No |
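
The k-Anonymity condition can be checked by counting how many rows share each combination of QID values. The following is a minimal sketch, not the tool's implementation; the function name `k_of` and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import Counter


def k_of(rows, qid):
    """Return the size of the smallest QID group, i.e. the largest k
    for which the rows are k-anonymous (0 for an empty table)."""
    groups = Counter(tuple(row[col] for col in qid) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical, already generalized rows (not taken from the tool).
rows = [
    {"Age": "[30-35)", "Gender": "Female", "City": "Elva", "Disability": "No"},
    {"Age": "[30-35)", "Gender": "Female", "City": "Elva", "Disability": "Yes"},
    {"Age": "[50-55)", "Gender": "Male", "City": "Tartu", "Disability": "No"},
    {"Age": "[50-55)", "Gender": "Male", "City": "Tartu", "Disability": "No"},
    {"Age": "[50-55)", "Gender": "Male", "City": "Tartu", "Disability": "Yes"},
]

# The two QID groups have 2 and 3 rows, so the smallest group has 2 rows.
print(k_of(rows, ["Age", "Gender", "City"]))
```

A single row whose QID combination is unique is enough to bring the result down to 1, which is why every group has to be checked.
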
The purpose of (X, Y)-Anonymity is to protect individuals against record linkage attacks in cases where one individual appears in multiple rows and k-Anonymity would therefore not offer sufficient protection. For example, if a hospital releases data about diseases, a person who has multiple diseases is present in multiple rows (one row per disease). If each person has 3 different diseases, there are 3 rows for each person, and a QID group of size k therefore offers anonymity of only k/3 instead of k. The following table illustrates this.
| Id | Age | Gender | City | Disease |
|---|---|---|---|---|
| 1 | 33 | Male | Tartu | HIV |
| 1 | 33 | Male | Tartu | Flu |
| 1 | 33 | Male | Tartu | AIDS |
| 2 | 35 | Female | Elva | HIV |
| 2 | 35 | Female | Elva | AIDS |
The table above would be 2-Anonymous on QID = {Age, Gender, City}, but in reality all 3 rows in the first QID
group represent the same person, who has 3 diseases, and both rows in the second group represent the
same person, who has 2 diseases. (X, Y)-Anonymity overcomes this problem by requiring that each
value of X is linked to at least k distinct values of Y. For example: require that each QID group
represents at least k different persons.
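
The difference from plain row counting can be sketched as follows: instead of counting rows per QID group, count the distinct Y values (here, person identifiers) linked to each X value. This is a minimal illustration under the assumption that X is the QID and Y is the `Id` column; the function name `xy_anonymity` and the sample rows are hypothetical.

```python
from collections import defaultdict


def xy_anonymity(rows, x_cols, y_cols):
    """Return the smallest number of distinct Y values linked to any X value."""
    linked = defaultdict(set)
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        y = tuple(row[c] for c in y_cols)
        linked[x].add(y)
    return min((len(ys) for ys in linked.values()), default=0)


# Hypothetical rows: one row per (person, disease) pair.
rows = [
    {"Id": 1, "Age": "[30-40)", "Gender": "Male", "Disease": "Flu"},
    {"Id": 1, "Age": "[30-40)", "Gender": "Male", "Disease": "Asthma"},
    {"Id": 2, "Age": "[30-40)", "Gender": "Male", "Disease": "Flu"},
    {"Id": 3, "Age": "[40-50)", "Gender": "Female", "Disease": "Flu"},
    {"Id": 3, "Age": "[40-50)", "Gender": "Female", "Disease": "Asthma"},
]
qid = ["Age", "Gender"]

# Every QID group has at least 2 rows, but the second group
# contains only one distinct person, so the result is 1.
print(xy_anonymity(rows, qid, ["Id"]))
```
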
MultiRelational k-anonymity also protects against record linkage. In more realistic situations, data is stored in a relational database rather than in a single table that contains all the data. Usually there is a person-specific table (PT) and one table T for each sensitive
attribute. In such situations, other anonymization methods would either not provide sufficient privacy protection or make the data useless for data miners by overgeneralizing it. Such tables have a common identifier that can be used to link persons with their attributes. For
example, PT = {Id, Name, Age, Gender, City} and T1 = {PersonId, Job}. In table T1, PersonId matches Id in PT, so these tables can be joined on PT.Id = T1.PersonId, which results in T = {Id, Name, Age, Gender, City, Job}. After the tables are joined into table T, k-Anonymity is applied to T.
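
The join step can be sketched in a few lines. This is only an illustration of joining on PT.Id = T1.PersonId, assuming `Id` is unique in PT; the function name `join_on_person` and the sample data are hypothetical and do not come from the tool.

```python
def join_on_person(pt_rows, t1_rows):
    """Inner join on PT.Id = T1.PersonId, returning the rows of T."""
    by_id = {person["Id"]: person for person in pt_rows}
    joined = []
    for attr in t1_rows:
        person = by_id.get(attr["PersonId"])
        if person is not None:  # skip T1 rows without a matching person
            joined.append({**person, "Job": attr["Job"]})
    return joined


# Hypothetical contents of PT and T1.
pt = [
    {"Id": 1, "Name": "Mari", "Age": 33, "Gender": "Female", "City": "Tartu"},
    {"Id": 2, "Name": "Jaan", "Age": 41, "Gender": "Male", "City": "Elva"},
]
t1 = [
    {"PersonId": 1, "Job": "Designer"},
    {"PersonId": 2, "Job": "Programmer"},
    {"PersonId": 2, "Job": "Teacher"},
]

t = join_on_person(pt, t1)
for row in t:
    print(row)
```

Note that a person with several rows in T1 appears in several rows of T, so row counts in T do not directly correspond to numbers of persons.
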
L-diversity was proposed by Machanavajjhala et al. to prevent attribute linkage attacks. L-diversity requires that each QID group has at least l different sensitive attribute values. It differs from k-anonymity in that k-anonymity requires each group to contain at least k individuals with the same QID, whereas l-diversity requires each group to contain at least l different sensitive values. For example, with k-anonymity it is possible that a QID group has size k but all of its sensitive values are the same. This means that an attacker who knows the QID of a victim can infer the victim's sensitive attribute with probability 1. When l-diversity is required, the attacker can no longer be certain: if the l values in a group are equally frequent, a guess is correct with probability 1/l.
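
The requirement of at least l different sensitive values per QID group can be checked as in the sketch below. It only counts distinct values and ignores how often each value occurs; the function name `l_of` and the sample rows are hypothetical.

```python
from collections import defaultdict


def l_of(rows, qid, sensitive):
    """Return the smallest number of distinct sensitive values in any QID group."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[c] for c in qid)].add(row[sensitive])
    return min((len(values) for values in groups.values()), default=0)


# Hypothetical rows: both QID groups contain 2 rows (2-anonymous),
# but the first group holds a single sensitive value.
rows = [
    {"Age": "[30-40)", "City": "Elva", "Disease": "Flu"},
    {"Age": "[30-40)", "City": "Elva", "Disease": "Flu"},
    {"Age": "[40-50)", "City": "Tartu", "Disease": "Flu"},
    {"Age": "[40-50)", "City": "Tartu", "Disease": "Asthma"},
]

# The first group has only one distinct disease, so the result is 1.
print(l_of(rows, ["Age", "City"], "Disease"))
```
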
Sometimes l-diversity may not guarantee enough protection. For example, consider a table where 90% of people work as designers and 10% work as programmers. Suppose there is a QID group where 50% of the sensitive values are designer and 50% are programmer. In that case, a person in that group can be inferred to be a programmer with 50% confidence, far above the 10% baseline for the whole table. T-closeness overcomes this weakness by requiring that the distribution of the sensitive attribute in each QID group be close to its distribution in the whole table.
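
The designer/programmer example can be reproduced numerically. The sketch below measures how far each QID group's distribution is from the table-wide distribution using the variational distance (half the sum of absolute differences). This distance is an assumption made for simplicity; t-closeness is usually stated in terms of the Earth Mover's Distance, and the tool may measure closeness differently. All names and rows are hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Relative frequency of each value in a list."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_group_distance(rows, qid, sensitive):
    """Largest distance between any QID group and the whole table."""
    overall = distribution([row[sensitive] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in qid)].append(row[sensitive])
    return max(
        (variational_distance(distribution(v), overall) for v in groups.values()),
        default=0.0,
    )


# Hypothetical table: 90% designers and 10% programmers overall,
# with one QID group split 50/50.
rows = (
    [{"City": "Elva", "Job": "Designer"}, {"City": "Elva", "Job": "Programmer"}]
    + [{"City": "Tartu", "Job": "Designer"} for _ in range(8)]
)

# The Elva group is at distance of about 0.4 from the overall distribution.
print(max_group_distance(rows, ["City"], "Job"))
```

Under this measure, the table would satisfy t-closeness only for a threshold t of roughly 0.4 or more, which reflects how strongly the 50/50 group deviates from the 90/10 table.
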
ε-Differential privacy requires that the threat to a record owner's privacy should not significantly increase as a result of participating in the data release. ε-Differential privacy is achieved by shifting the sensitive attribute by ε.
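
The shifting procedure itself is not detailed here. For comparison only, a common way to obtain ε-differential privacy for a numeric query is the Laplace mechanism, which adds random noise with scale 1/ε to the true answer. The sketch below assumes a counting query with sensitivity 1; it is not the tool's method, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw a Laplace(0, scale) sample as the difference of two
    independent exponential samples with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng=random):
    """Return the count plus Laplace noise of scale 1/epsilon
    (assumes the query has sensitivity 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


# A smaller epsilon means more noise and therefore stronger privacy.
rng = random.Random(42)
print(noisy_count(100, epsilon=0.5, rng=rng))
```
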