Open table

Available tables

Anonymization method:

Method parameter:
(% of attribute value)

Preview of selected tables

None selected.

Getting started

This view lets you choose one or more data tables to work with and an anonymization method. To complete this step, do the following:

  1. Choose the table(s) you would like to anonymize. You can select a table by checking the checkbox in front of its name.
    • Once you select a table, its preview appears immediately under Preview of selected tables. This gives you an idea of what the table looks like.
  2. Choose the anonymization method.
    • Please note that when you select more than one table, only MultiRelational k-anonymity is available. This limitation exists because the other anonymization methods would either offer insufficient privacy protection or overgeneralize the data, making it useless for data miners.
  3. Click the Next button.

About the different anonymization methods

In this dialog you are asked to choose an anonymization method. Each method works a little differently and has a different purpose. This section gives a brief overview of these methods.

k-Anonymity

The purpose of k-Anonymity is to protect individuals against record linkage attacks by anonymizing the attributes in the quasi-identifier (QID). k-Anonymity generalizes the data in the QID columns so that for each row with some combination of QID values there are at least k-1 other rows with the same QID values. k-Anonymity works in situations where each person is present in only one row. For example, if QID = {Age, Gender, City}, then the table below is 2-anonymous because the smallest QID group has a size of 2.

Age      Gender  City   Disability
[35-40)  Male    Elva   No
[35-40)  Male    Elva   Yes
[35-40)  Male    Elva   No
[40-45)  Female  Tartu  Yes
[40-45)  Female  Tartu  No
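
The k-anonymity condition can be checked mechanically by grouping rows on their QID values and looking at the smallest group. The following is a minimal Python sketch, not the tool's implementation; the function names and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def smallest_qid_group(rows, qid):
    """Return the size of the smallest group of rows sharing the same QID values."""
    groups = Counter(tuple(row[col] for col in qid) for row in rows)
    return min(groups.values(), default=0)


def is_k_anonymous(rows, qid, k):
    """A table is k-anonymous if every QID group contains at least k rows."""
    return smallest_qid_group(rows, qid) >= k


# Illustrative rows (hypothetical data, already generalized on Age).
rows = [
    {"Age": "[35-40)", "Gender": "Male", "City": "Elva", "Disability": "No"},
    {"Age": "[35-40)", "Gender": "Male", "City": "Elva", "Disability": "Yes"},
    {"Age": "[40-45)", "Gender": "Female", "City": "Tartu", "Disability": "Yes"},
    {"Age": "[40-45)", "Gender": "Female", "City": "Tartu", "Disability": "No"},
]
qid = ["Age", "Gender", "City"]

print(is_k_anonymous(rows, qid, 2))  # both QID groups contain 2 rows
print(is_k_anonymous(rows, qid, 3))  # no group reaches 3 rows
```

Note that the sensitive column (Disability) plays no role in the check; only the QID columns determine the groups.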

(X, Y)-Anonymity

The purpose of (X, Y)-Anonymity is to protect individuals against record linkage attacks. When one individual appears in multiple rows, k-Anonymity does not offer sufficient protection. For example, if a hospital releases data about diseases, a person who has multiple diseases is present in multiple rows (one row per disease). If each person has 3 different diseases, there are 3 rows for each person, and therefore a QID group of size k offers anonymity of only k/3 instead of k. The following table illustrates this.

Id  Age  Gender  City   Disease
1   33   Male    Tartu  HIV
1   33   Male    Tartu  Flu
1   33   Male    Tartu  AIDS
2   35   Female  Elva   HIV
2   35   Female  Elva   AIDS

The table above would be 2-anonymous on QID = {Age, Gender, City}, but in reality all 3 rows in the first QID group represent the same person, who has 3 diseases, and both rows in the second group represent the same person, who has 2 diseases. (X, Y)-Anonymity overcomes this problem by requiring that each value on X is linked to at least k distinct values on Y. For example, with X = QID and Y = Id, it requires that each QID group represents at least k different persons.
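
The difference between counting rows and counting persons can be sketched in Python as follows. This is an illustrative sketch only; the function name is an assumption, and the rows mirror the hospital example with X = QID and Y = Id.

```python
from collections import defaultdict


def xy_anonymity_level(rows, x_cols, y_cols):
    """Return the smallest number of distinct Y values linked to any X value."""
    linked = defaultdict(set)
    for row in rows:
        x_value = tuple(row[col] for col in x_cols)
        y_value = tuple(row[col] for col in y_cols)
        linked[x_value].add(y_value)
    return min((len(ys) for ys in linked.values()), default=0)


rows = [
    {"Id": 1, "Age": 33, "Gender": "Male", "City": "Tartu", "Disease": "HIV"},
    {"Id": 1, "Age": 33, "Gender": "Male", "City": "Tartu", "Disease": "Flu"},
    {"Id": 1, "Age": 33, "Gender": "Male", "City": "Tartu", "Disease": "AIDS"},
    {"Id": 2, "Age": 35, "Gender": "Female", "City": "Elva", "Disease": "HIV"},
    {"Id": 2, "Age": 35, "Gender": "Female", "City": "Elva", "Disease": "AIDS"},
]
qid = ["Age", "Gender", "City"]

# Counting rows, every QID group has at least 2 members, but counting
# distinct persons (Id) shows that each group hides only one individual.
print(xy_anonymity_level(rows, qid, ["Id"]))
```

With this data the level is 1, so the table satisfies (X, Y)-anonymity only for k = 1, even though a plain row count suggests k = 2.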

MultiRelational k-anonymity

MultiRelational k-anonymity also protects against record linkage. In more realistic situations, data is stored in a relational database rather than in a single table that contains all the data. Usually there is a person-specific table (PT) and one table T for each sensitive attribute. In such situations the other anonymization methods would either not provide sufficient privacy protection or make the data useless for data miners by overgeneralizing it. Such tables share a common identifier that can be used to link persons with their attributes. For example, PT = {Id, Name, Age, Gender, City} and T1 = {PersonId, Job}. In table T1, PersonId matches Id in PT, so these tables can be joined on PT.Id = T1.PersonId, which results in T = {Id, Name, Age, Gender, City, Job}. After the tables are joined into table T, k-Anonymity is applied to T.
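
The join step described above can be sketched in plain Python. This is a minimal illustration and not the tool's implementation; the function name, the inner-join behaviour, and the sample records are assumptions.

```python
def join_tables(pt, t1, pt_key="Id", t1_key="PersonId"):
    """Inner-join the person table PT with an attribute table T1 on PT.Id = T1.PersonId."""
    persons_by_id = {person[pt_key]: person for person in pt}
    joined = []
    for record in t1:
        person = persons_by_id.get(record[t1_key])
        if person is None:
            continue  # assumption: records without a matching person are dropped
        row = dict(person)
        # Copy every T1 column except the foreign key, which duplicates PT.Id.
        row.update({col: val for col, val in record.items() if col != t1_key})
        joined.append(row)
    return joined


# Hypothetical sample data.
pt = [
    {"Id": 1, "Name": "Mari", "Age": 33, "Gender": "Female", "City": "Tartu"},
    {"Id": 2, "Name": "Jaan", "Age": 35, "Gender": "Male", "City": "Elva"},
]
t1 = [
    {"PersonId": 1, "Job": "Designer"},
    {"PersonId": 2, "Job": "Programmer"},
    {"PersonId": 2, "Job": "Teacher"},
]

t = join_tables(pt, t1)
print(sorted(t[0].keys()))
```

The joined table T has the columns {Id, Name, Age, Gender, City, Job}. A person with several rows in T1 appears in several rows of T, which is why the join has to happen before the anonymity requirement is checked.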

l-Diversity

l-Diversity was proposed by Machanavajjhala et al. to prevent attribute linkage attacks. l-Diversity requires that each QID group has at least l different sensitive attribute values. It differs from k-anonymity in that k-anonymity requires a group to contain at least k individuals with the same QID, whereas l-diversity requires a group to contain at least l different sensitive attribute values. For example, with k-anonymity it is possible that a QID group has size k but all its sensitive values are the same. This means that an attacker who knows the victim's QID can guess the victim's sensitive attribute with probability 1. When l-diversity is required and the l values are equally frequent in the group, the attacker can correctly guess the victim's sensitive attribute value only with probability 1/l.
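
The requirement of at least l different sensitive values per QID group can be sketched as follows. The function name and the sample rows are assumptions made for illustration; the sketch counts distinct values only and does not look at how often each value occurs.

```python
from collections import defaultdict


def l_diversity_level(rows, qid, sensitive):
    """Return the smallest number of distinct sensitive values in any QID group."""
    values = defaultdict(set)
    for row in rows:
        values[tuple(row[col] for col in qid)].add(row[sensitive])
    return min((len(v) for v in values.values()), default=0)


qid = ["Age", "Gender"]

# Hypothetical 2-anonymous table whose first group has a single sensitive value.
uniform_group = [
    {"Age": "[35-40)", "Gender": "Male", "Disability": "Yes"},
    {"Age": "[35-40)", "Gender": "Male", "Disability": "Yes"},
    {"Age": "[40-45)", "Gender": "Female", "Disability": "Yes"},
    {"Age": "[40-45)", "Gender": "Female", "Disability": "No"},
]

# The same table with the first group made diverse.
diverse_groups = [dict(row) for row in uniform_group]
diverse_groups[1]["Disability"] = "No"

print(l_diversity_level(uniform_group, qid, "Disability"))
print(l_diversity_level(diverse_groups, qid, "Disability"))
```

The first table is only 1-diverse, because everyone in the first group shares the same sensitive value; the second table is 2-diverse.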

t-Closeness

Sometimes l-diversity does not guarantee enough protection. For example, suppose a table in which 90% of people work as designers and 10% work as programmers, and some QID group in which 50% of the sensitive values are designer and 50% are programmer. A person in that group can be inferred to be a programmer with 50% confidence, far above the 10% baseline for the whole table. t-Closeness overcomes this weakness by requiring that the distribution of the sensitive attribute in each QID group be close to its distribution in the whole table.
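
The designer/programmer example can be reproduced with a short Python sketch. The distance measure used here, half the sum of absolute differences between the two distributions, is an assumption chosen for simplicity; the text does not say which distance the tool uses. The function names and the sample rows are likewise illustrative.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Return the relative frequency of each value in a list."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(key, 0.0) - q.get(key, 0.0)) for key in keys)


def t_closeness_level(rows, qid, sensitive):
    """Return the largest distance between any QID group and the whole table."""
    overall = distribution([row[sensitive] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in qid)].append(row[sensitive])
    return max(
        (variational_distance(distribution(vals), overall) for vals in groups.values()),
        default=0.0,
    )


# Hypothetical table: 9 designers and 1 programmer overall (90% / 10%).
rows = [{"City": "Elva", "Job": "Designer"}, {"City": "Elva", "Job": "Programmer"}]
rows += [{"City": "Tartu", "Job": "Designer"} for _ in range(8)]

print(t_closeness_level(rows, ["City"], "Job"))
```

The Elva group is split 50/50 while the table is split 90/10, so its distance from the overall distribution is about 0.4. The table therefore satisfies t-closeness only for t of roughly 0.4 or more under this distance.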

ε-Differential privacy

ε-Differential privacy requires that the threat to a record owner's privacy should not significantly increase as a result of participating in a data release. ε-Differential privacy is achieved by shifting the sensitive attribute by ε.

Step 1