Embedding Inversion Attacks

Definition

[IDE.T.1] An Embedding Inversion Attack exploits vulnerabilities to invert embeddings and recover significant amounts of source information, compromising data confidentiality.

Targeted assets

System Asset: Supporting IT infrastructure.

Business Asset: Input data embeddings.

Security Criteria: Confidentiality.

Attack details

Exploited vulnerabilities

Vulnerabilities:

  1. Embedding vectors retain enough information to enable reconstruction of significant parts of the original input text.
  2. Leakage of, or unauthorized access to, the stored embeddings.
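To illustrate the first vulnerability, even a trivial toy encoder retains enough structure that an attacker holding only a leaked vector can recover the input by scoring candidate texts against it. The vocabulary, bag-of-words encoder, and candidate texts below are invented for this sketch; real encoders are far richer, which makes the leakage worse, not better.

```python
import numpy as np

# Toy "embedding model": bag-of-words counts over a tiny vocabulary.
# (Both the vocabulary and the texts are assumptions of the sketch.)
VOCAB = ["patient", "diagnosed", "diabetes", "invoice", "overdue", "payment"]

def embed(text: str) -> np.ndarray:
    """Map text to a vector of vocabulary counts (stand-in for a real encoder)."""
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def recover(leaked: np.ndarray, candidates: list[str]) -> str:
    """Return the candidate whose embedding is closest (cosine) to the leak."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max(candidates, key=lambda c: cos(embed(c), leaked))

secret = "patient diagnosed diabetes"
leaked_vector = embed(secret)  # the attacker obtains this vector, not the text
guesses = ["invoice overdue payment", "patient diagnosed diabetes"]
print(recover(leaked_vector, guesses))  # → patient diagnosed diabetes
```

The point of the sketch is that the embedding alone, with no access to the model's internals, already determines which candidate input produced it.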

Threat agent

Threat agent: Applies in both white-box and black-box scenarios. In the white-box scenario, the attacker is assumed to have complete knowledge of the target machine learning model: its architecture, parameters, training data, and learning algorithm. In the black-box scenario, the attacker has no knowledge of the model's architecture, parameters, or training data, and can only interact with the model by sending it inputs and observing the outputs.
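The black-box setting can be sketched with an assumed toy linear encoder: the attacker never sees the model's weights, but queries the embedding endpoint on inputs they control and fits an inverse mapping from embeddings back to inputs. The matrix `W`, the query budget, and the least-squares inverse are all assumptions of the sketch; with this over-complete toy encoder recovery is near-exact, whereas real encoders leak only partially.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(48, 16))  # hidden model parameters (unknown to the attacker)

def embed_api(x: np.ndarray) -> np.ndarray:
    """Opaque service: the attacker observes only input -> output behaviour."""
    return W @ x

# 1. The attacker builds a query dataset from inputs they control.
X = rng.normal(size=(16, 500))  # 500 attacker-chosen inputs
E = embed_api(X)                # observed embeddings, one column per query

# 2. Fit an inverse model M so that M.T @ e ≈ x (learning-based inversion).
M, *_ = np.linalg.lstsq(E.T, X.T, rcond=None)

# 3. Apply it to a victim embedding to approximate the original input.
x_victim = rng.normal(size=16)
e_victim = embed_api(x_victim)  # e.g. leaked from an embedding store
x_guess = M.T @ e_victim        # close to x_victim
```

In practice the inverse model is a trained neural decoder rather than a least-squares fit, but the attack surface is the same: query access plus a captured embedding.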

Attack methods

Attack methods:

  1. Inversion of embeddings, leading to recovery of source information. The attacker uses gradient-based (white-box) or learning-based (black-box) methods to invert the target embeddings; the resulting mappings partially reveal the original input to the model.
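A minimal gradient-based (white-box) variant, again on an assumed toy linear encoder: knowing the model parameters, the attacker descends on the reconstruction loss ||W x' − e||² until x' approximates the input that produced the intercepted embedding. The encoder, step-size rule, and iteration count are assumptions of the sketch, not a recipe for a specific real model.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(48, 16))  # model parameters, known to the attacker (white-box)
x_secret = rng.normal(size=16)
e_target = W @ x_secret        # intercepted embedding of the secret input

# Step size chosen from the largest curvature so descent provably converges
# on this quadratic loss.
lr = 0.9 / np.linalg.eigvalsh(W.T @ W).max()

x = np.zeros(16)  # attacker's guess, refined step by step
for _ in range(1000):
    grad = 2 * W.T @ (W @ x - e_target)  # gradient of ||W x - e_target||^2
    x -= lr * grad
# x now approximates x_secret
```

With a nonlinear encoder the same loop runs through automatic differentiation and converges only approximately, but the recovered input can still reveal sensitive content.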

Impact and harm

Impact and harm: Compromises the confidentiality of input previously provided to the machine learning system and, by extension, the confidentiality of the model itself. This may lead to legal repercussions.

Security countermeasures

Security requirements

Security requirement: The machine learning system's actions and decisions must be resistant to embedding inversion attacks.

Security controls

Security controls:

  1. Implement permission and access control to the embedding store.
  2. Monitor and log the data retrieval activities.
  3. Audit and validate integrity of the data stores.
  4. Operate with data retrieved only from the trusted sources.
  5. Monitor and evaluate the influence of RAG on the model's performance.
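Controls 1 and 2 can be sketched as a minimal embedding store with per-user permissions and an audit log of retrieval activity. The class and its methods are hypothetical, not the API of any real vector database; a production deployment would delegate both concerns to the platform's own access-control and logging facilities.

```python
from dataclasses import dataclass, field

@dataclass
class EmbeddingStore:
    """Hypothetical store enforcing access control and logging retrievals."""
    _vectors: dict = field(default_factory=dict)   # id -> embedding vector
    _acl: dict = field(default_factory=dict)       # user -> set of readable ids
    audit_log: list = field(default_factory=list)  # (user, id, outcome) records

    def put(self, vec_id: str, vector) -> None:
        self._vectors[vec_id] = vector

    def grant(self, user: str, vec_id: str) -> None:
        self._acl.setdefault(user, set()).add(vec_id)

    def get(self, user: str, vec_id: str):
        # Control 2: every retrieval attempt is logged, allowed or not.
        allowed = vec_id in self._acl.get(user, set())
        self.audit_log.append((user, vec_id, "ok" if allowed else "denied"))
        # Control 1: deny access to embeddings the user was not granted.
        if not allowed:
            raise PermissionError(f"{user} may not read {vec_id}")
        return self._vectors[vec_id]
```

For example, an ungranted read raises `PermissionError` while still leaving a "denied" entry in `audit_log`, which monitoring can alert on before enough embeddings leak to make inversion worthwhile.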