Ethics in AI:
The Black Box Dilemma
Artificial Intelligence has ceased to be a futuristic promise and has become the engine of critical decisions, from loan approvals to medical diagnoses. However, as models grow more complex, we lose the ability to explain *why* they make those decisions. This phenomenon, known as the "black box" problem, poses immense ethical risks.
The handling of sensitive data during the training of these models is the current battleground. If we feed an AI biased historical data, it will not only learn those biases but amplify them, automating discrimination at an unprecedented scale.
Sanitization and Real Anonymization
Removing names and email addresses from a database is no longer sufficient: with current correlation capabilities, re-identification is trivial. At Primitive, we apply Differential Privacy techniques, injecting statistical noise into datasets so that the probability of tracing a data point back to a specific individual is mathematically bounded, without losing the utility of the global pattern.
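As an illustration of the idea (not our production pipeline), here is a minimal sketch of the Laplace mechanism, the textbook way to achieve epsilon-differential privacy for a counting query. The function name `laplace_count` and the parameter values are ours for the example; only the standard library is used.

```python
import math
import random

def laplace_count(true_count, sensitivity=1.0, epsilon=1.0, rng=random):
    """Return the count plus Laplace noise of scale sensitivity/epsilon,
    the classic epsilon-differentially-private counting query."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-CDF sampling from Laplace(0, scale)
    noise = -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

# Smaller epsilon means more noise: stronger privacy, lower accuracy.
random.seed(7)
print(round(laplace_count(1000, sensitivity=1.0, epsilon=0.5), 1))
```

Individual answers are noisy, but the noise is zero-mean, so aggregate statistics over many queries remain useful, which is exactly the trade-off Differential Privacy formalizes.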
Companies must audit their datasets before they touch a single GPU. Transparency in data origin is not optional. Under the new EU AI Act, providers of general-purpose models must demonstrate that they respect copyright and the privacy of training data.
“An ethical AI is not one that only obeys commands, but one that protects the dignity of the data with which it was trained.”
Zenith Privacy
The technical implementation of ethics implies secure data pipelines. Below, we show a conceptual example of how to anonymize a dataframe before using it for training, using salted hashing to protect identifiers.
Anonymization Pipeline in Python
This basic script demonstrates how to transform PII (Personally Identifiable Information) into opaque identifiers before processing.
Data Sanitization Process:
- Direct removal of unnecessary fields.
- Cryptographic hashing of user IDs.
- Generalization of demographic data (e.g., age by ranges).
- Suppression of outliers that facilitate identification.
import pandas as pd
import hashlib

# NOTE: in production, load the salt from a secret store; never hardcode it.
def hash_pii(value, salt="S3cr3tS@lt"):
    """One-way salted hash for user IDs."""
    return hashlib.sha256((str(value) + salt).encode()).hexdigest()

def sanitize_dataset(df):
    # 1. Remove direct identifiers
    df = df.drop(columns=['full_name', 'email', 'address'])
    # 2. Hash user IDs
    df['user_id'] = df['user_id'].apply(hash_pii)
    # 3. Generalize age into ranges (k-anonymity)
    df['age_group'] = pd.cut(df['age'], bins=[0, 18, 30, 50, 100],
                             labels=['0-18', '19-30', '31-50', '50+'])
    df = df.drop(columns=['age'])
    return df

# df_clean = sanitize_dataset(df_raw)
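The age generalization above aims at k-anonymity: every combination of quasi-identifier values should appear at least k times, so no record stands alone. It is cheap to verify that a released table actually meets the threshold. A minimal, self-contained check; the helper `satisfies_k_anonymity` and the sample rows are hypothetical, not part of the pipeline above:

```python
from collections import Counter

def satisfies_k_anonymity(quasi_identifiers, k=5):
    """True if every combination of quasi-identifier values
    appears at least k times in the released data."""
    counts = Counter(tuple(row) for row in quasi_identifiers)
    return all(c >= k for c in counts.values())

# Six records share ('19-30', 'ES'), but only two share ('31-50', 'FR'):
rows = [('19-30', 'ES')] * 6 + [('31-50', 'FR')] * 2
print(satisfies_k_anonymity(rows, k=5))  # → False: the FR group is too small
```

A failed check usually means the bins must be widened (coarser age ranges, broader regions) or the small groups suppressed, which is exactly the outlier-suppression step listed earlier.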
Ethics in AI is not a brake on innovation; it is the seat belt that allows us to go faster without crashing. Building fair and private systems is the only way to ensure the long-term viability of AI.
Zenith Privacy
Lead Threat Hunter. Investigating advanced threats and securing critical infrastructures. Obsessed with proactive defense.