Python Data Anonymisation Pipeline

Anonymisation Pipeline

Follow the pipeline from raw CSV input, through each anonymisation technique, to the exported output. Each step below shows the transformation it applies.

Raw Dataset (CSV Input) → Pseudo-IDs (uuid.uuid4()) → Text Shifting (ord(char) + 2) → Data Masking (***masked***) → Generalisation (age to range) → Drop Columns (.drop()) → Export (.to_csv())

Step 1: Raw Dataset (CSV Input)

Start with a sample dataset or paste your own CSV. This simulates loading a raw dataset with personally identifiable information (PII).

        import pandas as pd

        # Load the raw dataset, which still contains PII
        df = pd.read_csv('raw_data.csv')
        print(df.head())
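
For readers without a raw_data.csv at hand, the same step can be tried on a pasted CSV string. The sketch below assumes a small made-up dataset: the column names (id, name, email, age, ssn, address) and every value are hypothetical, chosen only to match the techniques described in Step 2.

```python
import io

import pandas as pd

# Hypothetical sample data: every name and value here is invented for illustration.
SAMPLE_CSV = """id,name,email,age,ssn,address
1001,John Smith,john@example.com,34,000-00-0001,1 Example Street
1002,Jane Doe,jane@example.org,52,000-00-0002,2 Example Avenue
"""

# pd.read_csv accepts any file-like object, so a pasted string can be wrapped in StringIO.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.head())
```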

Step 2: Select Anonymisation Techniques

Choose which techniques to apply to the dataset. Each technique targets different types of PII with different Python implementations.

Pseudo-Identifiers

Replace real IDs with random UUIDs. Each row receives a freshly generated identifier from uuid.uuid4(), which pandas stores in the id column as a Series.

import uuid
df['id'] = [str(uuid.uuid4()) for _ in range(len(df))]
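
The one-liner above gives every row its own UUID, so two rows that shared an original ID no longer match. Where repeated IDs should stay linked, one option is to build a lookup first and apply it with Series.map. This is a sketch under that assumption; the id_map name and the sample data are hypothetical.

```python
import uuid

import pandas as pd

# Hypothetical data: the original ID 1001 appears twice.
df = pd.DataFrame({"id": [1001, 1002, 1001], "amount": [10, 20, 30]})

# Build one random UUID per distinct original ID.
id_map = {original: str(uuid.uuid4()) for original in df["id"].unique()}

# Series.map looks up each original ID in the dictionary.
df["id"] = df["id"].map(id_map)
print(df)
```

The id_map dictionary links pseudo-IDs back to the real ones, so it would need to be stored separately from the output or discarded.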

ASCII Text Shifting

Shift each character's code by +2, so 'AA' becomes 'CC'. This disguises the text but is reversible: shifting back by the same amount (the key) restores the original.

shifted = ''.join(chr(ord(c) + 2) for c in text)
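
To illustrate the reversibility, the shift can be wrapped in a pair of functions. This is a minimal sketch; the function names and the key parameter are assumptions introduced here, with the default of 2 taken from the text above.

```python
def shift_text(text: str, key: int = 2) -> str:
    """Shift every character's code point forward by `key`."""
    return "".join(chr(ord(c) + key) for c in text)


def unshift_text(text: str, key: int = 2) -> str:
    """Undo shift_text by shifting every character back by `key`."""
    return "".join(chr(ord(c) - key) for c in text)


shifted = shift_text("AA")
print(shifted)                # CC
print(unshift_text(shifted))  # AA
```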

Data Masking

Replace sensitive fields with partially masked values that keep only the first character and the last four (e.g., john@example.com becomes j***.com).

masked = value[0] + '***' + value[-4:]
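
The one-liner above keeps the first character and the last four, whatever the field contains. For email addresses specifically, a mask that keeps the @ sign and the top-level domain might look like the sketch below. Both function names are hypothetical, and the fallback behaviour is an assumption rather than part of the original pipeline.

```python
def mask_value(value: str) -> str:
    """Generic mask: keep the first character and the last four."""
    return value[0] + "***" + value[-4:]


def mask_email(email: str) -> str:
    """Email-aware mask: keep the first character and the top-level domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local or "." not in domain:
        # Not a recognisable email address, so fall back to the generic mask.
        return mask_value(email) if email else email
    tld = domain.rsplit(".", 1)[1]
    return f"{local[0]}***@***.{tld}"


print(mask_value("john@example.com"))  # j***.com
print(mask_email("john@example.com"))  # j***@***.com
```

Note that the generic mask reveals the whole string when the value has five characters or fewer, so a length check would be needed before using it on short fields.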

Generalisation

Reduce precision of numeric data. Exact ages become ranges, salaries become brackets.

df['age_range'] = pd.cut(df['age'], bins=[0,25,35,45,55,65,100])
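
By default pd.cut labels each row with an interval such as (25, 35] and leaves values equal to the lowest edge (here, age 0) as NaN. A variant with readable labels is sketched below. The label strings and the sample ages are assumptions, and the labels presume whole-number ages.

```python
import pandas as pd

# Hypothetical ages for illustration.
df = pd.DataFrame({"age": [0, 25, 26, 70]})

bins = [0, 25, 35, 45, 55, 65, 100]
labels = ["0-25", "26-35", "36-45", "46-55", "56-65", "66-100"]

# include_lowest=True puts age 0 in the first bin instead of leaving it as NaN.
df["age_range"] = pd.cut(df["age"], bins=bins, labels=labels, include_lowest=True)
print(df)
```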

Drop Identity Columns

Remove columns that directly identify individuals (name, SSN, full address).

df.drop(columns=['name', 'ssn', 'address'], inplace=True)
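
The call above raises a KeyError if any listed column is missing from the dataset. A more tolerant variant, sketched here with hypothetical sample data and a hypothetical IDENTITY_COLUMNS constant, skips missing columns and returns a new DataFrame instead of modifying the original in place.

```python
import pandas as pd

# Hypothetical data: note there is no "address" column.
df = pd.DataFrame(
    {"name": ["John Smith"], "ssn": ["000-00-0001"], "age_range": ["26-35"]}
)

IDENTITY_COLUMNS = ["name", "ssn", "address"]

# errors="ignore" skips listed columns that are absent (here, "address").
anonymised = df.drop(columns=IDENTITY_COLUMNS, errors="ignore")
print(list(anonymised.columns))  # ['age_range']
```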

Step 3: Pipeline Execution
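
One way to execute the selected techniques in order is to treat each as a function that takes and returns a DataFrame, then apply them in sequence. The sketch below is an assumption about how such a runner could be structured, not the page's own implementation; run_pipeline, STEPS, the two example steps, and the city column are all hypothetical.

```python
import pandas as pd


def shift_city(df: pd.DataFrame) -> pd.DataFrame:
    # ASCII text shifting (+2) on a hypothetical "city" column.
    out = df.copy()
    out["city"] = out["city"].map(lambda s: "".join(chr(ord(c) + 2) for c in s))
    return out


def drop_identity(df: pd.DataFrame) -> pd.DataFrame:
    # Drop a hypothetical direct identifier.
    return df.drop(columns=["name"], errors="ignore")


# Ordered list of (label, function) pairs; each function takes and returns a DataFrame.
STEPS = [("Text Shifting", shift_city), ("Drop Columns", drop_identity)]


def run_pipeline(df: pd.DataFrame, steps=STEPS) -> pd.DataFrame:
    for label, step in steps:
        df = step(df)
        print(f"{label}: columns = {list(df.columns)}")
    return df


raw = pd.DataFrame({"name": ["John Smith"], "city": ["AA"]})
result = run_pipeline(raw)
```

Because each step works on a copy or returns a new frame, the raw DataFrame is left unchanged and can be compared against the result.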

Export Anonymised Dataset
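
The export step uses DataFrame.to_csv. The sketch below writes to an in-memory buffer so the result can be inspected. The sample data is hypothetical, and the output filename in the final comment is an assumption.

```python
import io

import pandas as pd

# Hypothetical anonymised data for illustration.
anonymised = pd.DataFrame({"id": ["a1"], "age_range": ["26-35"]})

# index=False omits the pandas row index, which would otherwise become an extra column.
buffer = io.StringIO()
anonymised.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Writing to disk uses the same call with a path instead of a buffer:
# anonymised.to_csv("anonymised_data.csv", index=False)
```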