CAuth Privacy Architecture Overview
TruU’s CAuth (Continuous Authentication) implements a multi-layered privacy protection system that ensures user data is never linked to personal information and remains secure throughout the entire process.

Data Encryption and Storage Protection
- SQLCipher Integration: CAuth encrypts its local database with SQLCipher, using 256-bit AES
- Key Management: Database encryption keys (used for AES-256-CBC) are generated locally and protected at rest using the Windows Data Protection API (DPAPI)
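As a rough sketch of the key-management flow above (the function names are illustrative, and the `win32crypt` usage shown in the comment is an assumption about how DPAPI would be invoked, not CAuth's actual code), a random 256-bit database key might be generated and then wrapped with DPAPI before being written to disk:

```python
import secrets

def generate_db_key() -> bytes:
    """Generate a random 256-bit key suitable for AES-256-CBC / SQLCipher."""
    return secrets.token_bytes(32)

def protect_key_with_dpapi(key: bytes) -> bytes:
    """Wrap the key with the Windows Data Protection API before storage.

    On Windows this would call pywin32's DPAPI binding, e.g.:
        import win32crypt
        return win32crypt.CryptProtectData(key, "CAuth DB key",
                                           None, None, None, 0)
    Here the key is returned unchanged so the sketch runs anywhere.
    """
    return key  # placeholder for the DPAPI-wrapped blob

key = generate_db_key()
blob = protect_key_with_dpapi(key)
assert len(key) == 32  # 256 bits
```

The important property is that the raw key never needs to be stored in the clear: only the DPAPI-wrapped blob would be persisted, and only the same Windows user context can unwrap it.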
Behavioral Data as an NRR
CAuth’s keyboard model stores only behavioral embeddings, not raw user data, referred to within this document as a Keyboard NRR (Non-Recoverable Representation). An NRR is a mathematical representation of user behavior patterns (e.g. typing rhythms, mouse movements) which is not invertible; that is, you cannot recover the raw data that produced the NRR from the NRR itself. The underlying pre-trained deep learning model (packaged within the CAuth installer) which generates the NRRs is specially trained to encode only information useful for distinguishing different users within a population, not to encode exactly what behavior any given user was engaged in. Such a model is only useful for identity verification if it is agnostic to behavior: it must recognize a user regardless of how they choose to interact with their machine at any given moment.

Note that raw keystrokes and precise keystroke timing data are never persisted to disk. They may be held in temporary memory for up to 50 seconds (on average, less than 10 seconds) until ready for processing into an NRR. Users’ keyboard NRRs are stored in the local encrypted SQLite database for up to several weeks, where they are used to continually refine and improve CAuth’s ability to recognize the authenticated user on the system via a series of lightweight, locally trained, user-specific models.

Data Archival
TruU archives a mixture of CAuth telemetry data, CAuth model performance data, machine state data (keyboard, monitor, mouse connected, etc.), environmental data (visible WiFi networks and signal strengths), mouse data, and keyboard NRR data. This data is used to further improve the CAuth product as well as other product lines within TruU, including account takeover protection and insider threat protection. All archived data is anonymized on the client machine before being transported over a secure HTTPS connection to the cloud-side Archiver service.

- Database UUID: Each installation gets a unique database UUID
- No Personal Linking: Within the TruU Archiver cloud service, a user’s Database UUID cannot be linked back to personal information about that user
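A minimal sketch of what an anonymized archival record might look like (the field names and payload shape are hypothetical, not TruU's actual wire format): the record is keyed only by the installation's database UUID, with no identifying fields.

```python
import json
import uuid

# Generated once at install time and stored locally; it carries no
# personal information about the user (hypothetical sketch).
INSTALLATION_DB_UUID = str(uuid.uuid4())

def build_archive_payload(nrr_embedding: list, telemetry: dict) -> str:
    """Assemble an anonymized archival record keyed only by the database UUID."""
    record = {
        "db_uuid": INSTALLATION_DB_UUID,  # not linkable to a person
        "keyboard_nrr": nrr_embedding,    # non-recoverable representation
        "telemetry": telemetry,
    }
    # Note: no username, hostname, or other identifying fields are included.
    return json.dumps(record)

payload = json.loads(build_archive_payload([0.12, -0.53], {"model_version": 3}))
assert "db_uuid" in payload and "username" not in payload
```

Because the cloud-side Archiver only ever sees the random UUID, the service has no key with which to join archived behavioral data back to a person.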
Machine Learning Privacy Protection
Model Training Process
User Specific Models
- Local Training Only: All user-specific model training happens locally on the user’s machine
- Embedding-Only Training: User-specific models train on NRR embeddings, never raw data
- Imposter Integration: Uses crowdsourced, pre-packaged imposter NRRs to create negative examples
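The local training loop above can be sketched roughly as follows (purely illustrative: CAuth's actual model family is not specified in this document, so a tiny nearest-centroid verifier stands in for the lightweight user-specific model). The user's NRR embeddings serve as positives, the pre-packaged imposter NRRs as negatives, and the classes are balanced before fitting:

```python
import math

def centroid(embeddings):
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_user_model(user_nrrs, imposter_nrrs):
    """Fit a tiny nearest-centroid verifier on NRR embeddings only.

    Classes are balanced by truncating to the smaller set so neither the
    user's samples nor the imposter pool dominates training.
    """
    n = min(len(user_nrrs), len(imposter_nrrs))
    return centroid(user_nrrs[:n]), centroid(imposter_nrrs[:n])

def is_authenticated_user(model, nrr):
    user_c, imposter_c = model
    return dist(nrr, user_c) < dist(nrr, imposter_c)

# Toy 2-D embeddings: user clusters near (1, 1), imposters near (-1, -1).
user = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
imposters = [[-1.0, -1.1], [-0.9, -1.0], [-1.1, -0.9]]
model = train_user_model(user, imposters)
assert is_authenticated_user(model, [0.95, 1.05])
assert not is_authenticated_user(model, [-1.0, -0.95])
```

Note that everything in this loop operates on embeddings: neither raw keystrokes nor raw imposter behavior ever enters the training data.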
Population Models
- Large User Base: CAuth’s deep-learning “population models” are trained on several tens of thousands of users via a crowdsourced raw-typing dataset, learning to generate a representation for each data input that separates the originating user from all other users in the population
- Pre-trained Models: These population models are trained in advance on TruU’s dedicated servers before being packaged into the CAuth distributable for use within the system
- Local Adaptation: User-specific models learn to utilize the output of the pre-trained population-level models in order to recognize behaviors that are representative of the authenticated user on the device
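The two-stage pipeline described above can be sketched as follows (illustrative only: `population_model` here is a stand-in stub, not TruU's actual network, and the user-specific head is reduced to a similarity score against enrolled embeddings):

```python
def population_model(raw_keystroke_timings):
    """Stand-in for the frozen, pre-trained population model: maps raw
    timing data to a fixed-size NRR embedding (illustrative only)."""
    mean = sum(raw_keystroke_timings) / len(raw_keystroke_timings)
    spread = max(raw_keystroke_timings) - min(raw_keystroke_timings)
    return [mean, spread]  # a real model outputs a high-dimensional vector

class UserSpecificHead:
    """Lightweight local model that consumes population-model embeddings."""
    def __init__(self):
        self.reference = None

    def enroll(self, embeddings):
        dim = len(embeddings[0])
        self.reference = [sum(e[i] for e in embeddings) / len(embeddings)
                          for i in range(dim)]

    def score(self, embedding):
        # Higher score = more similar to the enrolled user's embeddings.
        return -sum((x - y) ** 2 for x, y in zip(embedding, self.reference))

head = UserSpecificHead()
head.enroll([population_model([80, 95, 110]), population_model([85, 100, 105])])
same = head.score(population_model([82, 98, 108]))
different = head.score(population_model([200, 400, 650]))
assert same > different
```

The division of labor matters for privacy: the heavyweight population model ships pre-trained and is never updated with local data, while the only thing trained on-device is the small head that consumes its embeddings.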
Imposter Embeddings
- Crowdsourced Data: The CAuth system uses crowdsourced anonymized behavioral data, converted to NRRs via the pre-trained population model, to help train its user-specific recognition models on device
- Privacy Protection: These imposter NRR embeddings are used only to recognize when the individual on the device is “not the authenticated user,” not to identify the typing behavior as belonging to a specific user within an org
- Balanced Training: Models are trained on both authenticated user embeddings and crowd-sourced anonymous imposter embeddings to prevent overfitting
Data Retention and Cleanup
Garbage Collection
- Automatic Cleanup: Database cleaner service runs regularly to prune outdated data
- Configurable Retention: Each table within the encrypted SQLite database has specific retention policies based on the table’s contents
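The cleanup pass above might look something like this sketch (table names and retention values are hypothetical, chosen to match the "up to several weeks" retention mentioned earlier; CAuth's actual schema is not documented here):

```python
import sqlite3
import time

# Hypothetical per-table retention policies, in days (illustrative values).
RETENTION_DAYS = {
    "keyboard_nrrs": 21,   # "up to several weeks"
    "telemetry": 7,
}

def run_database_cleaner(conn, now=None):
    """Prune rows older than each table's retention window."""
    now = now if now is not None else time.time()
    for table, days in RETENTION_DAYS.items():
        cutoff = now - days * 86400
        # Table names come from the fixed dict above, never from user input.
        conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyboard_nrrs (created_at REAL)")
conn.execute("CREATE TABLE telemetry (created_at REAL)")
now = time.time()
conn.execute("INSERT INTO keyboard_nrrs VALUES (?)", (now - 30 * 86400,))  # stale
conn.execute("INSERT INTO keyboard_nrrs VALUES (?)", (now,))               # fresh
run_database_cleaner(conn, now)
assert conn.execute("SELECT COUNT(*) FROM keyboard_nrrs").fetchone()[0] == 1
```

Per-table policies like these keep behavioral data only as long as it is useful for refining the local models, after which it is deleted automatically.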

