CAuth Privacy Architecture Overview
TruU’s CAuth (Continuous Authentication) implements a multi-layered privacy protection system that ensures user data is never linked to personal information and remains secure throughout the entire process.

Data Encryption and Storage Protection
- SQLCipher Integration: CAuth encrypts its local database with SQLCipher, using 256-bit AES
- Key Management: Database encryption keys (used for AES-256-CBC) are generated locally and protected at rest using the Windows Data Protection API (DPAPI)
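As a rough sketch of the key-management flow above (the function names are illustrative, and the `win32crypt` usage shown in the comment is an assumption about how DPAPI would be invoked, not CAuth's actual code), a random 256-bit database key might be generated and then wrapped with DPAPI before being written to disk:

```python
import secrets

def generate_db_key() -> bytes:
    """Generate a random 256-bit key suitable for AES-256-CBC / SQLCipher."""
    return secrets.token_bytes(32)

def protect_key_with_dpapi(key: bytes) -> bytes:
    """Wrap the key with the Windows Data Protection API before storage.

    On Windows this would call pywin32's DPAPI binding, e.g.:
        import win32crypt
        return win32crypt.CryptProtectData(key, "CAuth DB key",
                                           None, None, None, 0)
    Here the key is returned unchanged so the sketch runs anywhere.
    """
    return key  # placeholder for the DPAPI-wrapped blob

key = generate_db_key()
blob = protect_key_with_dpapi(key)
assert len(key) == 32  # 256 bits
```

The important property is that the raw key never needs to be stored in the clear: only the DPAPI-wrapped blob would be persisted, and only the same Windows user context can unwrap it.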
Behavioral Data as an NRR
CAuth’s keyboard model stores only behavioral embeddings, not raw user data, referred to within this document as a Keyboard NRR (Non-Recoverable Representation). An NRR is a mathematical representation of user behavior patterns (e.g. typing rhythms, mouse movements) which is not invertible; that is, you cannot recover the raw data that produced the NRR from the NRR itself. The underlying pre-trained deep learning model (packaged within the CAuth installer) which generates the NRRs is specially trained to encode only information useful for distinguishing different users within a population, not to encode exactly what behavior any given user was engaged in. Such a model is only useful for identity verification if it is agnostic to behavior: it must recognize a user regardless of how they choose to interact with their machine at any given moment.

Note that raw keystrokes and precise keystroke timing data are never persisted to disk. They may be held in temporary memory for up to 50 seconds (on average, less than 10 seconds) until ready for processing into an NRR. Users’ keyboard NRRs are stored in the local encrypted SQLite database for up to several weeks, where they are used to continually refine and improve CAuth’s ability to recognize the authenticated user on the system via a series of lightweight, locally trained, user-specific models.

Data Archival
TruU archives a mixture of CAuth telemetry data, CAuth model performance data, machine state data (keyboard, monitor, mouse connected, etc.), environmental data (visible WiFi networks and signal strengths), mouse data, and keyboard NRR data. This data is used to further improve the CAuth product as well as other product lines within TruU, including account takeover protection and insider threat protection. All archived data is anonymized on the client machine before being transported over a secure HTTPS connection to the cloud-side Archiver service.

- Database UUID: Each installation gets a unique database UUID
- No Personal Linking: Within the TruU Archiver cloud service, a user’s Database UUID cannot be linked back to personal information about that user
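A minimal sketch of what an anonymized archival record might look like (the field names and payload shape are hypothetical, not TruU's actual wire format): the record is keyed only by the installation's database UUID, with no identifying fields.

```python
import json
import uuid

# Generated once at install time and stored locally; it carries no
# personal information about the user (hypothetical sketch).
INSTALLATION_DB_UUID = str(uuid.uuid4())

def build_archive_payload(nrr_embedding: list, telemetry: dict) -> str:
    """Assemble an anonymized archival record keyed only by the database UUID."""
    record = {
        "db_uuid": INSTALLATION_DB_UUID,  # not linkable to a person
        "keyboard_nrr": nrr_embedding,    # non-recoverable representation
        "telemetry": telemetry,
    }
    # Note: no username, hostname, or other identifying fields are included.
    return json.dumps(record)

payload = json.loads(build_archive_payload([0.12, -0.53], {"model_version": 3}))
assert "db_uuid" in payload and "username" not in payload
```

Because the cloud-side Archiver only ever sees the random UUID, the service has no key with which to join archived behavioral data back to a person.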
Machine Learning Privacy Protection
Model Training Process
User Specific Models
- Local Training Only: All user-specific model training happens locally on the user’s machine
- Embedding-Only Training: User-specific models train on NRR embeddings, never raw data
- Imposter Integration: Uses crowdsourced, pre-packaged imposter NRRs to create negative examples
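The local training loop above can be sketched roughly as follows (purely illustrative: CAuth's actual model family is not specified in this document, so a tiny nearest-centroid verifier stands in for the lightweight user-specific model). The user's NRR embeddings serve as positives, the pre-packaged imposter NRRs as negatives, and the classes are balanced before fitting:

```python
import math

def centroid(embeddings):
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_user_model(user_nrrs, imposter_nrrs):
    """Fit a tiny nearest-centroid verifier on NRR embeddings only.

    Classes are balanced by truncating to the smaller set so neither the
    user's samples nor the imposter pool dominates training.
    """
    n = min(len(user_nrrs), len(imposter_nrrs))
    return centroid(user_nrrs[:n]), centroid(imposter_nrrs[:n])

def is_authenticated_user(model, nrr):
    user_c, imposter_c = model
    return dist(nrr, user_c) < dist(nrr, imposter_c)

# Toy 2-D embeddings: user clusters near (1, 1), imposters near (-1, -1).
user = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
imposters = [[-1.0, -1.1], [-0.9, -1.0], [-1.1, -0.9]]
model = train_user_model(user, imposters)
assert is_authenticated_user(model, [0.95, 1.05])
assert not is_authenticated_user(model, [-1.0, -0.95])
```

Note that everything in this loop operates on embeddings: neither raw keystrokes nor raw imposter behavior ever enters the training data.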
Population Models
- Large User Base: CAuth’s deep-learning “population models” are trained on several tens of thousands of users via a crowdsourced raw-typing dataset, learning to generate a representation for each data input that separates the originating user from all other users in the population
- Pre-trained Models: These population models are trained in advance on TruU’s dedicated servers before being packaged into the CAuth distributable for use within the system
- Local Adaptation: User-specific models learn to utilize the output of the pre-trained population-level models in order to recognize behaviors that are representative of the authenticated user on the device
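The two-stage pipeline described above can be sketched as follows (illustrative only: `population_model` here is a stand-in stub, not TruU's actual network, and the user-specific head is reduced to a similarity score against enrolled embeddings):

```python
def population_model(raw_keystroke_timings):
    """Stand-in for the frozen, pre-trained population model: maps raw
    timing data to a fixed-size NRR embedding (illustrative only)."""
    mean = sum(raw_keystroke_timings) / len(raw_keystroke_timings)
    spread = max(raw_keystroke_timings) - min(raw_keystroke_timings)
    return [mean, spread]  # a real model outputs a high-dimensional vector

class UserSpecificHead:
    """Lightweight local model that consumes population-model embeddings."""
    def __init__(self):
        self.reference = None

    def enroll(self, embeddings):
        dim = len(embeddings[0])
        self.reference = [sum(e[i] for e in embeddings) / len(embeddings)
                          for i in range(dim)]

    def score(self, embedding):
        # Higher score = more similar to the enrolled user's embeddings.
        return -sum((x - y) ** 2 for x, y in zip(embedding, self.reference))

head = UserSpecificHead()
head.enroll([population_model([80, 95, 110]), population_model([85, 100, 105])])
same = head.score(population_model([82, 98, 108]))
different = head.score(population_model([200, 400, 650]))
assert same > different
```

The division of labor matters for privacy: the heavyweight population model ships pre-trained and is never updated with local data, while the only thing trained on-device is the small head that consumes its embeddings.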
Imposter Embeddings
- Crowdsourced Data: The CAuth system uses crowdsourced anonymized behavioral data, converted to NRRs via the pre-trained population model, to help train its user-specific recognition models on device
- Privacy Protection: These imposter NRR embeddings are used only to recognize when the individual on the device is “not the authenticated user,” not to identify the typing behavior as belonging to a specific user within an org
- Balanced Training: Models are trained on both authenticated user embeddings and crowd-sourced anonymous imposter embeddings to prevent overfitting
Data Retention and Cleanup
Garbage Collection
- Automatic Cleanup: Database cleaner service runs regularly to prune outdated data
- Configurable Retention: Each table within the encrypted SQLite database has specific retention policies based on the table’s contents
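The cleanup pass above might look something like this sketch (table names and retention values are hypothetical, chosen to match the "up to several weeks" retention mentioned earlier; CAuth's actual schema is not documented here):

```python
import sqlite3
import time

# Hypothetical per-table retention policies, in days (illustrative values).
RETENTION_DAYS = {
    "keyboard_nrrs": 21,   # "up to several weeks"
    "telemetry": 7,
}

def run_database_cleaner(conn, now=None):
    """Prune rows older than each table's retention window."""
    now = now if now is not None else time.time()
    for table, days in RETENTION_DAYS.items():
        cutoff = now - days * 86400
        # Table names come from the fixed dict above, never from user input.
        conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyboard_nrrs (created_at REAL)")
conn.execute("CREATE TABLE telemetry (created_at REAL)")
now = time.time()
conn.execute("INSERT INTO keyboard_nrrs VALUES (?)", (now - 30 * 86400,))  # stale
conn.execute("INSERT INTO keyboard_nrrs VALUES (?)", (now,))               # fresh
run_database_cleaner(conn, now)
assert conn.execute("SELECT COUNT(*) FROM keyboard_nrrs").fetchone()[0] == 1
```

Per-table policies like these keep behavioral data only as long as it is useful for refining the local models, after which it is deleted automatically.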

