How Is Keyboard Data Used? 

The collection of idiosyncrasies inherent to the typing behavior of each individual can provide a sort of fingerprint that allows any particular user to be recognized. CAuth observes typing behavior (timing and rhythm) in order to make a simple binary prediction – is the authenticated user typing (“low risk”) or is it someone else at the keyboard (“high risk”)? 

Keystroke Capture and Processing Flow 

  1. Key Capture: Press time, release time, and key code are recorded temporarily into memory. 
  2. Aggregation: TruU’s custom aggregation algorithm dynamically evaluates the information content of the keystrokes currently cached in memory and flushes the cache when certain conditions are met. 
  3. Keyboard Sample Creation: Novel features are extracted from the flushed keystroke set and passed to a state-of-the-art pre-trained deep neural network, which transforms the data into a Non-Recoverable Representation (NRR) that encodes user typing behavior. 
  4. Sample Scoring: User-specific models are trained on the generated NRRs. Once trained, these models reliably map any newly gathered NRR to a numerical score: the likelihood that the NRR was generated by the authenticated user. This score is then combined with other risk signals (mouse, environment, etc.) to determine the overall risk score of the device. 
  5. Persistence: The NRR and associated metadata are persisted locally in a secure database for future on-device model training and evaluation processes. 
  6. Archival: The NRR and its metadata are first anonymized, then transmitted to the cloud on a 60-second cycle for use in improving CAuth’s predictive algorithm. 
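
Steps 1 to 3 can be illustrated with a minimal sketch. It is not TruU’s aggregation algorithm: the class and function names are hypothetical, the count-based flush condition (`min_keys`) is an assumption, and the features shown are the dwell and flight times commonly used in keystroke dynamics rather than CAuth’s own features. Only the 50-second upper bound comes from this document. The neural-network step that produces the NRR is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class KeyEvent:
    key_code: int        # held in memory only, never persisted
    press_time: float    # seconds
    release_time: float  # seconds


class KeystrokeBuffer:
    """Hypothetical in-memory cache; not TruU's actual aggregation algorithm."""

    def __init__(self, min_keys: int = 20, max_age_s: float = 50.0) -> None:
        self.min_keys = min_keys    # assumed flush threshold
        self.max_age_s = max_age_s  # 50 s upper bound taken from this document
        self._events: List[KeyEvent] = []

    def add(self, event: KeyEvent) -> Optional[List[KeyEvent]]:
        """Cache an event and return the flushed batch if a condition is met."""
        self._events.append(event)
        age = event.release_time - self._events[0].press_time
        if len(self._events) >= self.min_keys or age >= self.max_age_s:
            batch, self._events = self._events, []
            return batch
        return None


def timing_features(batch: List[KeyEvent]) -> Tuple[List[float], List[float]]:
    """Return (dwell, flight) times; key codes are not carried forward."""
    dwell = [e.release_time - e.press_time for e in batch]
    flight = [b.press_time - a.release_time for a, b in zip(batch, batch[1:])]
    return dwell, flight
```

In this sketch the age check runs only when a new key arrives, so a real implementation would also need a timer to enforce the upper bound while the keyboard is idle.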

Model Training 

Implicit in the keyboard model training process is the assumption that the authenticated user, and only the authenticated user, will use the machine during the initial “warmup” phase, while CAuth learns the unique behavioral characteristics encoded within the generated NRRs. The exact duration of this warmup period depends heavily on how a given user types, but it is typically fewer than 10,000 total keystrokes or 400 complete sentences. With typical usage this amount can be reached in less than a working day.

Beyond the “warmup” phase, CAuth continues to refine and adapt its internal modelling of typing behavior. All internal models are periodically subject to error-rate benchmarking, a process that remains contained to the device and aids downstream decision engines in their risk assessment calculations. As the total volume of typing data observed by CAuth increases, and in particular as different usage patterns are learned, the training and evaluation processes result in a richer and more stable typing model. 
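
This document does not say which error rates are benchmarked. As an illustration only, the sketch below computes two metrics commonly reported for biometric systems, the false reject rate (FRR) and the false accept rate (FAR), at a given score threshold. The function name, the threshold, and the availability of labeled impostor scores are all assumptions.

```python
from typing import Sequence, Tuple


def error_rates(
    genuine_scores: Sequence[float],
    impostor_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (FRR, FAR) for a hypothetical accept rule `score >= threshold`."""
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    # FRR: share of the authenticated user's samples that would be rejected.
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    # FAR: share of other users' samples that would be accepted.
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return frr, far
```

A downstream decision engine could, for example, weight the keyboard score less heavily when either rate is high.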

Persisted Data 

  • NRR - The Non-Recoverable Representation of the typing sample, which encodes behavior intrinsic to the user. 
  • Timestamps - Time that the first key and last key within the typing sample were pressed. 
  • Length - The total number of keystrokes that contributed to generating the NRR. 
  • App - The application or applications that were active while the keys were typed. 
  • CAuth data - Additional labels and/or values generated by CAuth which are not constructed from any of the typing data, for use in other aspects of the total CAuth system. 
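
The fields listed above could be modelled as a single record, as in the sketch below. The class name, field names, and types are hypothetical and do not describe CAuth’s actual schema. The record deliberately has no field for key codes or typed text.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class KeyboardSampleRecord:
    """Hypothetical shape of one persisted typing sample."""
    nrr: Tuple[float, ...]   # Non-Recoverable Representation
    first_key_time: float    # time the first key was pressed (epoch seconds)
    last_key_time: float     # time the last key was pressed (epoch seconds)
    length: int              # keystrokes that contributed to the NRR
    apps: Tuple[str, ...]    # application(s) active during typing
    cauth_data: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Basic sanity checks on the metadata.
        if self.length <= 0:
            raise ValueError("length must be positive")
        if self.last_key_time < self.first_key_time:
            raise ValueError("last_key_time precedes first_key_time")
```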

Privacy and Security Safeguards 

Protecting user privacy and securing sensitive data are foundational principles in the design of CAuth’s keyboard risk signal. The system is engineered to ensure that no reconstructable or sensitive information about user input is ever stored or transmitted, while still enabling robust, continuous authentication. 
  • No Raw Keystroke Storage: CAuth never writes raw keystrokes, keycodes, or typed content to disk, nor are they ever transmitted to the cloud. All processing of keyboard input occurs locally and in-memory. 
  • Volatile Memory Only: Keystroke event metadata (such as key press/release times and event types) is held only briefly in volatile memory for aggregation and feature extraction. This data is flushed from memory within seconds: typically in less than 10 seconds, and never more than 50 seconds. 
  • Non-Recoverable Representation (NRR): Before any data is persisted, typing samples are transformed into a Non-Recoverable Representation (NRR) using a deep neural network. This process ensures that the original keystrokes or typed content cannot be reconstructed from the stored data. The NRR encodes only behavioral patterns, not the actual text. 
  • Secure Database: All extracted features, risk scores, and event metadata are stored in an encrypted local database on the endpoint device, which is further protected by restricting read and write access to Admin users only. Encryption keys are device-specific, ensuring that even if the database is copied, its contents remain inaccessible. 
  • Minimal Metadata: Only anonymized timing features and minimal metadata (such as the number of keystrokes, start/end times, and active application) are retained. No information that could reveal what was typed is ever persisted. 
  • Anonymization Before Cloud Upload: When NRRs and associated metadata are transmitted to the cloud (for model improvement or analytics), they are first anonymized such that any given NRR has no link back to the identity of the individual from whom it was constructed. No reconstructable or identifying information about the user’s input is ever sent off-device. 
  • Short Retention Windows: Data stored locally is retained only as long as necessary for risk assessment, model training and evaluation, with automatic purging after a default period (e.g., 14 days). 
  • Privacy by Design: The system is designed so that sensitive user data never leaves the device, and no reconstructable content is ever stored. 
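
The retention window described above can be sketched as a periodic purge. This example uses Python’s built-in `sqlite3` module with an assumed table named `keyboard_samples` and an assumed `last_key_time` column. It does not reflect CAuth’s actual storage layer, and the standard `sqlite3` module provides no encryption. The 14-day default is the example value given in this document.

```python
import sqlite3

RETENTION_DAYS = 14        # example default period from this document
SECONDS_PER_DAY = 86_400


def create_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical sample table used in this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyboard_samples ("
        "id INTEGER PRIMARY KEY, last_key_time REAL NOT NULL, nrr BLOB)"
    )


def purge_expired(
    conn: sqlite3.Connection, now: float, retention_days: int = RETENTION_DAYS
) -> int:
    """Delete samples older than the retention window; return rows removed."""
    cutoff = now - retention_days * SECONDS_PER_DAY
    cur = conn.execute(
        "DELETE FROM keyboard_samples WHERE last_key_time < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount
```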

Known Limitations 

  • Touchscreen keyboards: Currently touchscreen keyboards are not supported and CAuth behavior resulting from touchscreen typing may be unpredictable. 
  • Typing Speed: CAuth’s continuous keyboard behavioral modelling assumes that typing speed consistently remains above 30 words per minute. The system may still function at slower speeds, but performance degrades, typically resulting in CAuth defaulting to a neutral state in which it takes no action. 
  • Typing Behavior: Erratic typing (short bursts of a few keys followed by longer pauses) can produce results similar to those of slow typing.
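
The typing-speed limitation can be illustrated with a small check. This document does not describe how CAuth measures speed, so this sketch assumes the common convention that one word equals five keystrokes. The function names and the returned state labels are hypothetical; only the 30 words-per-minute figure comes from this document.

```python
MIN_WPM = 30.0          # speed assumed by the keyboard model, per this document
KEYS_PER_WORD = 5       # common typing-test convention (assumption)


def estimate_wpm(keystroke_count: int, duration_s: float) -> float:
    """Estimate words per minute from a keystroke count and elapsed seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return (keystroke_count / KEYS_PER_WORD) / (duration_s / 60.0)


def keyboard_signal_state(keystroke_count: int, duration_s: float) -> str:
    """Return 'scored' when speed is above MIN_WPM, otherwise 'neutral'."""
    wpm = estimate_wpm(keystroke_count, duration_s)
    return "scored" if wpm > MIN_WPM else "neutral"
```

Because the duration includes pauses, erratic typing with long gaps lowers the estimate in the same way that slow typing does.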