Behavioural health data, prepared for controlled use

Privacy-first behavioural health data for research, analytics, and AI development.

DJO DataLabs prepares de-identified and synthetic clinical datasets from real-world psychology-supervised care, with structured privacy review, governance, and utility validation before release.

Request the Tier A Dataset Card

Request Evaluation Access

Privacy first

Structured controls for direct and indirect identifiers.

Utility aware

Datasets shaped for analysis, modelling, and product testing.

Governed release

Review steps before data leaves the protected environment.

Currently available

Tier A De-identified Longitudinal Behavioural Health Dataset.

Tier A is our first available de-identified real-world behavioural health dataset, built from psychology-supervised care delivered in Canada. It contains structured longitudinal symptom/functioning ratings, treatment/session metadata, and constructed episode-level outputs designed for outcome analysis, utilization-response modelling, cohort analysis, and controlled AI product evaluation.

The release is positioned as a behavioural health outcomes and utilization dataset, not a claims, pharmacy, laboratory, or full EHR export. Raw narrative notes, psychological reports, identifiable documents, qEEG files, and unstructured free text are excluded from the current commercial package.

112,877 Repeated structured clinical rating measurements

8,234 Clients represented in treatment/session metadata

9,993 Constructed treatment episodes

10 Primary clinical rating domains with confirmed directionality

Coverage

Plausible clinical/source-system coverage runs from June 18, 2002 to May 6, 2026. Structured symptom and rating measurements run from June 18, 2002 to November 3, 2025.

Outcome layer

The buyer-facing release includes 8,693 strict QA outcome rows after excluding sparse non-primary OCB and Other domains from outcome tables.
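
For evaluators who want to picture the exclusion step, here is a minimal sketch of filtering outcome rows by domain. It is an illustration only, not the DJO DataLabs pipeline: the row layout, the field names (`episode_id`, `domain`, `change`), and the domain labels "Anxiety" and "Mood" are hypothetical. Only the excluded labels "OCB" and "Other" come from the description above.

```python
# Illustrative only: field names and the "Anxiety"/"Mood" labels are
# hypothetical. "OCB" and "Other" are the excluded domains named in the text.
EXCLUDED_DOMAINS = {"OCB", "Other"}


def filter_outcome_rows(rows):
    """Return only the rows whose domain is not in the excluded set."""
    return [row for row in rows if row["domain"] not in EXCLUDED_DOMAINS]


sample_rows = [
    {"episode_id": "E1", "domain": "Anxiety", "change": -3},
    {"episode_id": "E1", "domain": "OCB", "change": -1},
    {"episode_id": "E2", "domain": "Other", "change": 0},
    {"episode_id": "E2", "domain": "Mood", "change": -2},
]

kept = filter_outcome_rows(sample_rows)
for row in kept:
    print(row["episode_id"], row["domain"], row["change"])
```

The sketch shows only the domain filter. The quality checks behind the "strict QA" label are not described on this page and are not modelled here.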

Privacy posture

The package excludes direct identifiers, exact dates, raw free text, start and stop times, provider names, reports, source files, and raw file paths.

Upcoming

Broader clinical data lake under development

Beyond Tier A, DJO DataLabs is developing a broader clinical data lake that consolidates de-identified, real-world behavioural health data across multiple service lines. Future layers may include de-identified demographics, referral sources, presenting concerns, DSM diagnostic impressions, standardized assessment results, symptom and functioning measures, treatment episodes, session-level service utilization, report metadata, and longitudinal outcome measures.

Where available and appropriate for release, the broader lake may also support datasets derived from psychoeducational, psychological, MVA, WSIB, and related assessment workflows, including psychometric test outputs, clinical interview variables, qEEG/neurofeedback-related data, treatment recommendations, functional impairment indicators, and diagnosis-linked assessment profiles.

The objective is to convert historically fragmented clinical information into structured, governed data assets suitable for internal quality improvement, cohort analysis, research collaboration, AI-readiness, and controlled commercial evaluation.

PHIPA-aligned planning

PIPEDA-aware controls

Canadian de-identification expectations

What we do

We turn real behavioural health data into governed de-identified and synthetic datasets.

DJO DataLabs helps convert historically fragmented behavioural health records into structured, privacy-reviewed data assets. We work with real-world clinical datasets, remove or generalize identifying signals, document the release structure, and prepare governed packages for research, analytics, model development, and commercial evaluation.

De-identified real-world datasets

Structured clinical data prepared with direct identifiers excluded and release documentation attached.

Synthetic data derivatives

Fit-for-purpose synthetic datasets designed to preserve analytic utility while reducing exposure risk.

Governed evaluation packages

Dataset cards, data dictionaries, sample packages, and controlled access pathways for qualified partners.

Who it is for

Built for qualified teams evaluating behavioural health data products.

Mental health AI teams

Evaluate outcome-oriented workflows, prototype model concepts, and test behavioural health analytics before pursuing deeper data partnerships.

Clinical research groups

Study symptom trajectories, treatment response, utilization patterns, and episode-level outcomes in real-world psychotherapy and assessment-linked care.

Digital health and therapeutics companies

Explore longitudinal behavioural health patterns that can support product validation, evidence planning, and research collaboration.

Commercial analytics teams

Build dashboards, forecasts, cohort models, and market analyses using realistic care-pattern data with governed release controls.

Governance and release controls

Prepared for controlled buyer review.

DJO DataLabs is being developed around a privacy-first release model. Commercial or research packages are prepared with direct identifiers excluded, high-risk fields removed or generalized, and release documentation provided to support buyer review.

Direct identifiers excluded before release
No raw clinical notes, reports, source files, provider names, raw file paths, or unreviewed free text in commercial packages
Exact dates, start/stop times, and high-risk temporal fields removed, shifted, or generalized where appropriate
Dataset card and data dictionary available for controlled evaluation
NDA, paid evaluation access, and partner-specific review pathways available where appropriate
Governance approach designed with PHIPA, PIPEDA, and Canadian de-identification expectations in mind
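
The controls listed above can be pictured with a short sketch: drop high-risk fields, replace the client identifier with a keyed pseudonym, shift dates by a stable per-client offset, and generalize the shifted date to a calendar quarter. This is one possible approach under stated assumptions, not a description of the actual release process. Every field name, the 180-day shift window, and the quarter-level granularity are hypothetical choices made for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical field names; the real release schema is not described here.
DROPPED_FIELDS = {
    "client_name", "provider_name", "start_time",
    "stop_time", "file_path", "note_text",
}


def keyed_digest(secret: bytes, label: str, client_id: str) -> bytes:
    """HMAC-SHA256 of a labelled client id, so each use gets its own digest."""
    message = f"{label}:{client_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).digest()


def client_offset_days(client_id: str, secret: bytes, max_days: int = 180) -> int:
    """Stable per-client date shift in the range [-max_days, max_days]."""
    digest = keyed_digest(secret, "offset", client_id)
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def deidentify(record: dict, secret: bytes) -> dict:
    """Drop high-risk fields, pseudonymize the client id, coarsen the date."""
    out = {
        key: value for key, value in record.items()
        if key not in DROPPED_FIELDS and key != "session_date"
    }
    client_id = record["client_id"]
    out["client_id"] = keyed_digest(secret, "pseudonym", client_id).hex()[:12]
    shift = timedelta(days=client_offset_days(client_id, secret))
    shifted = record["session_date"] + shift
    # Generalize the shifted date to year and quarter; the exact day is not kept.
    out["session_quarter"] = f"{shifted.year}-Q{(shifted.month - 1) // 3 + 1}"
    return out


SECRET = b"example-key-kept-inside-the-protected-environment"
record = {
    "client_id": "C-001",
    "client_name": "Example Person",
    "provider_name": "Example Provider",
    "session_date": date(2015, 3, 14),
    "start_time": "09:00",
    "rating": 4,
}
released = deidentify(record, SECRET)
print(released)
```

Because the offset is derived per client, every date for the same client moves by the same number of days, which keeps the intervals between sessions intact before the quarter-level generalization is applied. A sketch like this does not replace a formal re-identification risk assessment.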