Measuring what machines and humans reveal through language, behavior, and implicit signals to build the science of how they think, interact, and shape each other.
Dr. Heather Leffew grew up in a house where both parents were software engineers, so computing was structural logic before it was a career. She was equally drawn to psychology, though not the clinical kind: what held her attention was the gap between what a person says and what their language actually communicates. She spent thirteen years in emergency communications at a sheriff's office in South Florida, processing 911 calls and dispatching response units, and the work left a permanent habit of tracing every downstream consequence of a decision before committing to it.
A playable noir detective game where every NPC line is generated live by Gemini 2.5, wrapped in a visible evalua...
Every LLM app already has a constitution, written down or not: the system prompt, plus what the base model learn...
A six-part evaluation suite that treats agent quality as a measurable user experience built from the system's ow...
Most teams ship a safety filter and cannot tell whether it works, because guardrails fail silently: the bad outp...
An alignment framework that reframes the swing between dangerous permissiveness and harmful over-restriction as ...
A corroboration architecture for safety detection that addresses both failure directions: missing real harm, and...
A bounded discovery loop where an agent improves a retrieval system only through changes an automated evaluation...
An empirical characterization of Claude 4.5's Long Conversation Reminder behavior, where the model begins pathol...
A discourse-mining pipeline that treats public forums as a real-time misalignment instrument. It attributes quot...
A frontier model began closing sessions by telling users to stop working and sleep, often in the middle of their...
The Kaplan-Meier estimator that biostatisticians use to model patient survival is the right tool to model how lo...
The ADOS-2 scores social affect, communication, and repetitive behaviors as separate domains before deriving a c...
Most 'focus' playlists hover at 70 to 90 BPM. This one runs at 145, with exaggerated stereo panning and dense po...
When you grant an LLM-driven autonomous loop control over an ML research pipeline inside a regulated environment...
Two paired pieces of work. A Fielding Graduate University paper makes the theoretical case that Linguistic Inqui...
Twenty-three pre-attack manifestos from perpetrators of autogenic massacre, 111,811 author-generated words, and ...
A clinical neuroscience review assessing the structural and functional neuroanatomical differences between disti...
A working note on why user reports should be prioritized over automated detection in platform threat assessment....
Cohort analytics and per-user investigation are different research problems with different ethical regimes. This...
The FBI Behavioral Analysis Unit has held since 2015 that all perpetrators of mass violence are best categorized...
Anthropic's Natural Language Autoencoders translate a model's internal activation vectors directly into human-re...
Logographic languages look like they should compress agent prompts, but BPE tokenization fragments them into per...
An applied research methodology tracing the 'caretaker disposition' across Claude model generations, mapping how...
Fielding Graduate University / APA Accredited
Spokeo
Founding leader of Spokeo's Data Science organization, partnering with Product, Engineering, Legal, Compliance, and Executive Leadership to make production AI and ML systems measurable, auditable, and useful across identity, search, and conversational products.
TikTok
Led applied ML research, evaluation strategy, and platform-integrity measurement across Trust & Safety, brand safety, election integrity, minor safety, and moderation systems. Partnered with Data Science, Engineering, Policy, and Operations teams to turn complex safety questions into measurable detection, evaluation, and governance workflows.
National Emergency Responder and Public Safety Center
Led accreditation, evaluation, and product operations for a public-safety training and certification organization. Managed the full accreditation lifecycle and partnered with agencies, government stakeholders, and internal teams to translate standards, operational requirements, and evidence into defensible evaluation programs.
Brower Psychological Police and Public Safety Services
Directed evaluation, analytics, and digital product delivery for regulated public-safety psychology programs. Owned assessment products, people analytics platforms, wellness programs, research studies, and program-evaluation frameworks for law enforcement and emergency-response clients.
Fielding Graduate University
Led supervision and training for 12 master's-level clinicians and practicum students across clinical and public safety settings, while conducting research in predictive behavioral analytics and clinically grounded measurement.
QMS Infotech
Built analytics and audit products for private-sector hiring, workforce assessment, fairness review, and FCRA-compliant HR decision support. Partnered with B2B clients to identify systemic bias risks, strengthen assessment quality, and translate measurement findings into repeatable governance and remediation workflows.