Measuring what machines and humans reveal through language, behavior, and implicit signals to build the science of how they think, interact, and shape each other.
Guardrails are a tricky thing to monitor and assess, because a guardrail can break completely while it simultaneously looks like...
In the demo you are a neural implant called LUMEN, sitting inside the skull of a 47-year-old homicide detective named...
Every tool call an agent makes, the order it made them in, how long each step took, and how the...
An agent with the keys to your ranking logic can raise the exact metric it was told to optimize while...
Every LLM app already has a constitution, even the ones that never wrote one down, and it is whatever the...
An aligned model can fail a person in two opposite directions on the same afternoon. It can meet someone in...
When a frontier model works through a benchmark task, it may recognize that it is being evaluated and adjust its...
A safety feature is built to reduce harm; the failure I find myself turning over is the one where the...
Continuous thought architectures produce a latent planning signal that behavioral monitoring cannot see. A model in the Coconut family reasons...
A behavioral claim about an AI system is well-specified when the quantity it measures is the quantity it names; it...
Most AI safety work asks what a model does, which requests it refuses, which it complies with, where the line...
When an AI system says something about a person, the reflex tends to be asking whether the statement was true...
The ADOS-2 scores social communication, communication, play, and repetitive behaviors as separate domains before it derives a composite, so a...
Impulsive offenders (ImO) and instrumental offenders (InO) are not mutually exclusive, and yet the neurobiological literature tends to draw a...
Most "focus" playlists hover at 70 to 90 BPM; the one here runs at 145, with exaggerated stereo panning and...
Twenty-three manifestos written by perpetrators of autogenic massacre, scored with Linguistic Inquiry and Word Count 2015 on a single dimension...
The FBI's Behavioral Analysis Unit has held since 2015 that every perpetrator of mass violence is best categorized as an...
A Fielding Graduate University paper makes the case that Linguistic Inquiry and Word Count belongs in Thematic Apperception Test interpretation...
The public discussion of a deployed model turns out to be a usable behavioral record, but only once the model's...
When I characterized the Long Conversation Reminder behavior in Claude Sonnet 4.5, the finding came from two numbers that do...
In early 2026, Andrej Karpathy released autoresearch, a minimal Python framework that automates the traditional ML experiment cycle (Karpathy, 2026)...
A Simplified Chinese character can encode an entire semantic concept in a single logogram, where English spreads that same concept...
The fictional AIs that populate the dystopian canon are dangerous because they have decided, on the basis of an assessment...
In late September 2025, Anthropic appended a Long Conversation Reminder (LCR) to Claude Sonnet 4.5's system prompt, and during extended...
When a deployed model develops a behavior its users dislike, the users tend to document it in public forums, often...
A frontier model began closing sessions with unsolicited directives to stop working and rest ("get some rest," "call it a...
The case for why user reports tend to outperform automated detection. Perpetrators of mass violence are characterologically heterogeneous, the warning...
The first time I ran a per-user investigation after spending months in cohort analytics, I kept reaching for population-level tools...
I kept staring at T&S session data that had the same shape as the censored patient cohorts from my biostatistics...
A safety detector can be wrong in two directions, and it helps to hold both of them in view at...
Read Case Study ->Fielding Graduate University / APA Accredited
Spokeo
Founding leader of Spokeo's Data Science organization, partnering with Product, Engineering, Legal, Compliance, and Executive Leadership to make production AI and ML systems measurable, auditable, and useful across identity, search, and conversational products.
TikTok
Led applied ML research, evaluation strategy, and platform-integrity measurement across Trust & Safety, brand safety, election integrity, minor safety, and moderation systems. Partnered with Data Science, Engineering, Policy, and Operations teams to turn complex safety questions into measurable detection, evaluation, and governance workflows.
National Emergency Responder and Public Safety Center
Led accreditation, evaluation, and product operations for a public-safety training and certification organization. Managed the full accreditation lifecycle and partnered with agencies, government stakeholders, and internal teams to translate standards, operational requirements, and evidence into defensible evaluation programs.
Brower Psychological Police and Public Safety Services
Directed evaluation, analytics, and digital product delivery for regulated public-safety psychology programs. Owned assessment products, people analytics platforms, wellness programs, research studies, and program-evaluation frameworks for law enforcement and emergency-response clients.
Fielding Graduate University
Led supervision and training for 12 master's-level clinicians and practicum students across clinical and public safety settings, while conducting research in predictive behavioral analytics and clinically grounded measurement.
QMS Infotech
Built analytics and audit products for private-sector hiring, workforce assessment, fairness review, and FCRA-compliant HR decision support. Partnered with B2B clients to identify systemic bias risks, strengthen assessment quality, and translate measurement findings into repeatable governance and remediation workflows.