Dr. Heather Leffew

Education

Doctor of Philosophy (PhD), Psychology

Fielding Graduate University / APA Accredited

Specialization

Quantitative Predictive Linguistics

Dissertation

Instrumental and Affective Mass Murder: Establishing a Predictive Typology with Computer-Mediated Linguistic Analysis.

Conference Presentation

Implicit Power Drives in the Manifestos Preceding Autogenic Massacres.


Professional Experience

Director of Data Science

Spokeo

Sep 2025 - Present

Founding leader of Spokeo's Data Science organization, partnering with Product, Engineering, Legal, Compliance, and Executive Leadership to make production AI and ML systems measurable, auditable, and useful across identity, search, and conversational products.

Built and deployed evaluation frameworks for entity resolution, social identity, ranking, and conversational AI systems, including weak supervision, LLM-as-a-Judge, multi-model adjudication, and human-in-the-loop evaluation, grounded in construct validity and measurement theory.
Developed replay-based evaluation methodologies, outcome-based measurement frameworks, and telemetry infrastructure to surface model failure modes, compliance risks, user-friction patterns, ranking and recommendation opportunities, and product-quality improvements.
Designed custom agentic harnesses for AI-assisted engineering, embedding auditable review processes, evaluation evidence, and quality controls directly into development workflows.
Implemented review standards focused on measurement integrity, construct validity, and decision quality across applied ML and AI systems.
Led compliance evaluation initiatives across AI systems and monitoring workflows, contributing to zero confirmed compliance failures across reviewed production samples and to ISO/IEC 42001 readiness through AI governance, auditability, model oversight, and Responsible AI operating standards.
Built automated source-evaluation frameworks measuring coverage, quality, signal contribution, and ROI, guiding multi-million-dollar data acquisition and renewal decisions.

Principal Data Scientist & Applied Research Engineer

TikTok

Jul 2022 - Aug 2025

Led applied ML research, evaluation strategy, and platform-integrity measurement across Trust & Safety, brand safety, election integrity, minor safety, and moderation systems. Partnered with Data Science, Engineering, Policy, and Operations teams to turn complex safety questions into measurable detection, evaluation, and governance workflows.

Built ML measurement and detection systems across NLP, computer vision, behavioral analytics, and distributed data processing to improve safety evaluation at platform scale.
Developed enforcement measurement frameworks that helped teams distinguish model precision from over-enforcement risk, supporting higher-confidence actioning and review workflows.
Built election-integrity analytics that unified search, content, engagement, and moderation data into a shared evidence layer for cross-functional decision-making.
Enhanced minor-safety analysis with behavioral network discovery, helping teams understand how risk emerges across accounts, surfaces, and time.
Built distributed PySpark pipelines over nested moderation events to support user-level safety analysis, review prioritization, and forensic reporting with necessity, proportionality, accountability, and redaction guardrails.
Applied survival analysis, anomaly detection, sentiment drift, topic modeling, and behavioral trajectory analysis to measure safety degradation and user-experience risk.
Established external IRB methodology for research partnerships and mentored researchers on statistics, measurement, methodology, and research design.

Head of Accreditation

National Emergency Responder and Public Safety Center

Nov 2020 - Jul 2022

Led accreditation, evaluation, and product operations for a public-safety training and certification organization. Managed the full accreditation lifecycle and partnered with agencies, government stakeholders, and internal teams to translate standards, operational requirements, and evidence into defensible evaluation programs.

Built standards-aligned assessment and review frameworks for emergency responder training, certification, renewal, and site-visit processes.
Scaled accreditation programs to support 1,000+ active agencies through structured evaluation workflows, documentation standards, and repeatable decision criteria.
Held leadership and P&L ownership across product development, market analysis, agency evaluation, and client delivery.
Expanded B2B client acquisition and recurring revenue by authoring RFP responses that translated agency requirements into practical product and evaluation scope.
Guided leadership decisions in regulated public-safety environments by presenting risk analyses, performance metrics, program-quality evidence, and accreditation findings to government officials and oversight bodies.

Director of Evaluations and Analytics

Brower Psychological Police and Public Safety Services

Jun 2019 - Jul 2022

Directed evaluation, analytics, and digital product delivery for regulated public-safety psychology programs. Owned assessment products, people analytics platforms, wellness programs, research studies, and program-evaluation frameworks for law enforcement and emergency-response clients.

Built evaluation frameworks for psychological assessment, public-safety interventions, wellness programs, and organizational risk measurement.
Led three product teams totaling 18 members across assessment delivery, analytics, digital transformation, and client-facing program operations.
Increased processing speed 5x by replacing legacy workflows with cloud-based assessment products, digitized operations, and standardized analytics processes.
Secured $500K+ in grant capital by building data-driven roadmaps that demonstrated measurable DEI outcomes grounded in behavioral science.
Forecasted mental health service needs using predictive analytics pipelines and evidence-based measurement.
Expanded a multi-state client portfolio by aligning psychological services with regulatory expectations, agency procurement needs, and required RFP standards.
Applied implicit behavioral measurement to pre-employment fitness evaluations, occupational risk assessment, and high-stakes public-safety decision support.

Doctoral Researcher & Supervisor

Fielding Graduate University

May 2015 - Jul 2019

Led supervision and training for 12 master's-level clinicians and practicum students across clinical and public-safety settings, while conducting research in predictive behavioral analytics and clinically grounded measurement.

Diagnostic Psychological Evaluations: Cedar Springs Psychiatric Hospital.
Forensic, Occupational, and Clinical Neuropsychological Assessment: Rocky Mountain Behavioral Health.
Critical Incident Response, Fitness for Duty Evaluations, Psychotherapy: Brower Psychological Police and Public Safety Psychology.
Assistant Disaster Coordinator: Comprehensive Clinical Services (Aurora Mental Health Center).
Teaching Assistant: Theories of Psychotherapy.
Research Assistant: Clinically Predictive Linguistic Analysis of Thematic Apperception Test Narratives.

Consulting Staff Data Scientist

QMS Infotech

Mar 2008 - Oct 2019

Built analytics and audit products for private-sector hiring, workforce assessment, fairness review, and FCRA-compliant HR decision support. Partnered with B2B clients to identify systemic bias risks, strengthen assessment quality, and translate measurement findings into repeatable governance and remediation workflows.

Designed analytics products to evaluate hiring practices, workforce selection systems, adverse impact, and diversity outcomes.
Established psychometric and statistical standards for talent-assessment audits, including construct validity, fairness measurement, and defensible decision criteria.
Implemented bias-mitigating policies at scale that achieved full FCRA compliance for enterprise HR partners.
Generated recommendations that improved diversity outcomes through repeatable statistical modeling, audit workflows, and evidence-based governance practices.
Built the data-science foundation for later Responsible AI work by connecting model outputs, regulated decision systems, fairness risk, and human assessment validity.

Technical & Leadership Capabilities

Leadership & AI Strategy

Technical Leadership, AI Strategy, ML Leadership, Product Strategy, Cross-Functional Leadership, Mentoring & Coaching, OKR Taxonomy, AI Governance, Responsible AI, Regulatory Compliance, Bias Mitigation, Ethical AI

Machine Learning & Modeling

Neuro-Symbolic AI, Graph ML, Knowledge Graphs, Entity Resolution, Weak Supervision, Reinforcement Learning, Agentic Systems, LLMs, NLP, Semantic Clustering, Predictive Modeling, XGBoost, Causal Inference, Ablation Studies

MLOps & Infrastructure

End-to-End Pipeline Architecture, Model Lifecycle Management, Distributed Computing, AWS SageMaker / EMR / EC2 / Glue, Databricks, Delta Lake, Apache Spark, Airflow, Docker & Kubernetes, CI/CD

Research & Measurement

Experimental Design, A/B Testing, Psychometrics, Source Independence Analysis, Behavioral Typologies, Incrementality Testing, Media Mix Modeling, Lift Studies

Technical Stack & Tools

Python, Pandas / NumPy, Scikit-learn, PyTorch, TensorFlow, Hugging Face, PySpark, SQL / Hive / Scala, Kafka, Tableau / Power BI


Interactive Simulators & Projects

A Recipe for Shipping AI Guardrails (without experimenting on your users)

DEAD SIGNAL: An AI Evals Harness for Generative Game Dialogue

Grading an Agent as a User Experience

Letting an Agent Improve Your System, Gated by Evaluation

The Constitution Your LLM App Already Has

A Safety Threshold That Moves With the Context

Reading a Model's Mind in Its Own Words

The Guardrail Paradox: When the Safety Feature Becomes the Problem

The Latent Projective Bleed

The Misspecification Problem: What a Behavioral Claim About an AI System Has to Name

The Somatic Deficit: Moral Language Without Moral Ontology

The Warrant Question: When an Accurate Output Is Still a Violation

An ADOS-2-Aligned Multimodal Architecture for Clinical Autism Assessment

Neurobiological Differentiation of Violent Offender Types

Three Mechanisms in an ADHD Focus Playlist

Implicit Power Drives in the Manifestos Preceding Autogenic Massacres

Instrumental and Affective Mass Murder: Establishing a Predictive Typology with Computer-Mediated Linguistic Analysis

LIWC and the TAT

Building a Reddit-Corpus Pipeline for LLM Behavioral Coding

Counting a Behavior Is Not the Same as Characterizing It

Giving an LLM an Overnight Research Loop (and keeping it from breaking everything)

The Tokenization Trap: Why Logographic Languages Don't Save Agentic Context

Claude's "Character Tic" Is Actually the Plot of Every Dystopian Piece of AI Fiction Ever Created

Pathologizing Without Warrant

Reading Misalignment Off the Public Record

The Bedtime Directive

Assessing Risk for Mass Violence From Platform Behaviors

Per-User Investigation as Trajectory, Not Snapshot

Time-to-Event Analysis for Platform Integrity

Why a Safety Detector Tends to Need Corroboration