AI usage and cost tracking

Purpose

JobSim records AI usage and estimated cost per event so org admins and super admins can audit:

token usage by organization and user
usage by model, feature area, and request type
estimated cost with org-specific pricing rules

Core implementation lives in:

src/core/ai_usage/recorder.py
src/memory/ai_usage_repository.py
src/apis/ai_usage.py

Source of truth

ai_usage_events is the canonical collection for cost reporting and usage dashboards.
organization_ai_pricing is the canonical collection for org pricing versions.
token_usage is a legacy collection kept for backward compatibility and migration/backfill flows; cost dashboards should use ai_usage_events.

MongoDB collections

ai_usage_events — atomic usage records with org/user attribution and estimated costs
organization_ai_pricing — org pricing versions (is_active, effective_from, effective_to)

Typical ai_usage_events fields:

identity/attribution: organization_id, user_id, session_id, created_at
taxonomy: feature_area, request_type, provider, model_name
usage counters: input_tokens, output_tokens, total_tokens, embedding_tokens, mem0_units, quick_help_minutes
estimated costs: estimated_input_cost, estimated_output_cost, estimated_embedding_cost, estimated_mem0_cost, estimated_quick_help_cost, estimated_total_cost
pricing linkage: currency, pricing_id, pricing_source
debugging context: metadata
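For reference, the typical fields can be written as a Python TypedDict. This is a hypothetical typing, not a class taken from src/memory/ai_usage_repository.py. The identifier types (plain strings rather than, say, BSON ObjectIds) and numeric types are assumptions. total=False reflects that not every event populates every counter.

```python
from datetime import datetime
from typing import Any, Dict, Optional, TypedDict


class AIUsageEvent(TypedDict, total=False):
    """Hypothetical shape of one ai_usage_events document."""

    # identity/attribution
    organization_id: str
    user_id: str
    session_id: str
    created_at: datetime
    # taxonomy
    feature_area: str
    request_type: str
    provider: str
    model_name: str
    # usage counters
    input_tokens: int
    output_tokens: int
    total_tokens: int
    embedding_tokens: int
    mem0_units: int
    quick_help_minutes: float
    # estimated costs, expressed in `currency`
    estimated_input_cost: float
    estimated_output_cost: float
    estimated_embedding_cost: float
    estimated_mem0_cost: float
    estimated_quick_help_cost: float
    estimated_total_cost: float
    # pricing linkage (pricing_id assumed empty when the default baseline is used)
    currency: str
    pricing_id: Optional[str]
    pricing_source: str
    # debugging context
    metadata: Dict[str, Any]
```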

Default pricing baseline

When an organization has no active pricing record, the recorder falls back to:

llm_input_price_per_million = 0.15
llm_output_price_per_million = 0.60
embedding_price_per_million = 0.02
mem0_price_per_unit = 0.0
quick_help_price_per_minute = 0.31
currency = USD

These defaults are based on the current primary models:

LLM: gpt-4o-mini
Embeddings: text-embedding-3-small
Quick Help voice runtime: gpt-4o-realtime-preview (the per-minute baseline also covers LiveKit infrastructure overhead)
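A minimal sketch of how an event's estimated costs could be derived from a pricing record, using the default baseline. It assumes token prices apply per million tokens, Mem0 pricing per unit, and Quick Help pricing per minute (as the field names suggest), that no rounding is applied, and that estimated_total_cost is the plain sum of the component costs. Pricing, DEFAULT_PRICING, and estimate_costs are hypothetical names, not the recorder's API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Pricing:
    llm_input_price_per_million: float
    llm_output_price_per_million: float
    embedding_price_per_million: float
    mem0_price_per_unit: float
    quick_help_price_per_minute: float
    currency: str = "USD"


# Fallback values from the "Default pricing baseline" section.
DEFAULT_PRICING = Pricing(
    llm_input_price_per_million=0.15,
    llm_output_price_per_million=0.60,
    embedding_price_per_million=0.02,
    mem0_price_per_unit=0.0,
    quick_help_price_per_minute=0.31,
)


def estimate_costs(
    pricing: Pricing,
    input_tokens: int = 0,
    output_tokens: int = 0,
    embedding_tokens: int = 0,
    mem0_units: int = 0,
    quick_help_minutes: float = 0.0,
) -> Dict[str, float]:
    """Compute per-component estimated costs plus their total."""
    costs = {
        "estimated_input_cost": input_tokens / 1_000_000 * pricing.llm_input_price_per_million,
        "estimated_output_cost": output_tokens / 1_000_000 * pricing.llm_output_price_per_million,
        "estimated_embedding_cost": embedding_tokens / 1_000_000 * pricing.embedding_price_per_million,
        "estimated_mem0_cost": mem0_units * pricing.mem0_price_per_unit,
        "estimated_quick_help_cost": quick_help_minutes * pricing.quick_help_price_per_minute,
    }
    # Total is summed before the key is added, so it is not counted twice.
    costs["estimated_total_cost"] = sum(costs.values())
    return costs
```

With the defaults, 10,000 input tokens and 2,000 output tokens come to 0.0015 + 0.0012 = 0.0027 USD, and 2 Quick Help minutes come to 0.62 USD.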

Pricing precedence

Pricing for each event is resolved in this order:

Active org pricing in organization_ai_pricing that is in effect at the event timestamp.
The default baseline above, when no active org pricing exists.
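The precedence rule can be sketched as a lookup over pricing versions held in memory. This assumes a version applies when is_active is true and effective_from <= timestamp < effective_to (with an empty effective_to meaning open-ended), and that timestamps are timezone-aware. resolve_pricing and the "org"/"default" pricing_source labels are hypothetical, not the recorder's actual values.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple

# Fallback values from the "Default pricing baseline" section.
DEFAULT_PRICING: Dict[str, Any] = {
    "llm_input_price_per_million": 0.15,
    "llm_output_price_per_million": 0.60,
    "embedding_price_per_million": 0.02,
    "mem0_price_per_unit": 0.0,
    "quick_help_price_per_minute": 0.31,
    "currency": "USD",
}


def resolve_pricing(
    versions: List[Dict[str, Any]], organization_id: str, at: datetime
) -> Tuple[Dict[str, Any], str]:
    """Return (pricing, pricing_source) for an event at time `at`."""
    candidates = [
        v
        for v in versions
        if v["organization_id"] == organization_id
        and v.get("is_active")
        and v["effective_from"] <= at
        and (v.get("effective_to") is None or at < v["effective_to"])
    ]
    if candidates:
        # If several versions qualify, prefer the most recently started one.
        return max(candidates, key=lambda v: v["effective_from"]), "org"
    return DEFAULT_PRICING, "default"
```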

Estimated costs are computed and persisted on each event at write time, so historical figures stay stable when pricing changes. When POST /pricing creates a new pricing version:

any currently active versions for the org are deactivated, with effective_to set to the new version's start time
one new active version is inserted
all subsequent events are priced with the new active version
past events keep their already-persisted estimated costs and pricing linkage
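The versioning steps can be sketched against an in-memory list standing in for organization_ai_pricing. In MongoDB this would presumably map to an update_many on the org's active versions followed by an insert_one, possibly inside a session transaction so the two writes stay consistent; that mapping and the create_pricing_version name are assumptions. Past events are never touched here because they carry their own persisted costs and pricing_id.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def create_pricing_version(
    versions: List[Dict[str, Any]],
    organization_id: str,
    prices: Dict[str, Any],
    effective_from: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Deactivate the org's active versions and append one new active version."""
    start = effective_from or datetime.now(timezone.utc)
    # Close out every currently active version for this org at the new start time.
    for v in versions:
        if v["organization_id"] == organization_id and v.get("is_active"):
            v["is_active"] = False
            v["effective_to"] = start
    new_version = {
        "_id": uuid.uuid4().hex,
        "organization_id": organization_id,
        **prices,
        # Lifecycle fields come last so `prices` cannot override them.
        "is_active": True,
        "effective_from": start,
        "effective_to": None,
    }
    versions.append(new_version)
    return new_version
```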

API endpoints

All routes are under /api/v1/ai-usage:

POST /pricing — create new active org pricing version
GET /pricing — get active org pricing
GET /summary — org usage/cost summary
GET /users — org per-user breakdown
GET /trends — org daily trend
GET /platform-summary — super-admin platform summary
GET /platform-trends — super-admin platform daily trend
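The daily trend endpoints return per-day rollups. A minimal in-memory sketch of that rollup follows; in MongoDB the same shape is usually expressed as an aggregation pipeline ($match on organization and time window, then $group by day). It assumes UTC calendar days, timezone-aware created_at values, and that organization_id=None means platform-wide. daily_trend and its output keys are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List, Optional


def daily_trend(
    events: Iterable[Dict[str, Any]],
    days: int = 30,
    organization_id: Optional[str] = None,
    now: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Bucket events by UTC calendar day; organization_id=None means platform-wide."""
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(days=days)
    buckets: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"events": 0, "total_tokens": 0, "estimated_total_cost": 0.0}
    )
    for event in events:
        if organization_id is not None and event["organization_id"] != organization_id:
            continue
        created = event["created_at"]
        if created < since or created > now:
            continue
        day = created.astimezone(timezone.utc).date().isoformat()
        bucket = buckets[day]
        bucket["events"] += 1
        bucket["total_tokens"] += event.get("total_tokens", 0)
        bucket["estimated_total_cost"] += event.get("estimated_total_cost", 0.0)
    # Oldest day first.
    return [{"date": day, **buckets[day]} for day in sorted(buckets)]
```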

Key query params:

org-scoped endpoints: days (default 30, min 1, max 365)
/users: limit (default 50, min 1, max 200)
super admin org-scoped queries: organization_id is required
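The numeric bounds on days and limit can be sketched as a small helper. It assumes bounds are inclusive and that out-of-range values are rejected rather than clamped; bounded_int, DAYS_BOUNDS, and LIMIT_BOUNDS are hypothetical names, not taken from src/apis/ai_usage.py.

```python
from typing import Optional, Tuple

# (default, min, max) from the "Key query params" list.
DAYS_BOUNDS: Tuple[int, int, int] = (30, 1, 365)
LIMIT_BOUNDS: Tuple[int, int, int] = (50, 1, 200)


def bounded_int(name: str, value: Optional[int], bounds: Tuple[int, int, int]) -> int:
    """Apply the default when value is None; reject non-integers and out-of-range values."""
    default, lo, hi = bounds
    if value is None:
        return default
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer")
    if not lo <= value <= hi:
        raise ValueError(f"{name} must be between {lo} and {hi}, got {value}")
    return value
```

For example, an omitted days resolves to 30, while days=0 or limit=500 would be rejected.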

RBAC behavior:

super admins can access platform-wide (cross-org) usage, plus any single org's usage and pricing by passing organization_id
org admins can access only their own org's usage and pricing
non-admin roles are rejected
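The RBAC rules can be sketched as a function that decides which organization a query is scoped to. The role identifiers are hypothetical, and so is the choice to reject (rather than silently ignore) an org admin who passes a different organization_id; the real checks may live elsewhere in the auth layer.

```python
from enum import Enum
from typing import Optional


class Role(str, Enum):
    # Hypothetical role identifiers.
    SUPER_ADMIN = "super_admin"
    ORG_ADMIN = "org_admin"
    MEMBER = "member"


def resolve_org_scope(
    role: Role,
    caller_org_id: Optional[str],
    requested_org_id: Optional[str] = None,
    platform: bool = False,
) -> Optional[str]:
    """Return the organization_id to filter on, or None for platform-wide queries."""
    if role is Role.SUPER_ADMIN:
        if platform:
            return None
        if not requested_org_id:
            raise ValueError("organization_id is required for super admin org-scoped queries")
        return requested_org_id
    if role is Role.ORG_ADMIN:
        if platform:
            raise PermissionError("platform endpoints are restricted to super admins")
        if requested_org_id and requested_org_id != caller_org_id:
            raise PermissionError("org admins can only access their own organization")
        return caller_org_id
    raise PermissionError("non-admin roles cannot access AI usage endpoints")
```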

Request-type coverage

Usage events are tagged with a request_type for each major flow. Request types include:

llm_chat, tool_calling_llm, message_triage
task_evaluation
score_explanation, analytics_summary, analytics_psychometric, analytics_professionalism_rating
background_agent_chatter, background_agent_dm, scheduler_message_generation
request types prefixed with embedding (embedding*)
memory_store, memory_retrieval
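The named request types can be collected into an enum, with a prefix check for the embedding family. Because the individual embedding* variants are not listed here, the sketch matches them by prefix instead of enumerating them. The member names and helper functions are hypothetical, and since the list above is not exhaustive, is_listed_request_type only covers the types named in this document.

```python
from enum import Enum


class RequestType(str, Enum):
    """Request types named above (member names are hypothetical)."""

    LLM_CHAT = "llm_chat"
    TOOL_CALLING_LLM = "tool_calling_llm"
    MESSAGE_TRIAGE = "message_triage"
    TASK_EVALUATION = "task_evaluation"
    SCORE_EXPLANATION = "score_explanation"
    ANALYTICS_SUMMARY = "analytics_summary"
    ANALYTICS_PSYCHOMETRIC = "analytics_psychometric"
    ANALYTICS_PROFESSIONALISM_RATING = "analytics_professionalism_rating"
    BACKGROUND_AGENT_CHATTER = "background_agent_chatter"
    BACKGROUND_AGENT_DM = "background_agent_dm"
    SCHEDULER_MESSAGE_GENERATION = "scheduler_message_generation"
    MEMORY_STORE = "memory_store"
    MEMORY_RETRIEVAL = "memory_retrieval"


def is_embedding_request(request_type: str) -> bool:
    # The exact embedding* variants are not listed, so match on the prefix.
    return request_type.startswith("embedding")


def is_listed_request_type(request_type: str) -> bool:
    return is_embedding_request(request_type) or request_type in {t.value for t in RequestType}
```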

Current high-value flows instrumented in code include:

end-user and agent chat completions
task evaluation and scoring explanation
analytics psychometric/professionalism evaluations
dynamic routing / triage / scheduler generation
background chatter and background agent DM generation
embedding generation for org knowledge
mem0 store/retrieval accounting

Mem0 accounting note

Mem0 usage is currently tracked as mem0_units for memory store and retrieval operations, with provider set to mem0. For attribution consistency, Mem0 events are labeled with the configured runtime LLM model (typically gpt-4o-mini). Keep mem0_price_per_unit separate from token pricing to avoid accidentally double counting the same operation.
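A sketch of what a Mem0 usage event might look like under these rules. The key assumption, which this document does not confirm, is that Mem0 events leave token counters and token costs at zero so that only mem0_price_per_unit contributes to their cost. build_mem0_event is a hypothetical helper, and feature_area is passed in because its Mem0 value is not documented here.

```python
from datetime import datetime, timezone
from typing import Any, Dict

MEM0_REQUEST_TYPES = {"memory_store", "memory_retrieval"}


def build_mem0_event(
    organization_id: str,
    user_id: str,
    feature_area: str,
    request_type: str,
    runtime_model: str,
    mem0_units: int = 1,
    mem0_price_per_unit: float = 0.0,
) -> Dict[str, Any]:
    """Build a unit-priced Mem0 usage event (hypothetical helper)."""
    if request_type not in MEM0_REQUEST_TYPES:
        raise ValueError(f"unexpected Mem0 request_type: {request_type}")
    mem0_cost = mem0_units * mem0_price_per_unit
    return {
        "organization_id": organization_id,
        "user_id": user_id,
        "created_at": datetime.now(timezone.utc),
        "feature_area": feature_area,
        "request_type": request_type,
        "provider": "mem0",
        # Runtime LLM label (e.g. gpt-4o-mini), kept for attribution only.
        "model_name": runtime_model,
        # Token counters stay at zero so token pricing is never applied here.
        "input_tokens": 0,
        "output_tokens": 0,
        "total_tokens": 0,
        "embedding_tokens": 0,
        "mem0_units": mem0_units,
        "estimated_input_cost": 0.0,
        "estimated_output_cost": 0.0,
        "estimated_embedding_cost": 0.0,
        "estimated_mem0_cost": mem0_cost,
        "estimated_total_cost": mem0_cost,
    }
```

With the default mem0_price_per_unit of 0.0, such events record usage volume but contribute no estimated cost.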

Cost-control settings tied to usage

Two org-level simulation settings directly affect AI spend and are persisted in simulation_settings:

simulation_policy.ambient.chatter_hourly_token_budget
- soft/hard guardrail for background chatter token spend
- validated as an integer in the range 2000..100000
psychometric_recalc_interval_messages
- controls how often the expensive psychometric LLM recalculation runs
- validated as an integer in the range 1..50
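The two validation rules can be sketched against a plain dict shaped like a simulation_settings document. This assumes the ranges are inclusive, that booleans are rejected even though Python treats them as ints, and that a missing setting is simply skipped (presumably falling back to some default); BOUNDS and validate_cost_settings are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

# Dotted paths and inclusive ranges from the list above.
BOUNDS: Dict[Tuple[str, ...], Tuple[int, int]] = {
    ("simulation_policy", "ambient", "chatter_hourly_token_budget"): (2000, 100000),
    ("psychometric_recalc_interval_messages",): (1, 50),
}

_MISSING = object()


def _lookup(doc: Dict[str, Any], path: Tuple[str, ...]) -> Any:
    """Walk nested dicts; return _MISSING if any key along the path is absent."""
    current: Any = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def validate_cost_settings(doc: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors (empty when all present values are valid)."""
    errors: List[str] = []
    for path, (lo, hi) in BOUNDS.items():
        value = _lookup(doc, path)
        if value is _MISSING:
            continue
        name = ".".join(path)
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{name} must be an integer")
        elif not lo <= value <= hi:
            errors.append(f"{name} must be between {lo} and {hi}")
    return errors
```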
