Purpose
JobSim records AI usage and estimated cost per event so org admins and super admins can audit:
- token usage by organization and user
- usage by model, feature area, and request type
- estimated cost with org-specific pricing rules
Source files:
- src/core/ai_usage/recorder.py
- src/memory/ai_usage_repository.py
- src/apis/ai_usage.py
Source of truth
- ai_usage_events is the canonical collection for cost reporting and usage dashboards.
- organization_ai_pricing is the canonical collection for org pricing versions.
- token_usage is a legacy collection kept for backward compatibility and migration/backfill flows; cost dashboards should use ai_usage_events.
MongoDB collections
- ai_usage_events: atomic usage records with org/user attribution and estimated costs
- organization_ai_pricing: org pricing versions (is_active, effective_from, effective_to)
ai_usage_events fields:
- identity/attribution: organization_id, user_id, session_id, created_at
- taxonomy: feature_area, request_type, provider, model_name
- usage counters: input_tokens, output_tokens, total_tokens, embedding_tokens, mem0_units, quick_help_minutes
- estimated costs: estimated_input_cost, estimated_output_cost, estimated_embedding_cost, estimated_mem0_cost, estimated_quick_help_cost, estimated_total_cost
- pricing linkage: currency, pricing_id, pricing_source
- debugging context: metadata
Default pricing baseline
When an organization has no active pricing record, the recorder falls back to:
- llm_input_price_per_million = 0.15
- llm_output_price_per_million = 0.60
- embedding_price_per_million = 0.02
- mem0_price_per_unit = 0.0
- quick_help_price_per_minute = 0.31
- currency = USD
These defaults assume the following models:
- LLM: gpt-4o-mini
- Embeddings: text-embedding-3-small
- Quick Help voice runtime: gpt-4o-realtime-preview (with LiveKit infra overhead reflected in the per-minute baseline)
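As a worked example of how the baseline translates into per-event costs, the sketch below applies the default prices to raw usage counters. The function name estimate_costs is hypothetical, and two behaviors are assumptions rather than documented facts: that estimated_total_cost is the plain sum of the component costs, and that no rounding is applied.

```python
DEFAULT_PRICING = {
    "llm_input_price_per_million": 0.15,
    "llm_output_price_per_million": 0.60,
    "embedding_price_per_million": 0.02,
    "mem0_price_per_unit": 0.0,
    "quick_help_price_per_minute": 0.31,
    "currency": "USD",
}


def estimate_costs(pricing, input_tokens=0, output_tokens=0,
                   embedding_tokens=0, mem0_units=0, quick_help_minutes=0.0):
    """Return estimated_* cost fields for one event (hypothetical helper)."""
    costs = {
        # Token prices are quoted per million tokens.
        "estimated_input_cost":
            input_tokens / 1_000_000 * pricing["llm_input_price_per_million"],
        "estimated_output_cost":
            output_tokens / 1_000_000 * pricing["llm_output_price_per_million"],
        "estimated_embedding_cost":
            embedding_tokens / 1_000_000 * pricing["embedding_price_per_million"],
        # Mem0 and Quick Help are priced per unit and per minute.
        "estimated_mem0_cost": mem0_units * pricing["mem0_price_per_unit"],
        "estimated_quick_help_cost":
            quick_help_minutes * pricing["quick_help_price_per_minute"],
    }
    # Assumption: the total is the sum of the five components.
    costs["estimated_total_cost"] = sum(costs.values())
    costs["currency"] = pricing["currency"]
    return costs


# 200k input tokens and 50k output tokens at the baseline:
# roughly 0.03 USD input + 0.03 USD output.
example = estimate_costs(DEFAULT_PRICING, input_tokens=200_000, output_tokens=50_000)
```

With the default mem0_price_per_unit of 0.0, mem0 events contribute usage counts but no cost until an org pricing version sets a non-zero unit price.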
Pricing precedence
- Active org pricing in organization_ai_pricing for the event timestamp.
- Default baseline above when no active org pricing exists.
POST /pricing creates a new pricing version:
- current active versions for the org are deactivated, with effective_to set to the new version's start time
- one new active version is inserted
- all future events use that new active version
- past events keep their already-persisted estimated costs and pricing linkage
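The versioning steps above can be sketched against an in-memory list standing in for the organization_ai_pricing collection. create_pricing_version is a hypothetical name, and the real endpoint presumably performs the equivalent updates in MongoDB; only the is_active, effective_from, and effective_to fields are taken from this document.

```python
from datetime import datetime, timezone


def create_pricing_version(versions, organization_id, prices, now):
    """Deactivate the org's active versions, then append one new active one."""
    for version in versions:
        if version["organization_id"] == organization_id and version["is_active"]:
            version["is_active"] = False
            # The old version ends exactly where the new one starts.
            version["effective_to"] = now
    new_version = {
        "organization_id": organization_id,
        "is_active": True,
        "effective_from": now,
        "effective_to": None,
        **prices,
    }
    versions.append(new_version)
    return new_version


store = []
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 6, 1, tzinfo=timezone.utc)
create_pricing_version(store, "org_1", {"llm_input_price_per_million": 0.15}, t1)
create_pricing_version(store, "org_1", {"llm_input_price_per_million": 0.20}, t2)
```

Past events are untouched by this flow because their estimated costs and pricing_id were persisted when the event was recorded.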
API endpoints
All routes are under /api/v1/ai-usage:
- POST /pricing: create new active org pricing version
- GET /pricing: get active org pricing
- GET /summary: org usage/cost summary
- GET /users: org per-user breakdown
- GET /trends: org daily trend
- GET /platform-summary: super-admin platform summary
- GET /platform-trends: super-admin platform daily trend
Query parameters:
- org-scoped endpoints: days (default 30, min 1, max 365)
- /users: limit (default 50, min 1, max 200)
- super admin org-scoped queries: organization_id is required
Access control:
- super admin can access cross-org usage and any org-scoped usage by passing organization_id
- org admin can access only their own org usage/pricing
- non-admin roles are rejected
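The parameter bounds and role rules for org-scoped endpoints can be sketched as two small helpers. Everything named here is hypothetical: the role strings "super_admin" and "org_admin", the function names, and the choice to reject out-of-range values with an error rather than clamp them. The actual handlers in src/apis/ai_usage.py may differ.

```python
def bounded_param(name, value, default, minimum, maximum):
    """Apply a default and enforce inclusive bounds on a query parameter."""
    if value is None:
        return default
    if not minimum <= value <= maximum:
        raise ValueError(f"{name} must be between {minimum} and {maximum}")
    return value


def resolve_org_scope(role, caller_org_id, organization_id=None):
    """Return the org whose usage may be queried, or raise."""
    if role == "super_admin":
        # Super admins must name the org explicitly on org-scoped queries.
        if organization_id is None:
            raise ValueError("organization_id is required for super admin")
        return organization_id
    if role == "org_admin":
        # Assumption: an org admin asking for another org is rejected.
        if organization_id not in (None, caller_org_id):
            raise PermissionError("org admin can only access their own org")
        return caller_org_id
    raise PermissionError("admin role required")


days = bounded_param("days", None, default=30, minimum=1, maximum=365)
limit = bounded_param("limit", 100, default=50, minimum=1, maximum=200)
org = resolve_org_scope("org_admin", caller_org_id="org_1")
```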
Request-type coverage
Usage events include request types for major flows, including:
- llm_chat, tool_calling_llm, message_triage
- task_evaluation
- score_explanation, analytics_summary, analytics_psychometric, analytics_professionalism_rating
- background_agent_chatter, background_agent_dm, scheduler_message_generation
- embedding* request types
- memory_store, memory_retrieval
These cover:
- end-user and agent chat completions
- task evaluation and scoring explanation
- analytics psychometric/professionalism evaluations
- dynamic routing / triage / scheduler generation
- background chatter and background agent DM generation
- embedding generation for org knowledge
- mem0 store/retrieval accounting
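Because every event carries a request_type, usage can be broken down per flow. The sketch below groups a list of event dicts in plain Python; summarize_by_request_type is a hypothetical helper, and the real dashboards presumably run an equivalent aggregation in MongoDB rather than in application code.

```python
from collections import defaultdict


def summarize_by_request_type(events):
    """Sum event count, tokens, and estimated cost per request_type."""
    totals = defaultdict(
        lambda: {"events": 0, "total_tokens": 0, "estimated_total_cost": 0.0}
    )
    for event in events:
        bucket = totals[event["request_type"]]
        bucket["events"] += 1
        # Missing counters are treated as zero in this sketch.
        bucket["total_tokens"] += event.get("total_tokens", 0)
        bucket["estimated_total_cost"] += event.get("estimated_total_cost", 0.0)
    return dict(totals)


sample_events = [
    {"request_type": "llm_chat", "total_tokens": 1200, "estimated_total_cost": 0.5},
    {"request_type": "llm_chat", "total_tokens": 800, "estimated_total_cost": 0.25},
    {"request_type": "memory_store", "estimated_total_cost": 0.0},
]
summary = summarize_by_request_type(sample_events)
```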
Mem0 accounting note
Mem0 usage is currently tracked as mem0_units for memory store/retrieval operations, with provider mem0.
Mem0 events use the configured runtime LLM model label (typically gpt-4o-mini) for attribution consistency.
Keep mem0_price_per_unit separate from token pricing to avoid accidental double counting.
Cost-control settings tied to usage
Two org-level simulation settings directly influence AI cost pressure and are persisted in simulation_settings:
- simulation_policy.ambient.chatter_hourly_token_budget
  - soft/hard guardrail for background chatter token spend
  - validated as integer 2000..100000
- psychometric_recalc_interval_messages
  - controls how frequently expensive psychometric LLM recalculation runs
  - validated as integer 1..50
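The two validation rules can be sketched as a single check. The function name is hypothetical, the ranges are read as inclusive on both ends (an assumption about the 2000..100000 and 1..50 notation), and rejecting booleans is a choice made for this sketch, since Python treats bool as a subclass of int.

```python
def validate_cost_controls(chatter_hourly_token_budget,
                           psychometric_recalc_interval_messages):
    """Validate both settings and return them as a dict, or raise ValueError."""
    checks = [
        ("chatter_hourly_token_budget", chatter_hourly_token_budget, 2000, 100000),
        ("psychometric_recalc_interval_messages",
         psychometric_recalc_interval_messages, 1, 50),
    ]
    validated = {}
    for name, value, low, high in checks:
        # Reject bool explicitly: isinstance(True, int) is True in Python.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be in {low}..{high}")
        validated[name] = value
    return validated


settings = validate_cost_controls(20000, 10)
```

A lower chatter budget or a larger recalculation interval both reduce LLM calls, so these two values are the first place to look when the usage summary shows background or psychometric request types dominating cost.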