Purpose

JobSim records AI usage and estimated cost per event so org admins and super admins can audit:
  • token usage by organization and user
  • usage by model, feature area, and request type
  • estimated cost with org-specific pricing rules
Core implementation lives in:
  • src/core/ai_usage/recorder.py
  • src/memory/ai_usage_repository.py
  • src/apis/ai_usage.py

Source of truth

  • ai_usage_events is the canonical collection for cost reporting and usage dashboards.
  • organization_ai_pricing is the canonical collection for org pricing versions.
  • token_usage is a legacy collection kept for backward compatibility and migration/backfill flows; cost dashboards should use ai_usage_events.

MongoDB collections

  • ai_usage_events — atomic usage records with org/user attribution and estimated costs
  • organization_ai_pricing — org pricing versions (is_active, effective_from, effective_to)
Typical ai_usage_events fields:
  • identity/attribution: organization_id, user_id, session_id, created_at
  • taxonomy: feature_area, request_type, provider, model_name
  • usage counters: input_tokens, output_tokens, total_tokens, embedding_tokens, mem0_units, quick_help_minutes
  • estimated costs: estimated_input_cost, estimated_output_cost, estimated_embedding_cost, estimated_mem0_cost, estimated_quick_help_cost, estimated_total_cost
  • pricing linkage: currency, pricing_id, pricing_source
  • debugging context: metadata
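
For orientation, a single event might look like the sketch below. The field names come from the list above; every value is made up for illustration, and the labels used for feature_area, provider, pricing_source, and the metadata key are assumptions rather than values confirmed by the codebase.

```python
from datetime import datetime, timezone

# Illustrative ai_usage_events document. Field names follow the list above;
# all IDs, counts, costs, and labels are invented for the example.
example_event = {
    # identity/attribution
    "organization_id": "org_123",
    "user_id": "user_456",
    "session_id": "sess_789",
    "created_at": datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc),
    # taxonomy
    "feature_area": "chat",        # assumed label
    "request_type": "llm_chat",
    "provider": "openai",          # assumed label
    "model_name": "gpt-4o-mini",
    # usage counters
    "input_tokens": 1200,
    "output_tokens": 300,
    "total_tokens": 1500,
    "embedding_tokens": 0,
    "mem0_units": 0,
    "quick_help_minutes": 0.0,
    # estimated costs, persisted with the event
    "estimated_input_cost": 0.00018,
    "estimated_output_cost": 0.00018,
    "estimated_embedding_cost": 0.0,
    "estimated_mem0_cost": 0.0,
    "estimated_quick_help_cost": 0.0,
    "estimated_total_cost": 0.00036,
    # pricing linkage
    "currency": "USD",
    "pricing_id": None,            # assumed: no org pricing record applied
    "pricing_source": "default",   # assumed label
    # debugging context
    "metadata": {"trace_id": "example-trace"},  # hypothetical key
}
```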

Default pricing baseline

When an organization has no active pricing record, the recorder falls back to:
  • llm_input_price_per_million = 0.15
  • llm_output_price_per_million = 0.60
  • embedding_price_per_million = 0.02
  • mem0_price_per_unit = 0.0
  • quick_help_price_per_minute = 0.31
  • currency = USD
These defaults match the current primary models:
  • LLM: gpt-4o-mini
  • Embeddings: text-embedding-3-small
  • Quick Help voice runtime: gpt-4o-realtime-preview (LiveKit infrastructure overhead is included in the per-minute baseline)
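
As a rough illustration of how the baseline turns usage counters into estimated costs, the sketch below applies the default rates listed above. The Pricing class and estimate_costs function are hypothetical names, not the recorder's actual API. The per-million arithmetic follows from the field names; treating the total as a plain sum of the components, with no rounding, is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical container holding the default baseline rates."""
    llm_input_price_per_million: float = 0.15
    llm_output_price_per_million: float = 0.60
    embedding_price_per_million: float = 0.02
    mem0_price_per_unit: float = 0.0
    quick_help_price_per_minute: float = 0.31
    currency: str = "USD"


DEFAULT_PRICING = Pricing()


def estimate_costs(pricing, *, input_tokens=0, output_tokens=0,
                   embedding_tokens=0, mem0_units=0, quick_help_minutes=0.0):
    """Return per-component estimated costs plus their sum (assumed formula)."""
    costs = {
        "estimated_input_cost": input_tokens / 1_000_000 * pricing.llm_input_price_per_million,
        "estimated_output_cost": output_tokens / 1_000_000 * pricing.llm_output_price_per_million,
        "estimated_embedding_cost": embedding_tokens / 1_000_000 * pricing.embedding_price_per_million,
        "estimated_mem0_cost": mem0_units * pricing.mem0_price_per_unit,
        "estimated_quick_help_cost": quick_help_minutes * pricing.quick_help_price_per_minute,
    }
    # Sum the numeric components before adding the non-numeric currency field.
    costs["estimated_total_cost"] = sum(costs.values())
    costs["currency"] = pricing.currency
    return costs
```

Under these assumptions, a chat call with one million input tokens and one million output tokens would be estimated at 0.15 + 0.60 = 0.75 USD.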

Pricing precedence

  1. Active org pricing in organization_ai_pricing for the event timestamp.
  2. Default baseline above when no active org pricing exists.
Each event's estimated cost is persisted at write time, so historical figures stay stable when pricing changes. When POST /pricing creates a new pricing version:
  • the org's currently active versions are deactivated, with effective_to set to the new version's start time
  • one new active version is inserted
  • all future events use that new active version
  • past events keep their already-persisted estimated costs and pricing linkage
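
The versioning rules above can be sketched with an in-memory list standing in for organization_ai_pricing. Both function names are hypothetical, and the half-open window check (effective_from <= t < effective_to) is an assumption about how a timestamp maps to a version; the real repository may query differently.

```python
from datetime import datetime, timezone
from typing import Optional


def create_pricing_version(versions: list, organization_id: str,
                           new_prices: dict, now: datetime) -> dict:
    """Deactivate the org's active versions, then append one new active version."""
    for v in versions:
        if v["organization_id"] == organization_id and v["is_active"]:
            v["is_active"] = False
            v["effective_to"] = now  # old version ends where the new one starts
    new_version = {
        "organization_id": organization_id,
        "is_active": True,
        "effective_from": now,
        "effective_to": None,
        **new_prices,
    }
    versions.append(new_version)
    return new_version


def pricing_for_event(versions: list, organization_id: str,
                      event_time: datetime) -> Optional[dict]:
    """Return the version covering event_time, or None so the caller can
    fall back to the default baseline."""
    for v in versions:
        if v["organization_id"] != organization_id:
            continue
        starts_ok = v["effective_from"] <= event_time
        ends_ok = v["effective_to"] is None or event_time < v["effective_to"]
        if starts_ok and ends_ok:
            return v
    return None
```

Because costs are stored on each event at write time, a lookup like pricing_for_event would only be needed when recording new events, not when reading past ones.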

API endpoints

All routes are under /api/v1/ai-usage:
  • POST /pricing — create new active org pricing version
  • GET /pricing — get active org pricing
  • GET /summary — org usage/cost summary
  • GET /users — org per-user breakdown
  • GET /trends — org daily trend
  • GET /platform-summary — super-admin platform summary
  • GET /platform-trends — super-admin platform daily trend
Key query params:
  • org-scoped endpoints: days (default 30, min 1, max 365)
  • /users: limit (default 50, min 1, max 200)
  • org-scoped endpoints called by a super admin: organization_id is required
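
To show how days and limit could shape a query behind GET /users, here is a sketch that builds a MongoDB aggregation pipeline over ai_usage_events. The function name, the grouping fields, and the sort order are assumptions; the actual pipeline in src/memory/ai_usage_repository.py is not shown in this document and may differ. Only the parameter bounds come from the list above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_user_breakdown_pipeline(organization_id: str, days: int = 30,
                                  limit: int = 50,
                                  now: Optional[datetime] = None) -> list:
    """Build a per-user usage/cost aggregation pipeline (hypothetical helper)."""
    # Bounds documented for the API: days 1..365, limit 1..200.
    if not 1 <= days <= 365:
        raise ValueError("days must be between 1 and 365")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")

    now = now or datetime.now(timezone.utc)
    since = now - timedelta(days=days)
    return [
        # Restrict to one org and the requested time window.
        {"$match": {"organization_id": organization_id,
                    "created_at": {"$gte": since}}},
        # Sum tokens and persisted cost estimates per user.
        {"$group": {"_id": "$user_id",
                    "total_tokens": {"$sum": "$total_tokens"},
                    "estimated_total_cost": {"$sum": "$estimated_total_cost"},
                    "event_count": {"$sum": 1}}},
        # Assumed ordering: most expensive users first.
        {"$sort": {"estimated_total_cost": -1}},
        {"$limit": limit},
    ]
```

The returned list could be passed to a collection's aggregate call; in a real API the range checks would more likely live in the request validation layer than in the query builder.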
RBAC behavior:
  • super admin can access cross-org (platform) usage, and any single org's usage by passing organization_id
  • org admin can access only their own org's usage and pricing
  • non-admin roles are rejected
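
The RBAC rules above amount to deciding which organization an org-scoped request may read. A minimal sketch follows, assuming role strings "super_admin" and "org_admin" and a hypothetical AccessDenied exception. Whether the real API rejects or silently ignores a mismatched organization_id from an org admin is not stated here; this sketch rejects it.

```python
from typing import Optional


class AccessDenied(Exception):
    """Hypothetical error for callers who may not read the requested scope."""


def resolve_org_scope(role: str, caller_org_id: Optional[str],
                      requested_org_id: Optional[str]) -> str:
    """Return the organization_id an org-scoped query should run against."""
    if role == "super_admin":
        # Super admins must name the org explicitly on org-scoped endpoints.
        if not requested_org_id:
            raise ValueError("organization_id is required for super admin queries")
        return requested_org_id

    if role == "org_admin":
        if not caller_org_id:
            raise AccessDenied("org admin has no organization")
        # Assumed behavior: asking for another org is an error, not ignored.
        if requested_org_id and requested_org_id != caller_org_id:
            raise AccessDenied("org admins can only access their own org")
        return caller_org_id

    # Every other role is rejected.
    raise AccessDenied("admin role required")
```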

Request-type coverage

Each usage event carries a request type identifying the flow that produced it. Covered types include:
  • llm_chat, tool_calling_llm, message_triage
  • task_evaluation
  • score_explanation, analytics_summary, analytics_psychometric, analytics_professionalism_rating
  • background_agent_chatter, background_agent_dm, scheduler_message_generation
  • embedding* request types
  • memory_store, memory_retrieval
Current high-value flows instrumented in code include:
  • end-user and agent chat completions
  • task evaluation and scoring explanation
  • analytics psychometric/professionalism evaluations
  • dynamic routing / triage / scheduler generation
  • background chatter and background agent DM generation
  • embedding generation for org knowledge
  • mem0 store/retrieval accounting

Mem0 accounting note

Mem0 usage is currently tracked as mem0_units for memory store and retrieval operations, with provider set to mem0. Mem0 events carry the configured runtime LLM model label (typically gpt-4o-mini) for attribution consistency. Keep mem0_price_per_unit separate from token pricing so the same operation is not counted twice.

Cost-control settings tied to usage

Two org-level simulation settings directly affect AI spend and are persisted in simulation_settings:
  • simulation_policy.ambient.chatter_hourly_token_budget
    • soft/hard guardrail for background chatter token spend
    • validated as an integer in the range 2000..100000
  • psychometric_recalc_interval_messages
    • controls how frequently expensive psychometric LLM recalculation runs
    • validated as an integer in the range 1..50
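
The range checks on these two settings can be sketched as follows. The function names are hypothetical, and treating both bounds as inclusive is an assumption based on the ranges given above; the actual validation in the codebase may be expressed through a schema library instead.

```python
def _check_int_range(name: str, value, low: int, high: int) -> int:
    """Reject non-integers (including bools) and values outside [low, high]."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer")
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    return value


def validate_cost_settings(chatter_hourly_token_budget: int,
                           psychometric_recalc_interval_messages: int) -> dict:
    """Validate the two cost-control settings (bounds assumed inclusive)."""
    return {
        "chatter_hourly_token_budget": _check_int_range(
            "chatter_hourly_token_budget", chatter_hourly_token_budget, 2000, 100000),
        "psychometric_recalc_interval_messages": _check_int_range(
            "psychometric_recalc_interval_messages",
            psychometric_recalc_interval_messages, 1, 50),
    }
```

A lower chatter budget caps background token spend, while a larger recalculation interval reduces how often the psychometric LLM call runs.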