
Core Objects

Heify revolves around two main objects: Configurations and Transcriptions. Understanding these objects is essential for working with the API effectively.

Configuration Object

A Configuration is a reusable template that defines how audio/video files should be processed. Think of it as a preset that you can apply to multiple transcription jobs.

Configuration Structure

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| configuration_id | string | Auto-generated | Unique identifier for the configuration (UUID format) |
| client_id | string | Auto-assigned | Your client identifier (linked to your API key) |
| tag | string | Yes | Descriptive name for the configuration (max 255 characters) |
| vocabulary | array<string> | No | Custom words/phrases to improve recognition accuracy |
| extraction_fields | array<object> | No | Structured data fields to extract (max 20 fields) |
| webhooks | object | No | URLs for success/error notifications |
| summary | boolean | No | Generate a summary of the transcription (default: false) |
| summary_language | string | No | Language for summary generation (default: "df" - auto-detect). See Supported Languages. |
| analytics_language | string | No | Language for the Executive and Qualitative Analysis Report (default: "df" - auto-detect). See Supported Languages. |
| created_at | string | Auto-generated | ISO 8601 timestamp of creation |

Example Configuration

{
  "configuration_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "client_id": "client-uuid",
  "tag": "Sales Call Analysis Q4",
  "vocabulary": ["blockchain", "cryptocurrency", "NFT"],
  "extraction_fields": [
    {
      "name": "customer_id",
      "type": "string",
      "description": "The customer ID mentioned in the conversation"
    },
    {
      "name": "purchase_amount",
      "type": "number",
      "description": "The total purchase amount discussed"
    }
  ],
  "webhooks": {
    "success_url": "https://mysandbox.com/webhooks/success",
    "error_url": "https://mysandbox.com/webhooks/error"
  },
  "summary": true,
  "summary_language": "en",
  "analytics_language": "en",
  "created_at": "2025-10-03T22:25:00.123456+00:00"
}
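
As an illustration, a configuration like the one above could be created programmatically. The snippet below is a minimal Python sketch that assumes a POST /configurations endpoint, a Bearer-token authorization header, and a placeholder base URL; the actual path, base URL, and auth scheme may differ, so verify them against the API reference.

import requests

API_KEY = "YOUR_API_KEY"                 # placeholder for your Heify API key
BASE_URL = "https://api.heify.example"   # hypothetical base URL; use the one from the API reference

payload = {
    "tag": "Sales Call Analysis Q4",
    "vocabulary": ["blockchain", "cryptocurrency", "NFT"],
    "extraction_fields": [
        {
            "name": "customer_id",
            "type": "string",
            "description": "The customer ID mentioned in the conversation",
        }
    ],
    "webhooks": {
        "success_url": "https://mysandbox.com/webhooks/success",
        "error_url": "https://mysandbox.com/webhooks/error",
    },
    "summary": True,
    "summary_language": "en",
}

# Assumed endpoint: POST /configurations (check the API reference for the real path)
response = requests.post(
    f"{BASE_URL}/configurations",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
response.raise_for_status()
print(response.json()["configuration_id"])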

Extraction Fields

Define structured data to extract from transcriptions using AI.
| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Field identifier (e.g., "ticket_id", "customer_name") |
| type | string | Yes | Data type: string, number, boolean, array |
| description | string | Yes | Detailed description to guide the AI extraction |

Best Practices for Extraction Fields

Define a clear, limited set of possible values to improve consistency and accuracy.

Example:
{
  "name": "sentiment",
  "type": "string",
  "description": "Classify the overall sentiment of the conversation. Must be one of: POSITIVE, NEGATIVE, or NEUTRAL"
}
This approach ensures the AI returns predictable, standardized values instead of varied descriptions.
Give clear descriptions and specific examples to guide the AI for more accurate results.

Poor description:
{
  "name": "classification",
  "type": "string",
  "description": "Classifies the conversation"
}
Good description:
{
  "name": "issue",
  "type": "string",
  "description": "Classifies the conversation into a category: "[CATEGORY 1]", "[CATEGORY 2]", "[CATEGORY 3]". [CATEGORY 1] is ..., [CATEGORY 2] is ..., [CATEGORY 3] is ...  ."
}
Best Practice: Provide clear, detailed descriptions for extraction fields. The more context you give, the more accurate the extraction will be.
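
One way to keep enumerated values consistent is to generate the field description from a single list defined in code, and to reuse that same list when validating results. The sketch below is illustrative only: the allowed-values list and the enum_field helper are hypothetical, and only the resulting extraction_fields objects would be sent to the API.

ALLOWED_SENTIMENTS = ["POSITIVE", "NEGATIVE", "NEUTRAL"]  # hypothetical fixed value set

def enum_field(name: str, values: list[str], what: str) -> dict:
    """Build an extraction field whose description spells out the allowed values."""
    return {
        "name": name,
        "type": "string",
        "description": f"{what} Must be one of: {', '.join(values)}.",
    }

extraction_fields = [
    enum_field(
        "sentiment",
        ALLOWED_SENTIMENTS,
        "Classify the overall sentiment of the conversation.",
    ),
]

def is_valid_sentiment(value: str) -> bool:
    """Defensive check on the extracted value once a transcription completes."""
    return value in ALLOWED_SENTIMENTS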

Webhooks

Configure automatic notifications when transcription jobs complete or fail.
| Attribute | Type | Description |
| --- | --- | --- |
| success_url | string | URL to receive POST notifications on successful completion |
| error_url | string | URL to receive POST notifications on failure |
Webhook Payload (Success):
{
  "transcription_id": "f0e9d8c7-b6a5-4321-fedc-ba9876543210",
  "status": "COMPLETED",
  "configuration_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "duration": 125.5,
  "completed_at": "2025-10-03T10:02:15.456Z"
}
Webhook Payload (Error):
{
  "transcription_id": "f0e9d8c7-b6a5-4321-fedc-ba9876543210",
  "status": "FAILED",
  "error": {
    "message": "Unsupported audio format",
    "code": 400
  }
}
Your webhook endpoint should respond with a 200 OK status to acknowledge receipt of the notification.
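
For example, a minimal webhook receiver could look like the Flask sketch below. The route paths simply mirror the success_url and error_url from the example configuration and are otherwise arbitrary; the payload fields used are the ones documented above.

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/success", methods=["POST"])
def on_success():
    payload = request.get_json(force=True)
    # e.g. fetch the full transcription, update your database, notify users, ...
    print(f"Transcription {payload['transcription_id']} completed "
          f"in {payload['duration']}s")
    return "", 200  # acknowledge receipt with 200 OK

@app.route("/webhooks/error", methods=["POST"])
def on_error():
    payload = request.get_json(force=True)
    error = payload.get("error", {})
    print(f"Transcription {payload['transcription_id']} failed: "
          f"{error.get('code')} {error.get('message')}")
    return "", 200  # acknowledge receipt with 200 OK

if __name__ == "__main__":
    app.run(port=8000)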

Transcription Object

A Transcription represents an individual audio or video processing job. Its structure changes based on the job’s current status.

Transcription Structure

| Attribute | Type | Description |
| --- | --- | --- |
| transcription_id | string | Unique identifier for the transcription (UUID format) |
| status | string | Current status: PENDING, IN_PROGRESS, COMPLETED, FAILED |
| configuration_id | string | ID of the configuration used |
| configuration_tag | string | Tag of the configuration used |
| name | string | Custom name for the transcription (can be null) |
| group | string | Audio group/phase (can be null). See available groups |
| duration | number | Media duration in seconds |
| details | object | Full transcription results (only if COMPLETED) |
| error | object | Error information (only if FAILED) |

Status Values

1. PENDING: The transcription job is queued and waiting to start processing.
2. IN_PROGRESS: The audio is currently being transcribed and analyzed.
3. COMPLETED: Transcription finished successfully. The details object contains all results.
4. FAILED: The transcription failed. The error object contains details about why.
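
If you are not using webhooks, you can poll the status until it reaches a terminal state. The sketch below assumes a GET /transcriptions/{transcription_id} endpoint that returns the Transcription object described here, plus a placeholder base URL and Bearer auth; the actual path may differ, so check the API reference.

import time
import requests

API_KEY = "YOUR_API_KEY"                 # placeholder
BASE_URL = "https://api.heify.example"   # hypothetical base URL

def wait_for_transcription(transcription_id: str, poll_seconds: int = 10) -> dict:
    """Poll until the transcription is COMPLETED or FAILED, then return it."""
    while True:
        # Assumed endpoint: GET /transcriptions/{id} (verify against the API reference)
        resp = requests.get(
            f"{BASE_URL}/transcriptions/{transcription_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        resp.raise_for_status()
        transcription = resp.json()
        if transcription["status"] in ("COMPLETED", "FAILED"):
            return transcription
        time.sleep(poll_seconds)  # still PENDING or IN_PROGRESS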

Transcription Details (when COMPLETED)

When a transcription completes successfully, the details object includes:
| Attribute | Type | Description |
| --- | --- | --- |
| language | string | Detected language of the media |
| num_speakers | number | Number of unique speakers identified |
| created_at | string | Timestamp when transcription started |
| completed_at | string | Timestamp when transcription finished |
| conversation | object | Full conversation with speaker-separated segments |
| summary | object | Generated summary (if enabled) |
| fields | object | Extracted structured data (if configured) |

Conversation Structure

The conversation object contains speaker-separated segments:
{
  "conversation": {
    "segments": [
      {
        "text": "Hello, thank you for calling support.",
        "speaker": "SPEAKER_00"
      },
      {
        "text": "Hi, I'm having an issue with my account.",
        "speaker": "SPEAKER_01"
      }
    ]
  }
}
| Field | Type | Description |
| --- | --- | --- |
| text | string | Transcribed text for this segment |
| speaker | string | Speaker identifier (SPEAKER_00, SPEAKER_01, etc.) |
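
A common post-processing step is to flatten the segments into a readable transcript. The helper below uses only the documented text and speaker fields:

def format_transcript(details: dict) -> str:
    """Render the speaker-separated segments as 'SPEAKER_XX: text' lines."""
    segments = details["conversation"]["segments"]
    return "\n".join(f"{seg['speaker']}: {seg['text']}" for seg in segments)

# With the segments shown above, this produces:
# SPEAKER_00: Hello, thank you for calling support.
# SPEAKER_01: Hi, I'm having an issue with my account.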

Complete Transcription Example

{
  "transcription_id": "f0e9d8c7-b6a5-4321-fedc-ba9876543210",
  "status": "COMPLETED",
  "configuration_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "configuration_tag": "Sales Call Analysis Q4",
  "name": "client_call_2025_10_03.mp3",
  "group": "UNDER_REVIEW",
  "duration": 185.4,
  "details": {
    "language": "en",
    "num_speakers": 2,
    "created_at": "2025-10-03T10:00:00.123Z",
    "completed_at": "2025-10-03T10:03:45.789Z",
    "conversation": {
      "segments": [
        {
          "text": "Good morning, this is Sarah from sales.",
          "speaker": "SPEAKER_00"
        }
      ]
    },
    "summary": {
      "summary": "A sales call discussing product pricing and implementation timeline."
    },
    "fields": {
      "fields": [
        {
          "name": "customer_id",
          "value": "CUST-12345"
        },
        {
          "name": "purchase_amount",
          "value": 15000
        }
      ]
    }
  }
}
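
Note that extracted values are returned as a list of name/value pairs under details.fields.fields. A small helper, sketched below, makes them easier to consume as a plain dictionary:

def extracted_fields(transcription: dict) -> dict:
    """Turn details.fields.fields into a {name: value} dictionary."""
    if transcription.get("status") != "COMPLETED":
        return {}
    fields = transcription.get("details", {}).get("fields", {}).get("fields", [])
    return {item["name"]: item["value"] for item in fields}

# With the example above:
# extracted_fields(transcription) == {"customer_id": "CUST-12345", "purchase_amount": 15000}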

Groups

Use the group field to manage a transcription's group/phase:

| Group Value | Description |
| --- | --- |
| PENDING_REVIEW | Transcription needs manual review |
| UNDER_REVIEW | Currently being reviewed |
| ARCHIVED | Completed and archived |
| null | No group assigned |
Groups are managed using the /update-transcription-group endpoint. See Update Transcription Group for details.
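
As a sketch, moving a transcription between groups might look like the call below. The request body shown here is an assumption made for illustration; rely on the Update Transcription Group reference for the exact schema.

import requests

API_KEY = "YOUR_API_KEY"                 # placeholder
BASE_URL = "https://api.heify.example"   # hypothetical base URL

# Assumed request body; see Update Transcription Group for the real schema.
resp = requests.post(
    f"{BASE_URL}/update-transcription-group",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "transcription_id": "f0e9d8c7-b6a5-4321-fedc-ba9876543210",
        "group": "ARCHIVED",
    },
)
resp.raise_for_status()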

Supported Languages

The following languages are supported for transcriptions, summaries (summary_language), and analytics reports (analytics_language).
How the default language (df) works

You can use "df" for summary_language and analytics_language for automatic language detection, but the behavior differs:
  • For summaries (summary_language): The summary will be generated in the language detected in that specific audio file.
  • For analytics (analytics_language): The report will be generated in the majority language found across all audio files associated with the configuration.
| Language | ISO Code |
| --- | --- |
| Afrikaans | af |
| Albanian | sq |
| Arabic | ar |
| Azerbaijani | az |
| Basque | eu |
| Belarusian | be |
| Bengali | bn |
| Bosnian | bs |
| Bulgarian | bg |
| Catalan | ca |
| Chinese | zh |
| Croatian | hr |
| Czech | cs |
| Danish | da |
| Dutch | nl |
| English | en |
| Estonian | et |
| Finnish | fi |
| French | fr |
| Galician | gl |
| German | de |
| Greek | el |
| Gujarati | gu |

Best Practices

Use descriptive configuration tags that reflect their purpose:
  • Good: "Customer Support - Technical Issues"
  • Good: "Sales Calls - Q4 2025"
  • Bad: "Config 1"
Create configurations for common use cases and reuse them across multiple transcriptions. You can have up to 20 configurations.
Add industry-specific terms, product names, or acronyms to improve accuracy:
"vocabulary": ["Kubernetes", "API Gateway", "OAuth2"]
Extract structured data automatically instead of parsing transcripts manually:
  • Customer IDs
  • Order numbers
  • Dates and times
  • Monetary amounts
  • Yes/no answers

Next Steps