Flutter Whisper API

A comprehensive Flutter package for seamless integration with OpenAI's Whisper API, providing speech-to-text conversion capabilities with built-in audio recording functionality.

✨ Features

  • 🎤 Built-in Audio Recording: Record audio directly from device microphone
  • 🔊 Real-time Amplitude Monitoring: Visual feedback during recording
  • 🌐 OpenAI Whisper API Integration: High-quality speech-to-text transcription
  • 📱 Cross-Platform Support: Works on iOS, Android, web, and desktop
  • 🎛️ Configurable Audio Quality: Multiple quality presets for different use cases
  • 🔒 Permission Handling: Automatic microphone permission management
  • 📝 Multiple Output Formats: JSON, text, SRT, VTT, and verbose JSON
  • ⚑ Error Handling: Comprehensive exception handling with detailed error messages
  • 🎯 Easy Integration: Simple API with minimal setup required

🚀 Getting Started

Installation

Add this to your package's pubspec.yaml file:

dependencies:
  flutter_whisper_api: ^1.0.0

Then run:

flutter pub get

Setup

  1. Get OpenAI API Key:

    • Visit OpenAI Platform
    • Create a new API key
    • Keep your API key secure and never commit it to version control
  2. Configure Permissions:

    Android (android/app/src/main/AndroidManifest.xml):

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />
    

    iOS (ios/Runner/Info.plist):

    <key>NSMicrophoneUsageDescription</key>
    <string>This app needs microphone access to record audio for transcription</string>
    

📱 Usage

Basic Usage

import 'package:flutter_whisper_api/flutter_whisper_api.dart';
import 'dart:io';

// Initialize the client with your API key
final client = WhisperClient(apiKey: 'your-openai-api-key');

// Initialize the recorder
final recorder = WhisperRecorder();

// Start recording
String recordingPath = await recorder.startRecording();

// Stop recording and get the file
File? audioFile = await recorder.stopRecording();

if (audioFile != null) {
  // Create transcription request
  final request = WhisperRequest(
    audioFile: audioFile,
    language: 'en', // Optional: specify language
  );

  // Transcribe the audio
  final response = await client.transcribe(request);
  
  print('Transcribed text: ${response.text}');
}

// Clean up
recorder.dispose();
client.dispose();

Advanced Usage

// Configure audio quality
await recorder.startRecording(
  fileName: 'my_recording',
  quality: WhisperAudioQuality.high,
);

// Monitor recording amplitude
while (recorder.isRecording) {
  final amplitude = await recorder.getAmplitude();
  print('Current amplitude: $amplitude');
  await Future.delayed(Duration(milliseconds: 100));
}

// Advanced transcription options
final request = WhisperRequest(
  audioFile: audioFile,
  model: 'whisper-1',
  language: 'es', // Spanish
  temperature: 0.3,
  responseFormat: WhisperResponseFormat.verboseJson,
  prompt: 'This is a medical consultation...',
);

final response = await client.transcribe(request);

// Access detailed response data
print('Language detected: ${response.language}');
print('Duration: ${response.duration} seconds');

// Access segments (if available)
if (response.segments != null) {
  for (final segment in response.segments!) {
    print('${segment.start}s - ${segment.end}s: ${segment.text}');
  }
}

Error Handling

try {
  final response = await client.transcribe(request);
  print('Success: ${response.text}');
} on WhisperAuthenticationException catch (e) {
  print('Authentication failed: ${e.message}');
} on WhisperNetworkException catch (e) {
  print('Network error: ${e.message}');
} on WhisperAudioException catch (e) {
  print('Audio file error: ${e.message}');
} on WhisperRecordingException catch (e) {
  print('Recording error: ${e.message}');
} on WhisperException catch (e) {
  print('General Whisper error: ${e.message}');
}

πŸŽ›οΈ Configuration Options

Audio Quality Presets

  • WhisperAudioQuality.low: 64 kbps, 16 kHz. Voice notes, long recordings.
  • WhisperAudioQuality.medium: 128 kbps, 44.1 kHz. General purpose (default).
  • WhisperAudioQuality.high: 256 kbps, 44.1 kHz. High-quality audio, music.

Response Formats

  • WhisperResponseFormat.json - Simple JSON with text only
  • WhisperResponseFormat.text - Plain text response
  • WhisperResponseFormat.srt - SRT subtitle format
  • WhisperResponseFormat.vtt - WebVTT subtitle format
  • WhisperResponseFormat.verboseJson - Detailed JSON with timestamps
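
For subtitle workflows, the output format is selected on the request. A minimal sketch (this assumes the raw SRT body is exposed via response.text for non-JSON formats, which the package may surface differently):

```dart
import 'dart:io';

// Request SRT subtitles instead of the default JSON.
final request = WhisperRequest(
  audioFile: audioFile,
  responseFormat: WhisperResponseFormat.srt,
);

final response = await client.transcribe(request);

// Write the subtitle body straight to a .srt file.
await File('captions.srt').writeAsString(response.text);
```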

Supported Languages

The Whisper API supports 99+ languages. Some common ones:

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • ru - Russian
  • ja - Japanese
  • ko - Korean
  • zh - Chinese

For automatic language detection, omit the language parameter.

📋 API Reference

WhisperClient

WhisperClient({
  required String apiKey,
  String baseUrl = 'https://api.openai.com/v1',
  http.Client? httpClient,
})

WhisperRecorder

// Start recording
Future<String> startRecording({
  String? fileName,
  WhisperAudioQuality quality = WhisperAudioQuality.medium,
})

// Stop recording
Future<File?> stopRecording()

// Cancel recording
Future<void> cancelRecording()

// Pause/Resume (platform dependent)
Future<void> pauseRecording()
Future<void> resumeRecording()

// Monitor amplitude
Future<double?> getAmplitude()

// Permission handling
Future<bool> requestPermission()
Future<bool> hasPermission()
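
The two permission methods combine naturally into a pre-flight check before recording starts (the helper below is a suggested pattern, not part of the package API):

```dart
// Ask for microphone access only when it has not been granted yet.
Future<bool> ensureMicrophoneAccess(WhisperRecorder recorder) async {
  if (await recorder.hasPermission()) return true;
  return recorder.requestPermission();
}

// Usage: start recording only once access is confirmed.
if (await ensureMicrophoneAccess(recorder)) {
  await recorder.startRecording();
}
```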

🔧 Example App

Check out the complete example app in the /example folder that demonstrates:

  • API key configuration
  • Recording with visual feedback
  • Real-time amplitude monitoring
  • Transcription display
  • Error handling

To run the example:

cd example
flutter pub get
flutter run

🚨 Important Notes

API Costs

  • OpenAI charges for Whisper API usage
  • Current pricing: $0.006 per minute of audio
  • Monitor your usage on the OpenAI Platform

File Limitations

  • Maximum file size: 25 MB
  • Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
  • For longer audio, consider chunking
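
A simple guard against the 25 MB ceiling before uploading might look like this (the constant and helper name are ours, purely illustrative):

```dart
// 25 MB upload limit of the Whisper API, expressed in bytes.
const int whisperMaxBytes = 25 * 1024 * 1024;

/// True when an audio file of [sizeInBytes] is small enough to upload.
bool withinWhisperLimit(int sizeInBytes) => sizeInBytes <= whisperMaxBytes;
```

For a recording on disk, compare `await file.length()` against the limit and split or re-encode the audio when it does not fit.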

Security

  • Never hardcode API keys in your app
  • Use environment variables or secure storage
  • Consider server-side API calls for production apps
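
One common way to keep the key out of source control is Dart's compile-time environment (an illustrative pattern, not a package requirement):

```dart
// Pass the key at build time instead of hardcoding it:
//   flutter run --dart-define=OPENAI_API_KEY=sk-...
const String openAiApiKey = String.fromEnvironment('OPENAI_API_KEY');

final client = WhisperClient(apiKey: openAiApiKey);
```

Note that `--dart-define` values are baked into the binary, so for production apps routing requests through your own server remains the safer option.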

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

If you encounter any issues or have questions:

  1. Check the example app for implementation details
  2. Review the API documentation
  3. Open an issue on GitHub

πŸ™ Acknowledgments

  • OpenAI for the Whisper API
  • Flutter team for the amazing framework
  • Contributors and users of this package