Flutter Whisper API

A comprehensive Flutter package for seamless integration with OpenAI's Whisper API, providing speech-to-text conversion capabilities with built-in audio recording functionality.

✨ Features

  • 🎤 Built-in Audio Recording: Record audio directly from device microphone
  • 🔊 Real-time Amplitude Monitoring: Visual feedback during recording
  • 🌐 OpenAI Whisper API Integration: High-quality speech-to-text transcription
  • 📱 Cross-Platform Support: Works on iOS, Android, web, and desktop
  • 🎛️ Configurable Audio Quality: Multiple quality presets for different use cases
  • 🔒 Permission Handling: Automatic microphone permission management
  • 📝 Multiple Output Formats: JSON, text, SRT, VTT, and verbose JSON
  • ⚑ Error Handling: Comprehensive exception handling with detailed error messages
  • 🎯 Easy Integration: Simple API with minimal setup required

🚀 Getting Started

Installation

Add this to your package's pubspec.yaml file:

dependencies:
  flutter_whisper_api: ^1.0.0

Then run:

flutter pub get

Setup

  1. Get OpenAI API Key:

    • Visit OpenAI Platform
    • Create a new API key
    • Keep your API key secure and never commit it to version control
  2. Configure Permissions:

    Android (android/app/src/main/AndroidManifest.xml):

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />
    

    iOS (ios/Runner/Info.plist):

    <key>NSMicrophoneUsageDescription</key>
    <string>This app needs microphone access to record audio for transcription</string>
    

📱 Usage

Basic Usage

import 'package:flutter_whisper_api/flutter_whisper_api.dart';
import 'dart:io';

// Initialize the client with your API key
final client = WhisperClient(apiKey: 'your-openai-api-key');

// Initialize the recorder
final recorder = WhisperRecorder();

// Start recording
String recordingPath = await recorder.startRecording();

// Stop recording and get the file
File? audioFile = await recorder.stopRecording();

if (audioFile != null) {
  // Create transcription request
  final request = WhisperRequest(
    audioFile: audioFile,
    language: 'en', // Optional: specify language
  );

  // Transcribe the audio
  final response = await client.transcribe(request);
  
  print('Transcribed text: ${response.text}');
}

// Clean up
recorder.dispose();
client.dispose();

Advanced Usage

// Configure audio quality
await recorder.startRecording(
  fileName: 'my_recording',
  quality: WhisperAudioQuality.high,
);

// Monitor recording amplitude
while (recorder.isRecording) {
  final amplitude = await recorder.getAmplitude();
  print('Current amplitude: $amplitude');
  await Future.delayed(Duration(milliseconds: 100));
}

// Advanced transcription options
final request = WhisperRequest(
  audioFile: audioFile,
  model: 'whisper-1',
  language: 'es', // Spanish
  temperature: 0.3,
  responseFormat: WhisperResponseFormat.verboseJson,
  prompt: 'This is a medical consultation...',
);

final response = await client.transcribe(request);

// Access detailed response data
print('Language detected: ${response.language}');
print('Duration: ${response.duration} seconds');

// Access segments (if available)
if (response.segments != null) {
  for (final segment in response.segments!) {
    print('${segment.start}s - ${segment.end}s: ${segment.text}');
  }
}

Error Handling

try {
  final response = await client.transcribe(request);
  print('Success: ${response.text}');
} on WhisperAuthenticationException catch (e) {
  print('Authentication failed: ${e.message}');
} on WhisperNetworkException catch (e) {
  print('Network error: ${e.message}');
} on WhisperAudioException catch (e) {
  print('Audio file error: ${e.message}');
} on WhisperRecordingException catch (e) {
  print('Recording error: ${e.message}');
} on WhisperException catch (e) {
  print('General Whisper error: ${e.message}');
}

πŸŽ›οΈ Configuration Options

Audio Quality Presets

  • WhisperAudioQuality.low: 64 kbps, 16 kHz. Voice notes, long recordings.
  • WhisperAudioQuality.medium: 128 kbps, 44.1 kHz. General purpose (default).
  • WhisperAudioQuality.high: 256 kbps, 44.1 kHz. High-quality audio, music.

Response Formats

  • WhisperResponseFormat.json - Simple JSON with text only
  • WhisperResponseFormat.text - Plain text response
  • WhisperResponseFormat.srt - SRT subtitle format
  • WhisperResponseFormat.vtt - WebVTT subtitle format
  • WhisperResponseFormat.verboseJson - Detailed JSON with timestamps
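
For subtitle workflows, the output format is selected on the request. A minimal sketch (this assumes the raw SRT body is exposed via response.text for non-JSON formats, which the package may surface differently):

```dart
import 'dart:io';

// Request SRT subtitles instead of the default JSON.
final request = WhisperRequest(
  audioFile: audioFile,
  responseFormat: WhisperResponseFormat.srt,
);

final response = await client.transcribe(request);

// Write the subtitle body straight to a .srt file.
await File('captions.srt').writeAsString(response.text);
```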

Supported Languages

The Whisper API supports 99+ languages. Some common ones:

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • ru - Russian
  • ja - Japanese
  • ko - Korean
  • zh - Chinese

For automatic language detection, omit the language parameter.

📋 API Reference

WhisperClient

WhisperClient({
  required String apiKey,
  String baseUrl = 'https://api.openai.com/v1',
  http.Client? httpClient,
})

WhisperRecorder

// Start recording
Future<String> startRecording({
  String? fileName,
  WhisperAudioQuality quality = WhisperAudioQuality.medium,
})

// Stop recording
Future<File?> stopRecording()

// Cancel recording
Future<void> cancelRecording()

// Pause/Resume (platform dependent)
Future<void> pauseRecording()
Future<void> resumeRecording()

// Monitor amplitude
Future<double?> getAmplitude()

// Permission handling
Future<bool> requestPermission()
Future<bool> hasPermission()
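
The two permission methods combine naturally into a pre-flight check before recording starts (the helper below is a suggested pattern, not part of the package API):

```dart
// Ask for microphone access only when it has not been granted yet.
Future<bool> ensureMicrophoneAccess(WhisperRecorder recorder) async {
  if (await recorder.hasPermission()) return true;
  return recorder.requestPermission();
}

// Usage: start recording only once access is confirmed.
if (await ensureMicrophoneAccess(recorder)) {
  await recorder.startRecording();
}
```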

🔧 Example App

Check out the complete example app in the /example folder that demonstrates:

  • API key configuration
  • Recording with visual feedback
  • Real-time amplitude monitoring
  • Transcription display
  • Error handling

To run the example:

cd example
flutter pub get
flutter run

🚨 Important Notes

API Costs

  • OpenAI charges for Whisper API usage
  • Current pricing: $0.006 per minute of audio
  • Monitor your usage on the OpenAI Platform

File Limitations

  • Maximum file size: 25 MB
  • Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
  • For longer audio, consider chunking
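
A simple guard against the 25 MB ceiling before uploading might look like this (the constant and helper name are ours, purely illustrative):

```dart
// 25 MB upload limit of the Whisper API, expressed in bytes.
const int whisperMaxBytes = 25 * 1024 * 1024;

/// True when an audio file of [sizeInBytes] is small enough to upload.
bool withinWhisperLimit(int sizeInBytes) => sizeInBytes <= whisperMaxBytes;
```

For a recording on disk, compare `await file.length()` against the limit and split or re-encode the audio when it does not fit.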

Security

  • Never hardcode API keys in your app
  • Use environment variables or secure storage
  • Consider server-side API calls for production apps
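
One common way to keep the key out of source control is Dart's compile-time environment (an illustrative pattern, not a package requirement):

```dart
// Pass the key at build time instead of hardcoding it:
//   flutter run --dart-define=OPENAI_API_KEY=sk-...
const String openAiApiKey = String.fromEnvironment('OPENAI_API_KEY');

final client = WhisperClient(apiKey: openAiApiKey);
```

Note that `--dart-define` values are baked into the binary, so for production apps routing requests through your own server remains the safer option.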

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

If you encounter any issues or have questions:

  1. Check the example app for implementation details
  2. Review the API documentation
  3. Open an issue on GitHub

πŸ™ Acknowledgments

  • OpenAI for the Whisper API
  • Flutter team for the amazing framework
  • Contributors and users of this package