# Flutter Whisper API
A comprehensive Flutter package for seamless integration with OpenAI's Whisper API, providing speech-to-text conversion capabilities with built-in audio recording functionality.
## Features
- **Built-in Audio Recording**: Record audio directly from the device microphone
- **Real-time Amplitude Monitoring**: Visual feedback during recording
- **OpenAI Whisper API Integration**: High-quality speech-to-text transcription
- **Cross-Platform Support**: Works on iOS, Android, web, and desktop
- **Configurable Audio Quality**: Multiple quality presets for different use cases
- **Permission Handling**: Automatic microphone permission management
- **Multiple Output Formats**: JSON, text, SRT, VTT, and verbose JSON
- **Error Handling**: Comprehensive exception handling with detailed error messages
- **Easy Integration**: Simple API with minimal setup required
## Getting Started

### Installation
Add this to your package's `pubspec.yaml`:

```yaml
dependencies:
  flutter_whisper_api: ^1.0.0
```

Then run:

```sh
flutter pub get
```
### Setup
1. **Get an OpenAI API key:**
   - Visit the OpenAI Platform
   - Create a new API key
   - Keep your API key secure and never commit it to version control

2. **Configure permissions:**

   Android (`android/app/src/main/AndroidManifest.xml`):

   ```xml
   <uses-permission android:name="android.permission.RECORD_AUDIO" />
   <uses-permission android:name="android.permission.INTERNET" />
   ```

   iOS (`ios/Runner/Info.plist`):

   ```xml
   <key>NSMicrophoneUsageDescription</key>
   <string>This app needs microphone access to record audio for transcription</string>
   ```
## Usage

### Basic Usage
```dart
import 'dart:io';

import 'package:flutter_whisper_api/flutter_whisper_api.dart';

// Initialize the client with your API key
final client = WhisperClient(apiKey: 'your-openai-api-key');

// Initialize the recorder
final recorder = WhisperRecorder();

// Start recording
String recordingPath = await recorder.startRecording();

// Stop recording and get the file
File? audioFile = await recorder.stopRecording();

if (audioFile != null) {
  // Create a transcription request
  final request = WhisperRequest(
    audioFile: audioFile,
    language: 'en', // Optional: specify the language
  );

  // Transcribe the audio
  final response = await client.transcribe(request);
  print('Transcribed text: ${response.text}');
}

// Clean up
recorder.dispose();
client.dispose();
```
### Advanced Usage
```dart
// Configure audio quality
await recorder.startRecording(
  fileName: 'my_recording',
  quality: WhisperAudioQuality.high,
);

// Monitor recording amplitude
while (recorder.isRecording) {
  final amplitude = await recorder.getAmplitude();
  print('Current amplitude: $amplitude');
  await Future.delayed(Duration(milliseconds: 100));
}

// Advanced transcription options
final request = WhisperRequest(
  audioFile: audioFile,
  model: 'whisper-1',
  language: 'es', // Spanish
  temperature: 0.3,
  responseFormat: WhisperResponseFormat.verboseJson,
  prompt: 'This is a medical consultation...',
);

final response = await client.transcribe(request);

// Access detailed response data
print('Language detected: ${response.language}');
print('Duration: ${response.duration} seconds');

// Access segments (if available)
if (response.segments != null) {
  for (final segment in response.segments!) {
    print('${segment.start}s - ${segment.end}s: ${segment.text}');
  }
}
```
### Error Handling
```dart
try {
  final response = await client.transcribe(request);
  print('Success: ${response.text}');
} on WhisperAuthenticationException catch (e) {
  print('Authentication failed: ${e.message}');
} on WhisperNetworkException catch (e) {
  print('Network error: ${e.message}');
} on WhisperAudioException catch (e) {
  print('Audio file error: ${e.message}');
} on WhisperRecordingException catch (e) {
  print('Recording error: ${e.message}');
} on WhisperException catch (e) {
  print('General Whisper error: ${e.message}');
}
```
## Configuration Options

### Audio Quality Presets
| Quality | Bit Rate | Sample Rate | Use Case |
|---|---|---|---|
| `WhisperAudioQuality.low` | 64 kbps | 16 kHz | Voice notes, long recordings |
| `WhisperAudioQuality.medium` | 128 kbps | 44.1 kHz | General purpose (default) |
| `WhisperAudioQuality.high` | 256 kbps | 44.1 kHz | High-quality audio, music |
### Response Formats
- `WhisperResponseFormat.json` - Simple JSON with text only
- `WhisperResponseFormat.text` - Plain text response
- `WhisperResponseFormat.srt` - SRT subtitle format
- `WhisperResponseFormat.vtt` - WebVTT subtitle format
- `WhisperResponseFormat.verboseJson` - Detailed JSON with timestamps
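
For example, subtitles can be generated directly from a recording by requesting SRT output. A minimal sketch, assuming the formatted subtitle payload is returned in `response.text` for non-JSON formats (the output file name is illustrative):

```dart
import 'dart:io';

import 'package:flutter_whisper_api/flutter_whisper_api.dart';

Future<void> saveSubtitles(WhisperClient client, File audioFile) async {
  final request = WhisperRequest(
    audioFile: audioFile,
    responseFormat: WhisperResponseFormat.srt,
  );

  final response = await client.transcribe(request);

  // Assumption: for srt/vtt/text formats the raw payload is exposed via response.text.
  await File('transcript.srt').writeAsString(response.text);
}
```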
### Supported Languages
The Whisper API supports 99+ languages. Some common ones:
- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `ru` - Russian
- `ja` - Japanese
- `ko` - Korean
- `zh` - Chinese
For automatic language detection, omit the `language` parameter.
## API Reference

### WhisperClient
```dart
WhisperClient({
  required String apiKey,
  String baseUrl = 'https://api.openai.com/v1',
  http.Client? httpClient,
})
```
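
Because `baseUrl` and `httpClient` are configurable, the client can be pointed at any OpenAI-compatible endpoint (for example, your own backend proxy) and can reuse a shared HTTP client. A sketch, with the proxy URL as a placeholder:

```dart
import 'package:http/http.dart' as http;
import 'package:flutter_whisper_api/flutter_whisper_api.dart';

// Placeholder endpoint: any server that speaks the OpenAI Whisper API.
final client = WhisperClient(
  apiKey: 'your-openai-api-key',
  baseUrl: 'https://your-proxy.example.com/v1',
  httpClient: http.Client(), // reuse a single client across requests
);
```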
### WhisperRecorder
```dart
// Start recording
Future<String> startRecording({
  String? fileName,
  WhisperAudioQuality quality = WhisperAudioQuality.medium,
})

// Stop recording
Future<File?> stopRecording()

// Cancel recording
Future<void> cancelRecording()

// Pause/Resume (platform dependent)
Future<void> pauseRecording()
Future<void> resumeRecording()

// Monitor amplitude
Future<double?> getAmplitude()

// Permission handling
Future<bool> requestPermission()
Future<bool> hasPermission()
```
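
A common pattern is to confirm microphone permission before starting a recording. A minimal sketch using only the methods listed above (the helper name is illustrative):

```dart
import 'dart:io';

import 'package:flutter_whisper_api/flutter_whisper_api.dart';

/// Records only if microphone permission is (or becomes) granted.
Future<File?> recordWithPermissionCheck(WhisperRecorder recorder) async {
  final granted =
      await recorder.hasPermission() || await recorder.requestPermission();
  if (!granted) return null; // caller can surface a "permission denied" message

  await recorder.startRecording(quality: WhisperAudioQuality.medium);
  // ... record for as long as needed, then:
  return recorder.stopRecording();
}
```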
## Example App
Check out the complete example app in the `/example` folder, which demonstrates:
- API key configuration
- Recording with visual feedback
- Real-time amplitude monitoring
- Transcription display
- Error handling
To run the example:
```sh
cd example
flutter pub get
flutter run
```
## Important Notes

### API Costs
- OpenAI charges for Whisper API usage
- Current pricing: $0.006 per minute of audio
- Monitor your usage on the OpenAI Platform
### File Limitations
- Maximum file size: 25 MB (see the size check sketched after this list)
- Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
- For longer audio, consider chunking
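
To fail fast before uploading, you can check the file size against the 25 MB limit on the client. A minimal sketch (the constant and helper names are illustrative):

```dart
import 'dart:io';

const int maxWhisperFileBytes = 25 * 1024 * 1024; // 25 MB API limit

/// Returns true if the file is small enough to send in a single request.
bool isWithinWhisperSizeLimit(File audioFile) =>
    audioFile.lengthSync() <= maxWhisperFileBytes;
```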
### Security
- Never hardcode API keys in your app
- Use environment variables or secure storage (e.g. a `--dart-define` value, as sketched below)
- Consider server-side API calls for production apps
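
One way to keep the key out of source control is to pass it at build time with `--dart-define` and read it with `String.fromEnvironment`. A minimal sketch; the define name `OPENAI_API_KEY` and the helper are just examples:

```dart
import 'package:flutter_whisper_api/flutter_whisper_api.dart';

// Build or run with: flutter run --dart-define=OPENAI_API_KEY=<your key>
const String _apiKey = String.fromEnvironment('OPENAI_API_KEY');

WhisperClient createClient() {
  if (_apiKey.isEmpty) {
    throw StateError('Missing OPENAI_API_KEY; pass it with --dart-define.');
  }
  return WhisperClient(apiKey: _apiKey);
}
```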
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Support
If you encounter any issues or have questions:
- Check the example app for implementation details
- Review the API documentation
- Open an issue on GitHub