# Textured
A Dart package to use LLMs to parse natural language text into pre-defined data structures.
## Overview
Textured takes natural language descriptions (like tree or plant descriptions) and converts them into structured JSON data based on provided JSON schemas. This is accomplished by:
- Reading input text files containing natural language descriptions
- Reading JSON schemas that define the target data structure
- Using LLMs to intelligently map the text content to the schema fields
- Outputting structured JSON data (see the end-to-end sketch below)
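
Putting these steps together, a condensed end-to-end sketch, based on the programmatic examples later in this README, looks roughly like this (file paths are placeholders and a local Ollama `llama2` model is assumed):

```dart
import 'package:textured/textured.dart';

Future<void> main() async {
  // 1. Read the natural language description (placeholder path).
  final text = await InputTextReader().readFile('description.txt');

  // 2. Read the JSON schema that defines the target data structure.
  final schema = await JsonSchemaReader().readSchema('schema.json');

  // 3. Generate a schema-aware prompt for the LLM.
  final prompt = PromptGenerator().generatePrompt(text, schema);

  // 4. Call a local Ollama model, validating the response against the schema.
  final caller = ValidatingLlmCaller(
    delegate: OllamaLlmCaller(config: OllamaConfig.local(model: 'llama2')),
    schema: schema,
  );
  final response = await caller.callLlm(prompt);

  // 5. On success, response.content holds structured JSON matching the schema.
  print(response.success ? response.content : 'Error: ${response.error}');
}
```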
## Features
- **Input Text Reader**: Reads text files containing natural language descriptions
- **JSON Schema Reader**: Parses JSON schemas to understand target data structures
- **Prompt Generator**: Creates LLM-optimized prompts with schema awareness and examples
- **LLM Integration**: Supports local Ollama and cloud-based Google Gemini with extensible architecture
- **Schema Validation**: Validates LLM responses against JSON schemas with detailed error reporting
- **CLI Tool**: Simple command-line interface for interactive text parsing
## Installation

Add this package to your `pubspec.yaml`:

```yaml
dependencies:
  textured: ^0.1.0
```
## Usage
### Quick Start with CLI
The easiest way to try textured is using the CLI tool with your choice of LLM provider:
**Option 1: Ollama (Local)**

```bash
# Make sure Ollama is running
ollama serve

# Pull a model if you haven't already
ollama pull llama2

# Run the CLI with the default Ollama provider
dart run example/textured_cli.dart test/data/tree_metadata_schema.json
```
**Option 2: Google Gemini (Cloud)**

```bash
# 1. Set up Google Cloud project and enable Vertex AI API
gcloud services enable aiplatform.googleapis.com

# 2. Authenticate (choose one method)
gcloud auth application-default login
# OR provide a service account key with --gemini-key
# OR use an API key with --gemini-api-key (alternative to OAuth)

# 3. Run the CLI with the Gemini provider
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-project=your-gcp-project-id

# With a service account key file
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-project=your-gcp-project-id \
  --gemini-key=path/to/service-account.json
```
Then type your text and press Ctrl+D when finished.
Example session:
```bash
$ echo "A large oak tree in Central Park, about 100 years old" | dart run example/textured_cli.dart test/data/tree_metadata_schema.json

Textured CLI - LLM Text Parser
───────────────────────────────────
Schema: test/data/tree_metadata_schema.json
Model: llama2

Loading schema...
✓ Loaded: Tree Metadata Schema
✓ Properties: 32

Configuring LLM...
✓ Connection successful

Processing text...
✓ Response received (8181ms)

RESULTS
───────────────────────────────────
✓ Success!
✓ Schema Validation: PASSED

Extracted Data:
{
  "name": "Large Oak Tree",
  "city": "New York",
  "state": "NY",
  "is_alive": true,
  ...
}
```
### Programmatic Usage
**Reading Input Text**

```dart
import 'package:textured/textured.dart';

final reader = InputTextReader();

// Async reading
final content = await reader.readFile('path/to/description.txt');

// Sync reading
final contentSync = reader.readFileSync('path/to/description.txt');
```
**Reading JSON Schemas**

```dart
final schemaReader = JsonSchemaReader();
final schema = await schemaReader.readSchema('path/to/schema.json');

print('Schema: ${schema.title}');
print('Properties: ${schema.properties.length}');
print('Required: ${schema.required}');
```
**Generating LLM Prompts**

```dart
final promptGenerator = PromptGenerator();
final prompt = promptGenerator.generatePrompt(
  inputText,
  schema,
  includeExamples: true,
  customInstructions: 'Extract data precisely from the text.',
);

// Save prompt for external testing
await promptGenerator.savePromptToFile(prompt, 'output/prompt.txt');
```
**Calling LLMs with Validation**

**Option 1: Ollama (Local)**

```dart
// Configure Ollama
final config = OllamaConfig.local(
  model: 'llama2',
  temperature: 0.3,
);

// Create LLM caller with validation
final baseCaller = OllamaLlmCaller(config: config);
final validatingCaller = ValidatingLlmCaller(
  delegate: baseCaller,
  schema: schema,
);

// Make the call
final response = await validatingCaller.callLlm(prompt);

if (response.success) {
  print('Extracted: ${response.content}');
  print('Valid: ${response.metadata?['validation']['valid']}');
} else {
  print('Error: ${response.error}');
}
```
**Option 2: Google Gemini (Cloud)**

```dart
// Configure Gemini
final config = GeminiConfig.standard(
  projectId: 'your-gcp-project-id',
  model: 'gemini-2.0-flash',
);

// Set up authentication
final authenticator = GcpAuthenticator();
await authenticator.authenticateWithServiceAccount('path/to/service-account.json');
// OR: await authenticator.authenticateWithApplicationDefault();

// Create LLM caller with validation
final baseCaller = GeminiLlmCaller(
  config: config,
  authenticator: authenticator,
);
final validatingCaller = ValidatingLlmCaller(
  delegate: baseCaller,
  schema: schema,
);

// Test connection first
final connectionOk = await baseCaller.testConnection();
if (!connectionOk) {
  throw Exception('Failed to connect to Gemini API');
}

// Make the call
final response = await validatingCaller.callLlm(prompt);

if (response.success) {
  print('Extracted: ${response.content}');
  print('Valid: ${response.metadata?['validation']['valid']}');
} else {
  print('Error: ${response.error}');
}
```
## Development Status
This package is feature-complete for its initial scope:
- ✅ **Input Text Reader** - File I/O with comprehensive error handling
- ✅ **JSON Schema Reader** - Full JSON Schema Draft 7 support
- ✅ **Prompt Generator** - LangChain-compatible prompts with schema integration
- ✅ **LLM Callers** - Ollama (local) and Google Gemini (cloud) integration with abstract interface
- ✅ **Authentication** - Google Cloud service account and ADC support for Gemini
- ✅ **Schema Validation** - Real-time validation of LLM responses
- ✅ **CLI Tool** - Interactive command-line interface with provider selection
- ✅ **CI/CD Pipeline** - Automated testing, coverage, and quality gates
Test Coverage: 89 passing tests with comprehensive edge case coverage for both LLM providers
## Prerequisites

**For Ollama (Local):**

- **Ollama**: Install from https://ollama.ai
- **LLM Model**: Pull a model, e.g. `ollama pull llama2`
- **Running Service**: Start with `ollama serve`

**For Google Gemini (Cloud):**

- **Google Cloud Project**: Create a project with the Vertex AI API enabled
- **Authentication**: Set up a service account or use `gcloud auth application-default login`
- **Billing**: Ensure the project has billing enabled for API usage
## Google Gemini Setup Guide
### 1. Google Cloud Project Setup

First, create and configure a Google Cloud project:

```bash
# Create a new project (or use an existing one)
gcloud projects create your-project-id
gcloud config set project your-project-id

# Enable required APIs
gcloud services enable aiplatform.googleapis.com

# Verify the API is enabled
gcloud services list --enabled --filter="name~aiplatform"
```
### 2. Authentication Setup

Choose one of these authentication methods:

**Option A: Application Default Credentials (Recommended for development)**

```bash
# Login with your Google account
gcloud auth application-default login

# Verify authentication
gcloud auth list
```

**Option B: Service Account (Recommended for production)**

```bash
# Create service account
gcloud iam service-accounts create textured-service-account \
  --description="Service account for textured package" \
  --display-name="Textured Service Account"

# Grant Vertex AI access
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:textured-service-account@your-project-id.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# Create and download key file
gcloud iam service-accounts keys create ~/textured-key.json \
  --iam-account=textured-service-account@your-project-id.iam.gserviceaccount.com
```

**Option C: API Key (Simplest for development)**

```bash
# Get an API key from Google AI Studio
# Visit: https://aistudio.google.com/app/apikey
# Create a new API key and copy it

# Method 1: Set environment variable
export GOOGLE_API_KEY="your-api-key-here"

# OR: Add to .env file
echo "GOOGLE_API_KEY=your-api-key-here" > .env

# Method 2: Use CLI parameter (see Usage Examples below)
```
### 3. Usage Examples

**CLI Usage**

```bash
# Ollama (default provider)
dart run example/textured_cli.dart <schema_file>

# Gemini provider
dart run example/textured_cli.dart <schema_file> --provider=gemini --gemini-project=<project-id>

# With piped input
echo "Your text here" | dart run example/textured_cli.dart schema.json

# Help and usage information
dart run example/textured_cli.dart
```
## Examples

**Ollama Examples**

```bash
# Default model (llama2)
dart run example/textured_cli.dart test/data/tree_metadata_schema.json

# Different model
dart run example/textured_cli.dart test/data/tree_metadata_schema.json --model=mistral

# With debug output
dart run example/textured_cli.dart test/data/tree_metadata_schema.json --debug
```

**Gemini Examples**

```bash
# With API key (environment variable)
export GOOGLE_API_KEY="your-api-key-here"
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini

# With API key (CLI parameter)
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-api-key="your-api-key-here"

# With API key (from file)
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-api-key-file=path/to/api-key.txt

# With Application Default Credentials
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-project=my-project-id

# With service account key
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-project=my-project-id \
  --gemini-key=path/to/service-account.json

# Different Gemini model
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-api-key="your-api-key-here" \
  --model=gemini-pro

# With debug for troubleshooting
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-api-key="your-api-key-here" \
  --debug
```
## CLI Options

| Option | Description | Default | Example |
|---|---|---|---|
| `--provider=<ollama\|gemini>` | Select LLM provider | `ollama` | `--provider=gemini` |
| `--model=<model_name>` | Model name | `llama2` (Ollama), `gemini-2.0-flash` (Gemini) | `--model=mistral` |
| `--gemini-project=<project_id>` | Google Cloud project ID (required for OAuth/ADC) | - | `--gemini-project=my-project` |
| `--gemini-key=<key_file>` | Service account key file (optional for OAuth) | - | `--gemini-key=~/key.json` |
| `--gemini-api-key=<api_key>` | Direct API key for Gemini (alternative to OAuth) | - | `--gemini-api-key=AIza...` |
| `--gemini-api-key-file=<key_file>` | File containing API key for Gemini | - | `--gemini-api-key-file=~/api.key` |
| `--save-prompt=<file>` | Save generated prompt to file | - | `--save-prompt=prompt.txt` |
| `--save-output=<file>` | Save extracted output to file | - | `--save-output=output.json` |
| `--debug` | Show detailed debug information | `false` | `--debug` |
**Example: Save prompt and output to files**

```bash
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --save-prompt=prompt.txt \
  --save-output=output.json
```
## CLI Output
The CLI provides comprehensive feedback:
```
Textured CLI - LLM Text Parser
───────────────────────────────────
Schema: test/data/tree_metadata_schema.json
Provider: gemini
Project: my-project-id
Model: gemini-2.0-flash

Loading schema...
✓ Loaded: Tree Metadata Schema
✓ Properties: 32
✓ Required fields: 32

Configuring gemini LLM...
Authenticating with Google Cloud...
✓ Authentication successful
Testing API connection...
✓ Connection successful

Enter your text to parse:
(Type your text and press Ctrl+D when finished)

> A large oak tree in Central Park

Processing text (32 characters)...
Generating LLM prompt...
✓ Prompt generated (5282 characters)
Calling LLM...
✓ Response received (1835ms)

RESULTS
───────────────────────────────────
✓ Success!
✓ Schema Validation: PASSED

Extracted Data:
{
  "name": "Large Oak Tree",
  "city": "New York",
  "state": "NY",
  "is_alive": true,
  ...
}

Metadata:
  • Model: gemini-2.0-flash
  • Response time: 1835ms
  • Content length: 245 characters

Processing complete!
```
See `example/README.md` for detailed CLI documentation.
## Contributing

This project follows Test-Driven Development (TDD). When contributing:

- Write tests first (see the sketch below for a minimal example)
- Implement functionality to make the tests pass
- Ensure all tests pass (`dart test`)
- Update developer notes and documentation
- Follow the established coding patterns
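
As a rough illustration of the test-first step, a minimal test could look like the sketch below. It assumes `package:test` is available as a dev dependency and that `readFile` returns the file contents as a `String`, as the programmatic example above suggests; the existing tests under `test/` are the authoritative reference for style and layout.

```dart
import 'dart:io';

import 'package:test/test.dart';
import 'package:textured/textured.dart';

void main() {
  test('InputTextReader reads a plain text description', () async {
    // Write a throwaway input file so the test is self-contained.
    final tmpFile = File('${Directory.systemTemp.path}/textured_smoke.txt');
    await tmpFile.writeAsString('A small maple tree in a city park.');

    final content = await InputTextReader().readFile(tmpFile.path);
    expect(content, contains('maple'));

    await tmpFile.delete();
  });
}
```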
## Architecture

The package uses a modular, extensible architecture:

- **Abstract Interfaces**: `LlmCaller` supports multiple LLM providers
- **Decorator Pattern**: `ValidatingLlmCaller` adds validation to any LLM implementation (see the sketch below)
- **Configuration Management**: Flexible setup for different deployment scenarios
- **Comprehensive Testing**: Mock-based testing with 100% success rate
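
For readers unfamiliar with the decorator pattern, its conceptual shape is sketched below. The names here are illustrative only, not the package's actual declarations: the point is that the validating wrapper implements the same interface as the caller it wraps, so wrapped and unwrapped callers are interchangeable wherever an LLM caller is expected.

```dart
// Illustrative sketch of the decorator pattern; names are hypothetical,
// not the package's real API.
abstract class Caller {
  Future<String> call(String prompt);
}

class PlainCaller implements Caller {
  @override
  Future<String> call(String prompt) async => '{"name": "Oak"}';
}

class ValidatingCaller implements Caller {
  ValidatingCaller(this.delegate, this.isValid);

  final Caller delegate;
  final bool Function(String response) isValid;

  @override
  Future<String> call(String prompt) async {
    // Delegate to the wrapped caller, then validate before returning.
    final response = await delegate.call(prompt);
    if (!isValid(response)) {
      throw const FormatException('Response failed schema validation');
    }
    return response;
  }
}
```

In textured, `ValidatingLlmCaller` plays the role of the validating wrapper, and any `LlmCaller` implementation (Ollama or Gemini) can be the delegate, as shown in the usage examples above.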
## Performance

Typical processing times with the `llama2` model (see the timing sketch below):
- Schema loading: ~50ms
- Prompt generation: ~10ms
- LLM response: 5-15 seconds (model dependent)
- Validation: ~5ms
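
These numbers depend heavily on hardware and model choice. A rough way to measure them on your own setup, assuming a running local Ollama instance with `llama2` pulled and the schema file from the examples above, is a sketch like this:

```dart
import 'package:textured/textured.dart';

Future<void> main() async {
  final stopwatch = Stopwatch()..start();

  final schema = await JsonSchemaReader()
      .readSchema('test/data/tree_metadata_schema.json');
  print('Schema loading: ${stopwatch.elapsedMilliseconds}ms');

  stopwatch.reset();
  final prompt = PromptGenerator().generatePrompt(
    'A large oak tree in Central Park, about 100 years old',
    schema,
  );
  print('Prompt generation: ${stopwatch.elapsedMilliseconds}ms');

  stopwatch.reset();
  final caller = ValidatingLlmCaller(
    delegate: OllamaLlmCaller(config: OllamaConfig.local(model: 'llama2')),
    schema: schema,
  );
  final response = await caller.callLlm(prompt);
  print('LLM call + validation: ${stopwatch.elapsedMilliseconds}ms '
      '(success: ${response.success})');
}
```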
## Documentation

- **Developer Notes**: Comprehensive technical documentation in `docs/`
- **API Documentation**: Generate with `dart doc`
- **Examples**: Working examples in `example/`
- **Academic Paper**: Human-AI collaboration analysis in `docs/paper.tex`