Textured #

A Dart package that uses LLMs to parse natural language text into pre-defined data structures.

Overview #

Textured takes natural language descriptions (such as tree or plant descriptions) and converts them into structured JSON data based on provided JSON schemas. It does this in four steps, sketched in code after this list:

  1. Reading input text files containing natural language descriptions
  2. Reading JSON schemas that define the target data structure
  3. Using LLMs to intelligently map the text content to the schema fields
  4. Outputting structured JSON data
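
Putting the four steps together, the following minimal sketch combines the calls documented under Programmatic Usage below, using the local Ollama provider (file paths are placeholders):

import 'package:textured/textured.dart';

Future<void> main() async {
  // 1. Read the natural language description from a text file
  final inputText = await InputTextReader().readFile('path/to/description.txt');

  // 2. Read the JSON schema that defines the target data structure
  final schema = await JsonSchemaReader().readSchema('path/to/schema.json');

  // 3. Generate a schema-aware prompt and call the LLM with validation
  final prompt = PromptGenerator().generatePrompt(inputText, schema, includeExamples: true);
  final caller = ValidatingLlmCaller(
    delegate: OllamaLlmCaller(config: OllamaConfig.local(model: 'llama2', temperature: 0.3)),
    schema: schema,
  );

  // 4. The response content is structured JSON, validated against the schema
  final response = await caller.callLlm(prompt);
  print(response.success ? response.content : response.error);
}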

Features #

  • πŸ“– Input Text Reader: Reads text files containing natural language descriptions
  • πŸ” JSON Schema Reader: Parses JSON schemas to understand target data structures
  • 🎯 Prompt Generator: Creates LLM-optimized prompts with schema awareness and examples
  • πŸ€– LLM Integration: Supports local Ollama and cloud-based Google Gemini with extensible architecture
  • βœ… Schema Validation: Validates LLM responses against JSON schemas with detailed error reporting
  • πŸ› οΈ CLI Tool: Simple command-line interface for interactive text parsing

Installation #

Add this package to your pubspec.yaml:

dependencies:
  textured: ^0.1.0
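
Alternatively, add the dependency from the command line:

dart pub add textured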

Usage #

Quick Start with CLI #

The easiest way to try textured is to run the CLI tool with your choice of LLM provider:

Option 1: Ollama (Local)

# Make sure Ollama is running
ollama serve

# Pull a model if you haven't already
ollama pull llama2

# Run the CLI with default Ollama provider
dart run example/textured_cli.dart test/data/tree_metadata_schema.json

Option 2: Google Gemini (Cloud)

# 1. Set up Google Cloud project and enable Vertex AI API
gcloud services enable aiplatform.googleapis.com

# 2. Authenticate (choose one method)
gcloud auth application-default login
# OR provide service account key with --gemini-key
# OR use API key with --gemini-api-key (alternative to OAuth)

# 3. Run the CLI with Gemini provider
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-project=your-gcp-project-id

# With service account key file
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-project=your-gcp-project-id \
  --gemini-key=path/to/service-account.json

Then type your text and press Ctrl+D when finished.


Example session:
$ echo "A large oak tree in Central Park, about 100 years old" | dart run example/textured_cli.dart test/data/tree_metadata_schema.json

🌟 Textured CLI - LLM Text Parser
═══════════════════════════════════
Schema: test/data/tree_metadata_schema.json
Model: llama2

πŸ“‹ Loading schema...
   βœ“ Loaded: Tree Metadata Schema
   βœ“ Properties: 32

πŸ€– Configuring LLM...
   βœ“ Connection successful

πŸ“€ Processing text...
   βœ“ Response received (8181ms)

πŸ“Š RESULTS
═══════════════════════════════════
βœ… Success!
βœ… Schema Validation: PASSED

πŸ“‹ Extracted Data:
{
  "name": "Large Oak Tree",
  "city": "New York",
  "state": "NY",
  "is_alive": true,
  ...
}

Programmatic Usage #

Reading Input Text

import 'package:textured/textured.dart';

final reader = InputTextReader();

// Async reading
final content = await reader.readFile('path/to/description.txt');

// Sync reading
final contentSync = reader.readFileSync('path/to/description.txt');

Reading JSON Schemas

final schemaReader = JsonSchemaReader();
final schema = await schemaReader.readSchema('path/to/schema.json');

print('Schema: ${schema.title}');
print('Properties: ${schema.properties.length}');
print('Required: ${schema.required}');

Generating LLM Prompts

final promptGenerator = PromptGenerator();
final prompt = promptGenerator.generatePrompt(
  inputText,
  schema,
  includeExamples: true,
  customInstructions: 'Extract data precisely from the text.',
);

// Save prompt for external testing
await promptGenerator.savePromptToFile(prompt, 'output/prompt.txt');

Calling LLMs with Validation

Option 1: Ollama (Local)

// Configure Ollama
final config = OllamaConfig.local(
  model: 'llama2',
  temperature: 0.3,
);

// Create LLM caller with validation
final baseCaller = OllamaLlmCaller(config: config);
final validatingCaller = ValidatingLlmCaller(
  delegate: baseCaller,
  schema: schema,
);

// Make the call
final response = await validatingCaller.callLlm(prompt);

if (response.success) {
  print('Extracted: ${response.content}');
  print('Valid: ${response.metadata?['validation']['valid']}');
} else {
  print('Error: ${response.error}');
}

Option 2: Google Gemini (Cloud)

// Configure Gemini
final config = GeminiConfig.standard(
  projectId: 'your-gcp-project-id',
  model: 'gemini-2.0-flash',
);

// Set up authentication
final authenticator = GcpAuthenticator();
await authenticator.authenticateWithServiceAccount('path/to/service-account.json');
// OR: await authenticator.authenticateWithApplicationDefault();

// Create LLM caller with validation
final baseCaller = GeminiLlmCaller(
  config: config,
  authenticator: authenticator,
);
final validatingCaller = ValidatingLlmCaller(
  delegate: baseCaller,
  schema: schema,
);

// Test connection first
final connectionOk = await baseCaller.testConnection();
if (!connectionOk) {
  throw Exception('Failed to connect to Gemini API');
}

// Make the call
final response = await validatingCaller.callLlm(prompt);

if (response.success) {
  print('Extracted: ${response.content}');
  print('Valid: ${response.metadata?['validation']['valid']}');
} else {
  print('Error: ${response.error}');
}

Development Status #

This package is feature-complete for its initial scope:

  • βœ… Input Text Reader - File I/O with comprehensive error handling
  • βœ… JSON Schema Reader - Full JSON Schema Draft 7 support
  • βœ… Prompt Generator - LangChain-compatible prompts with schema integration
  • βœ… LLM Callers - Ollama (local) and Google Gemini (cloud) integration with abstract interface
  • βœ… Authentication - Google Cloud service account and ADC support for Gemini
  • βœ… Schema Validation - Real-time validation of LLM responses
  • βœ… CLI Tool - Interactive command-line interface with provider selection
  • βœ… CI/CD Pipeline - Automated testing, coverage, and quality gates

Test Coverage: 89 passing tests with comprehensive edge case coverage for both LLM providers.

Prerequisites #

For Ollama (Local):

  1. Ollama: Install from https://ollama.ai
  2. LLM Model: Pull a model like ollama pull llama2
  3. Running Service: Start with ollama serve

For Google Gemini (Cloud):

  1. Google Cloud Project: Create project with Vertex AI API enabled
  2. Authentication: Set up service account or gcloud auth application-default login
  3. Billing: Ensure project has billing enabled for API usage

Google Gemini Setup Guide #

1. Google Cloud Project Setup #

First, create and configure a Google Cloud project:

# Create a new project (or use existing)
gcloud projects create your-project-id
gcloud config set project your-project-id

# Enable required APIs
gcloud services enable aiplatform.googleapis.com

# Verify the API is enabled
gcloud services list --enabled --filter="name~aiplatform"

2. Authentication Setup #

Choose one of these authentication methods:

Option A: Application Default Credentials (ADC)

# Login with your Google account
gcloud auth application-default login

# Verify authentication
gcloud auth list

Option B: Service Account

# Create service account
gcloud iam service-accounts create textured-service-account \
    --description="Service account for textured package" \
    --display-name="Textured Service Account"

# Grant Vertex AI access
gcloud projects add-iam-policy-binding your-project-id \
    --member="serviceAccount:textured-service-account@your-project-id.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"

# Create and download key file
gcloud iam service-accounts keys create ~/textured-key.json \
    --iam-account=textured-service-account@your-project-id.iam.gserviceaccount.com

Option C: API Key (Simplest for development)

# Get an API key from Google AI Studio
# Visit: https://aistudio.google.com/app/apikey
# Create a new API key and copy it

# Method 1: Set environment variable
export GOOGLE_API_KEY="your-api-key-here"
# OR: Add to .env file
echo "GOOGLE_API_KEY=your-api-key-here" > .env

# Method 2: Use CLI parameter (see Usage Examples below)

3. Usage Examples #

CLI Usage

# Ollama (default provider)
dart run example/textured_cli.dart <schema_file>

# Gemini provider
dart run example/textured_cli.dart <schema_file> --provider=gemini --gemini-project=<project-id>

# With piped input
echo "Your text here" | dart run example/textured_cli.dart schema.json

# Help and usage information
dart run example/textured_cli.dart

Examples #

Ollama Examples

# Default model (llama2)
dart run example/textured_cli.dart test/data/tree_metadata_schema.json

# Different model
dart run example/textured_cli.dart test/data/tree_metadata_schema.json --model=mistral

# With debug output
dart run example/textured_cli.dart test/data/tree_metadata_schema.json --debug

Gemini Examples

# With API Key (environment variable)
export GOOGLE_API_KEY="your-api-key-here"
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini

# With API Key (CLI parameter)
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-api-key="your-api-key-here"

# With API Key (from file)
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-api-key-file=path/to/api-key.txt

# With Application Default Credentials
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-project=my-project-id

# With service account key
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-project=my-project-id \
  --gemini-key=path/to/service-account.json

# Different Gemini model
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-api-key="your-api-key-here" \
  --model=gemini-pro

# With debug for troubleshooting
dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --provider=gemini \
  --gemini-api-key="your-api-key-here" \
  --debug

CLI Options #

  • --provider=<ollama|gemini>: Select LLM provider. Default: ollama. Example: --provider=gemini
  • --model=<model_name>: Model name. Default: llama2 (Ollama) or gemini-2.0-flash (Gemini). Example: --model=mistral
  • --gemini-project=<project_id>: Google Cloud project ID (required for OAuth/ADC). Example: --gemini-project=my-project
  • --gemini-key=<key_file>: Service account key file (optional for OAuth). Example: --gemini-key=~/key.json
  • --gemini-api-key=<api_key>: Direct API key for Gemini (alternative to OAuth). Example: --gemini-api-key=AIza...
  • --gemini-api-key-file=<key_file>: File containing the Gemini API key. Example: --gemini-api-key-file=~/api.key
  • --save-prompt=<file>: Save the generated prompt to a file. Example: --save-prompt=prompt.txt
  • --save-output=<file>: Save the extracted output to a file. Example: --save-output=output.json
  • --debug: Show detailed debug information. Default: false. Example: --debug

Example: Save prompt and output to files

dart run example/textured_cli.dart test/data/tree_metadata_schema.json \
  --save-prompt=prompt.txt \
  --save-output=output.json

CLI Output #

The CLI provides comprehensive feedback:

🌟 Textured CLI - LLM Text Parser
═══════════════════════════════════
Schema: test/data/tree_metadata_schema.json
Provider: gemini
Project: my-project-id
Model: gemini-2.0-flash

πŸ“‹ Loading schema...
   βœ“ Loaded: Tree Metadata Schema
   βœ“ Properties: 32
   βœ“ Required fields: 32

πŸ€– Configuring gemini LLM...
   πŸ” Authenticating with Google Cloud...
   βœ“ Authentication successful
   ⚑ Testing API connection...
   βœ“ Connection successful

πŸ“ Enter your text to parse:
   (Type your text and press Ctrl+D when finished)

> A large oak tree in Central Park

πŸ“€ Processing text (32 characters)...
   🎯 Generating LLM prompt...
   βœ“ Prompt generated (5282 characters)
   πŸš€ Calling LLM...
   βœ“ Response received (1835ms)

πŸ“Š RESULTS
═══════════════════════════════════
βœ… Success!

βœ… Schema Validation: PASSED

πŸ“‹ Extracted Data:
{
  "name": "Large Oak Tree",
  "city": "New York",
  "state": "NY",
  "is_alive": true,
  ...
}

πŸ“ˆ Metadata:
β€’ Model: gemini-2.0-flash
β€’ Response time: 1835ms
β€’ Content length: 245 characters

πŸŽ‰ Processing complete!

See example/README.md for detailed CLI documentation.

Contributing #

This project follows Test-Driven Development (TDD). When contributing:

  1. Write tests first
  2. Implement functionality to make tests pass
  3. Ensure all tests pass (dart test; see the example commands below)
  4. Update developer notes and documentation
  5. Follow the established coding patterns
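
For example, a typical local check before submitting changes (standard Dart SDK commands; adapt to the project's CI configuration):

# Format, analyze, and run the test suite
dart format .
dart analyze
dart test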

Architecture #

The package uses a modular, extensible architecture:

  • Abstract Interfaces: LlmCaller supports multiple LLM providers
  • Decorator Pattern: ValidatingLlmCaller adds validation to any LLM implementation (see the sketch after this list)
  • Configuration Management: Flexible setup for different deployment scenarios
  • Comprehensive Testing: Mock-based test suite with all tests passing
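
To illustrate the extension point, the sketch below implements a hypothetical custom provider against the LlmCaller interface and wraps it with ValidatingLlmCaller. The member names (a callLlm method returning a response with success and content) follow the usage examples above; the response constructor and any additional interface members are assumptions, not the package's verbatim API.

// Hypothetical sketch: real LlmCaller / LlmResponse signatures may differ,
// and the interface may declare additional members (e.g. connection tests).
class StubLlmCaller implements LlmCaller {
  @override
  Future<LlmResponse> callLlm(String prompt) async {
    // A stand-in provider; a real implementation would call an LLM service.
    return LlmResponse(success: true, content: '{"name": "Example Tree"}');
  }
}

// ValidatingLlmCaller depends only on the LlmCaller interface, so schema
// validation composes with any provider, including this stub.
final validating = ValidatingLlmCaller(
  delegate: StubLlmCaller(),
  schema: schema, // a schema loaded with JsonSchemaReader
);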

Performance #

Typical processing times with llama2 model:

  • Schema loading: ~50ms
  • Prompt generation: ~10ms
  • LLM response: 5-15 seconds (model dependent)
  • Validation: ~5ms

Documentation #

  • Developer Notes: Comprehensive technical documentation in docs/
  • API Documentation: Generate with dart doc
  • Examples: Working examples in example/
  • Academic Paper: Human-AI collaboration analysis in docs/paper.tex