Data Frame - Data Analysis Library for Dart

Data Frame is a comprehensive data manipulation and analysis library for Dart. It provides powerful tools for working with structured data including DataFrames, Series, statistical analysis, mathematical operations, and flexible data I/O.

What's New in v1.2.0

A maintenance release with fixes and improvements to the examples and documentation.

✅ All example programs verified to run end-to-end without errors
🧰 Example fixes: numeric-safe aggregations, joins, cumulative operations, URL/CSV handling
🧪 Improved stability of statistical and math examples
📖 README and CHANGELOG updated

Features

Core Data Structures

Series: One-dimensional labeled data structure for storing and manipulating arrays
DataFrame: Two-dimensional labeled data structure for working with tabular data

Data Manipulation

Filtering, sorting, grouping, and transformation
Joining and merging operations
Null value handling
Column and row operations

Statistical Analysis

Descriptive statistics
Hypothesis testing (t-tests, chi-square, ANOVA)
Correlation and regression analysis
Confidence intervals
Normality tests

Mathematical Operations

Element-wise operations
Aggregation functions
Rolling statistics
Cumulative operations
Mathematical functions (abs, sqrt, log, exp, trigonometric)

Data I/O

CSV file reading and writing
JSON file reading and writing
URL data fetching
Sample data generation

Installation

Add this to your package's pubspec.yaml file:

dependencies:
  data_frame: ^1.2.0

Or use the command line:

dart pub add data_frame

Quick Start

import 'package:data_frame/data_frame.dart';

Future<void> main() async {
  // Create a DataFrame
  final df = DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000],
  });

  // Basic operations
  print(df.head());
  print(df.describe());

  // Filtering
  final highEarners = df.where((row) => (row['salary'] as num) > 55000);
  print(highEarners);

  // Statistical analysis
  final ageStats = Statistics.descriptiveStats(
    Series<num>(df['age'].data.cast<num>())
  );
  print('Age statistics: $ageStats');

  // I/O operations
  await DataIO.toCsv(df, 'output.csv');
  final loaded = await DataIO.readCsv('output.csv');
  print('Loaded ${loaded.length} rows');
}

Core Classes

Series

A one-dimensional labeled array capable of holding any data type.

// Create from list
final series = Series<int>([1, 2, 3, 4, 5]);

// Create from map
final series2 = Series<String>.fromMap({'a': 'apple', 'b': 'banana'});

// Basic operations
print(series.sum());    // 15
print(series.mean());   // 3.0
print(series.max());    // 5

// Filtering and transformation
final filtered = series.where((x) => x > 3);  // [4, 5]
final doubled = series.map<int>((x) => x * 2); // [2, 4, 6, 8, 10]

DataFrame

A two-dimensional labeled data structure with columns of potentially different types.

// Create from map
final df = DataFrame({
  'name': ['Alice', 'Bob', 'Charlie'],
  'age': [25, 30, 35],
  'city': ['NYC', 'LA', 'Chicago'],
});

// Create from records
final records = [
  {'name': 'Alice', 'age': 25},
  {'name': 'Bob', 'age': 30},
];
final df2 = DataFrame.fromRecords(records);

// Access data
print(df['name'][0]);           // 'Alice'
print(df.iloc(1));             // {'name': 'Bob', 'age': 30, 'city': 'LA'}
print(df.shape);               // [3, 3]

// Operations
final adults = df.where((row) => (row['age'] as num) >= 30);
final sorted = df.sortBy(['age']);
final grouped = df.groupBy(['city']);

Statistical Functions

// Descriptive statistics
final data = Series<num>([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
final stats = Statistics.descriptiveStats(data);
print(stats); // count, mean, std, min, max, quartiles, etc.

// Correlation analysis
final x = Series<num>([1, 2, 3, 4, 5]);
final y = Series<num>([2, 4, 6, 8, 10]);
final corrResult = Statistics.correlationTest(x, y);
print('Correlation: ${corrResult['correlation']}');

// Linear regression
final regression = Statistics.linearRegression(x, y);
print('Slope: ${regression['slope']}');
print('R²: ${regression['r_squared']}');

// Hypothesis testing
final group1 = Series<num>([1, 2, 3, 4, 5]);
final group2 = Series<num>([2, 3, 4, 5, 6]);
final tTestResult = Statistics.tTest(group1, group2);
print('t-statistic: ${tTestResult['t_statistic']}');
print('p-value: ${tTestResult['p_value']}');
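
For intuition about what the correlation and regression figures represent, the following is a minimal, library-independent sketch of an ordinary least-squares fit in plain Dart. It is not the implementation used by Statistics.linearRegression; the function name fitLine and the 'intercept' key are hypothetical and chosen only for illustration.

```dart
import 'dart:math' as math;

/// Fits y = intercept + slope * x by ordinary least squares.
/// Hypothetical helper for illustration; not part of the data_frame API.
Map<String, double> fitLine(List<num> x, List<num> y) {
  if (x.length != y.length || x.length < 2) {
    throw ArgumentError('x and y must have the same length (at least 2)');
  }
  final n = x.length;
  final meanX = x.reduce((a, b) => a + b) / n;
  final meanY = y.reduce((a, b) => a + b) / n;

  // Sums of squared deviations and cross-deviations from the means.
  var sxx = 0.0, sxy = 0.0, syy = 0.0;
  for (var i = 0; i < n; i++) {
    final dx = x[i].toDouble() - meanX;
    final dy = y[i].toDouble() - meanY;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  }

  final slope = sxy / sxx;
  final intercept = meanY - slope * meanX;
  final r = sxy / math.sqrt(sxx * syy); // Pearson correlation
  return {
    'slope': slope,
    'intercept': intercept,
    'correlation': r,
    'r_squared': r * r,
  };
}

void main() {
  final fit = fitLine([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
  print('Slope: ${fit['slope']}');   // Slope: 2.0
  print('R²: ${fit['r_squared']}');  // R²: 1.0
}
```

Because y is exactly 2x in this sample, the fitted slope is 2 and R² is 1, which is a useful sanity check when comparing against the library's output.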

Data I/O Operations

// CSV operations
await DataIO.toCsv(df, 'data.csv');
final loaded = await DataIO.readCsv('data.csv');

// JSON operations
await DataIO.toJson(df, 'data.json');
final jsonData = await DataIO.readJson('data.json');

// URL data fetching
final urlData = await DataIO.readUrl(
  'https://example.com/data.csv',
  format: 'csv'
);

// Sample data generation
final sampleNumeric = DataUtils.createSampleNumeric(rows: 100, columns: 5);
final sampleMixed = DataUtils.createSampleMixed(rows: 50);
final timeSeries = DataUtils.createTimeSeries(days: 30);

Utility Functions

The DD class provides convenient factory methods:

// Create Series
final range = DD.range(0, 10, step: 2);        // [0, 2, 4, 6, 8]
final zeros = DD.zeros(5);                      // [0, 0, 0, 0, 0]
final ones = DD.ones(3);                        // [1, 1, 1]
final random = DD.randn(100);                   // Random normal distribution

// Create date ranges
final dates = DD.dateRange('2024-01-01', '2024-01-07');

// Data concatenation and merging
// (series1, series2, df1, and df2 are existing Series and DataFrame objects)
final concatenated = DD.concat([series1, series2]);
final mergedDf = DD.merge(df1, df2, on: 'id');

Mathematical Operations

// Element-wise operations
final df1 = DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]});
final added = MathOps.add(df1, 10);           // Add 10 to all values
final multiplied = MathOps.multiply(df1, 2);   // Multiply all by 2

// Mathematical functions
final sqrted = MathOps.sqrt(df1);
final logged = MathOps.log(df1);
final absolute = MathOps.abs(df1);

// Rolling statistics
final rollingMean = MathOps.rollingMean(df1, 3);
final cumSum = MathOps.cumSum(df1);

// Correlation matrix
final corrMatrix = MathOps.corr(df1);
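
To make the rolling and cumulative operations concrete, here is a small plain-Dart sketch of what a rolling mean and a cumulative sum compute on a single column. It is an illustration only: the function names are hypothetical, and it assumes that positions without a full window are reported as null, which may differ from how MathOps.rollingMean treats them.

```dart
/// Mean over a sliding window of [window] values.
/// Positions before the first full window are null (an assumption).
List<double?> rollingMean(List<num> values, int window) {
  if (window <= 0) {
    throw ArgumentError.value(window, 'window', 'must be positive');
  }
  final out = List<double?>.filled(values.length, null);
  var sum = 0.0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
    if (i >= window) sum -= values[i - window]; // drop the value leaving the window
    if (i >= window - 1) out[i] = sum / window;
  }
  return out;
}

/// Running total of [values].
List<num> cumSum(List<num> values) {
  num total = 0;
  return [for (final v in values) total += v];
}

void main() {
  final column = [1, 2, 3, 4, 5];
  print(rollingMean(column, 3)); // [null, null, 2.0, 3.0, 4.0]
  print(cumSum(column));         // [1, 3, 6, 10, 15]
}
```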

Advanced Features

Grouping and Aggregation

final df = DataFrame({
  'category': ['A', 'B', 'A', 'B', 'A'],
  'value': [1, 2, 3, 4, 5],
});

final groups = df.groupBy(['category']);
for (final entry in groups.entries) {
  print('Group ${entry.key}:');
  print(entry.value);
}

Joining DataFrames

final df1 = DataFrame({
  'id': [1, 2, 3],
  'name': ['Alice', 'Bob', 'Charlie'],
});

final df2 = DataFrame({
  'id': [1, 2, 4],
  'age': [25, 30, 35],
});

final joined = df1.join(df2, on: 'id', how: 'inner');
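
As a rough guide to what an inner join on 'id' should return for the two tables above, the sketch below performs the same join on plain lists of maps. It does not use the data_frame API; innerJoin is a hypothetical helper, and the way the library orders or names the resulting columns may differ.

```dart
/// Inner join of two record lists on [key], using a hash index on the
/// right-hand side. Hypothetical helper for illustration only.
List<Map<String, Object?>> innerJoin(
  List<Map<String, Object?>> left,
  List<Map<String, Object?>> right,
  String key,
) {
  // Index the right-hand rows by their key value.
  final index = <Object?, List<Map<String, Object?>>>{};
  for (final row in right) {
    index.putIfAbsent(row[key], () => []).add(row);
  }

  // Keep only left rows whose key also appears on the right.
  final result = <Map<String, Object?>>[];
  for (final l in left) {
    for (final r in index[l[key]] ?? const <Map<String, Object?>>[]) {
      result.add({...l, ...r});
    }
  }
  return result;
}

void main() {
  final people = <Map<String, Object?>>[
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 3, 'name': 'Charlie'},
  ];
  final ages = <Map<String, Object?>>[
    {'id': 1, 'age': 25},
    {'id': 2, 'age': 30},
    {'id': 4, 'age': 35},
  ];
  print(innerJoin(people, ages, 'id'));
  // [{id: 1, name: Alice, age: 25}, {id: 2, name: Bob, age: 30}]
}
```

Only ids 1 and 2 appear in both inputs, so the rows for id 3 (left only) and id 4 (right only) are dropped.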

Data Cleaning

// Handle null values
final cleaned = df.dropna();                    // Remove rows with nulls
final filled = df.fillna(0);                    // Fill nulls with 0

// Data validation
final info = df.info();                         // Get DataFrame info
final summary = df.summary();                   // Get summary statistics

Performance Considerations

Data Frame is optimized for medium-sized datasets (up to millions of rows)
For very large datasets, consider processing in chunks
Use appropriate data types to minimize memory usage
Take advantage of lazy evaluation where possible
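
The chunked-processing advice above can be sketched in plain Dart as follows. This is a minimal illustration that uses only dart:io and dart:convert rather than the data_frame API: it streams a CSV file line by line and folds one numeric column into a running mean, so that at most one chunk of values is held in memory. RunningMean and chunkedColumnMean are hypothetical names, and the parser assumes a header row and no quoted commas.

```dart
import 'dart:convert';
import 'dart:io';

/// Accumulates a mean incrementally, one chunk at a time.
class RunningMean {
  double _sum = 0;
  int _count = 0;

  void addChunk(Iterable<num> chunk) {
    for (final v in chunk) {
      _sum += v;
      _count++;
    }
  }

  double get mean => _count == 0 ? double.nan : _sum / _count;
}

/// Streams [path] and averages the column at index [column] in chunks of
/// [chunkSize] rows. Assumes a header row and a simple comma-separated layout.
Future<double> chunkedColumnMean(String path, int column,
    {int chunkSize = 10000}) async {
  final lines = File(path)
      .openRead()
      .transform(utf8.decoder)
      .transform(const LineSplitter())
      .skip(1); // skip the header row

  final stats = RunningMean();
  var chunk = <num>[];
  await for (final line in lines) {
    final cells = line.split(',');
    final value =
        column < cells.length ? num.tryParse(cells[column].trim()) : null;
    if (value == null) continue; // skip blank or non-numeric cells
    chunk.add(value);
    if (chunk.length == chunkSize) {
      stats.addChunk(chunk);
      chunk = <num>[];
    }
  }
  stats.addChunk(chunk); // flush the final partial chunk
  return stats.mean;
}

Future<void> main() async {
  final file = File('${Directory.systemTemp.path}/chunk_demo.csv');
  await file.writeAsString('id,value\n1,10\n2,20\n3,30\n');
  print(await chunkedColumnMean(file.path, 1, chunkSize: 2)); // 20.0
  await file.delete();
}
```

In a real program each chunk could instead be turned into a DataFrame and aggregated with the library's own functions; the point of the sketch is only that per-chunk results are combined rather than loading the whole file at once.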

Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to our GitHub repository.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Examples

📁 Comprehensive Examples Available

The example/ directory contains detailed, real-world examples demonstrating all aspects of the data_frame library:

🚀 Getting Started

main.dart - Complete overview with all major features
basic_operations.dart - DataFrame and Series fundamentals

📊 Advanced Analytics

statistical_analysis.dart - Statistical tests, correlation, regression
mathematical_operations.dart - Math functions, rolling statistics, financial calculations

🔧 Data Management

data_io_operations.dart - CSV, JSON, URL data loading with error handling
advanced_data_manipulation.dart - Grouping, joining, cleaning, time series

🎯 Practical Applications

sample_data_generation.dart - Generate test data, simulations, performance testing
real_world_analysis.dart - Complete business intelligence workflow

📖 Run the Examples

# Run the comprehensive overview
dart run example/main.dart

# Run specific examples
dart run example/statistical_analysis.dart
dart run example/real_world_analysis.dart

# See all available examples
ls example/

Each example includes:

✅ Complete, runnable code
📝 Detailed explanations and comments
📊 Sample data and realistic scenarios
💡 Best practices and performance tips
🧪 Multiple use cases and patterns

Perfect for learning, reference, and adapting to your own projects!

Testing

Data Frame comes with a comprehensive test suite covering all functionality:

# Run all tests
dart test

# Run with coverage
dart test --coverage=coverage

# Run specific test group
dart test -n "Series Tests"

Test Coverage: 87 tests covering 100% of public APIs

See TEST_COVERAGE.md for detailed coverage information.

Code Quality

# Run static analysis
dart analyze

# Format code
dart format .

Status: ✅ No analyzer issues, all tests passing

API Reference

For complete API documentation, visit our documentation site or generate docs locally with dart doc.
