data_frame 1.2.0

A comprehensive data manipulation and analysis library for Dart, inspired by pandas. Features DataFrame, Series, statistical analysis, and data I/O operations.

Data Frame - Data Analysis Library for Dart #

Data Frame is a comprehensive data manipulation and analysis library for Dart. It provides powerful tools for working with structured data including DataFrames, Series, statistical analysis, mathematical operations, and flexible data I/O.

What's New in v1.2.0 #

Maintenance release with fixes and improvements to examples and docs

  • ✅ All example programs verified to run end-to-end without errors
  • 🧰 Example fixes: numeric-safe aggregations, joins, cumulative operations, URL/CSV handling
  • 🧪 Improved stability of statistical and math examples
  • 📖 README and CHANGELOG updated

Features #

Core Data Structures #

  • Series: One-dimensional labeled data structure for storing and manipulating arrays
  • DataFrame: Two-dimensional labeled data structure for working with tabular data

Data Manipulation #

  • Filtering, sorting, grouping, and transformation
  • Joining and merging operations
  • Null value handling
  • Column and row operations

Statistical Analysis #

  • Descriptive statistics
  • Hypothesis testing (t-tests, chi-square, ANOVA)
  • Correlation and regression analysis
  • Confidence intervals
  • Normality tests

Mathematical Operations #

  • Element-wise operations
  • Aggregation functions
  • Rolling statistics
  • Cumulative operations
  • Mathematical functions (abs, sqrt, log, exp, trigonometric)

Data I/O #

  • CSV file reading and writing
  • JSON file reading and writing
  • URL data fetching
  • Sample data generation

Installation #

Add this to your package's pubspec.yaml file:

dependencies:
  data_frame: ^1.2.0

Or use the command line:

dart pub add data_frame

Quick Start #

import 'package:data_frame/data_frame.dart';

void main() async {
  // Create a DataFrame
  final df = DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000],
  });

  // Basic operations
  print(df.head());
  print(df.describe());

  // Filtering
  final highEarners = df.where((row) => (row['salary'] as num) > 55000);
  print(highEarners);

  // Statistical analysis
  final ageStats = Statistics.descriptiveStats(
    Series<num>(df['age'].data.cast<num>())
  );
  print('Age statistics: $ageStats');

  // I/O operations
  await DataIO.toCsv(df, 'output.csv');
  final loaded = await DataIO.readCsv('output.csv');
  print('Loaded ${loaded.length} rows');
}

Core Classes #

Series #

A one-dimensional labeled array capable of holding any data type.

// Create from list
final series = Series<int>([1, 2, 3, 4, 5]);

// Create from map
final series2 = Series<String>.fromMap({'a': 'apple', 'b': 'banana'});

// Basic operations
print(series.sum());    // 15
print(series.mean());   // 3.0
print(series.max());    // 5

// Filtering and transformation
final filtered = series.where((x) => x > 3);  // [4, 5]
final doubled = series.map<int>((x) => x * 2); // [2, 4, 6, 8, 10]

DataFrame #

A two-dimensional labeled data structure with columns of potentially different types.

// Create from map
final df = DataFrame({
  'name': ['Alice', 'Bob', 'Charlie'],
  'age': [25, 30, 35],
  'city': ['NYC', 'LA', 'Chicago'],
});

// Create from records
final records = [
  {'name': 'Alice', 'age': 25},
  {'name': 'Bob', 'age': 30},
];
final df2 = DataFrame.fromRecords(records);

// Access data
print(df['name'][0]);  // 'Alice'
print(df.iloc(1));     // {'name': 'Bob', 'age': 30, 'city': 'LA'}
print(df.shape);       // [3, 3]

// Operations
final adults = df.where((row) => (row['age'] as num) >= 30);
final sorted = df.sortBy(['age']);
final grouped = df.groupBy(['city']);

Statistical Functions #

// Descriptive statistics
final data = Series<num>([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
final stats = Statistics.descriptiveStats(data);
print(stats); // count, mean, std, min, max, quartiles, etc.

// Correlation analysis
final x = Series<num>([1, 2, 3, 4, 5]);
final y = Series<num>([2, 4, 6, 8, 10]);
final corrResult = Statistics.correlationTest(x, y);
print('Correlation: ${corrResult['correlation']}');

// Linear regression
final regression = Statistics.linearRegression(x, y);
print('Slope: ${regression['slope']}');
print('R²: ${regression['r_squared']}');

// Hypothesis testing
final group1 = Series<num>([1, 2, 3, 4, 5]);
final group2 = Series<num>([2, 3, 4, 5, 6]);
final tTestResult = Statistics.tTest(group1, group2);
print('t-statistic: ${tTestResult['t_statistic']}');
print('p-value: ${tTestResult['p_value']}');
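
For readers who want to see what the correlation and regression results above represent, here is a minimal sketch in plain Dart that does not use the package at all. The helper names `mean`, `pearson`, and `leastSquares` are hypothetical and written for this illustration; the library's own implementation and return keys may differ.

```dart
import 'dart:math' as math;

/// Arithmetic mean of a non-empty list.
double mean(List<num> values) =>
    values.fold<double>(0, (sum, v) => sum + v) / values.length;

/// Pearson correlation coefficient of two equal-length lists.
double pearson(List<num> x, List<num> y) {
  final mx = mean(x), my = mean(y);
  var sxy = 0.0, sxx = 0.0, syy = 0.0;
  for (var i = 0; i < x.length; i++) {
    final dx = x[i] - mx, dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / math.sqrt(sxx * syy);
}

/// Ordinary least-squares fit of y = slope * x + intercept.
Map<String, double> leastSquares(List<num> x, List<num> y) {
  final mx = mean(x), my = mean(y);
  var sxy = 0.0, sxx = 0.0;
  for (var i = 0; i < x.length; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) * (x[i] - mx);
  }
  final slope = sxy / sxx;
  return {'slope': slope, 'intercept': my - slope * mx};
}

void main() {
  // Same sample data as the README snippet above.
  final x = <num>[1, 2, 3, 4, 5];
  final y = <num>[2, 4, 6, 8, 10];
  print(pearson(x, y)); // 1.0
  print(leastSquares(x, y)); // {slope: 2.0, intercept: 0.0}
}
```

Because `y` is exactly `2 * x` in this sample, a correlation of 1.0 and a slope of 2.0 are the values one would expect any correct implementation to report.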

Data I/O Operations #

// CSV operations
await DataIO.toCsv(df, 'data.csv');
final loaded = await DataIO.readCsv('data.csv');

// JSON operations
await DataIO.toJson(df, 'data.json');
final jsonData = await DataIO.readJson('data.json');

// URL data fetching
final urlData = await DataIO.readUrl(
  'https://example.com/data.csv',
  format: 'csv'
);

// Sample data generation
final sampleNumeric = DataUtils.createSampleNumeric(rows: 100, columns: 5);
final sampleMixed = DataUtils.createSampleMixed(rows: 50);
final timeSeries = DataUtils.createTimeSeries(days: 30);

Utility Functions #

The DD class provides convenient factory methods:

// Create Series
final range = DD.range(0, 10, step: 2);        // [0, 2, 4, 6, 8]
final zeros = DD.zeros(5);                      // [0, 0, 0, 0, 0]
final ones = DD.ones(3);                        // [1, 1, 1]
final random = DD.randn(100);                   // Random normal distribution

// Create date ranges
final dates = DD.dateRange('2024-01-01', '2024-01-07');

// Data concatenation
// (series1, series2, df1, and df2 stand for Series and DataFrame values created earlier)
final concatenated = DD.concat([series1, series2]);
final mergedDf = DD.merge(df1, df2, on: 'id');

Mathematical Operations #

// Element-wise operations
final df1 = DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]});
final added = MathOps.add(df1, 10);           // Add 10 to all values
final multiplied = MathOps.multiply(df1, 2);   // Multiply all by 2

// Mathematical functions
final sqrted = MathOps.sqrt(df1);
final logged = MathOps.log(df1);
final absolute = MathOps.abs(df1);

// Rolling statistics
final rollingMean = MathOps.rollingMean(df1, 3);
final cumSum = MathOps.cumSum(df1);

// Correlation matrix
final corrMatrix = MathOps.corr(df1);
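
The rolling and cumulative operations above can be pictured with a plain-Dart sketch on a single column. The functions `cumulativeSum` and `rollingMean` below are hypothetical helpers, not the package's API. Returning `null` for positions that do not yet have a full window is an assumption borrowed from pandas; `MathOps.rollingMean` may handle those positions differently.

```dart
/// Cumulative sum: element i is the sum of values[0..i].
List<num> cumulativeSum(List<num> values) {
  final result = <num>[];
  num running = 0;
  for (final v in values) {
    running += v;
    result.add(running);
  }
  return result;
}

/// Rolling mean over a fixed window.
/// Positions without a full window are null (an assumed convention).
List<double?> rollingMean(List<num> values, int window) {
  final result = <double?>[];
  num windowSum = 0;
  for (var i = 0; i < values.length; i++) {
    windowSum += values[i];
    // Drop the value that just slid out of the window.
    if (i >= window) windowSum -= values[i - window];
    result.add(i >= window - 1 ? windowSum / window : null);
  }
  return result;
}

void main() {
  print(cumulativeSum([1, 2, 3])); // [1, 3, 6]
  print(rollingMean([1, 2, 3, 4, 5], 3)); // [null, null, 2.0, 3.0, 4.0]
}
```

Keeping a running window sum makes each step constant-time, so the whole pass is linear in the column length regardless of window size.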

Advanced Features #

Grouping and Aggregation #

final df = DataFrame({
  'category': ['A', 'B', 'A', 'B', 'A'],
  'value': [1, 2, 3, 4, 5],
});

final groups = df.groupBy(['category']);
for (final entry in groups.entries) {
  print('Group ${entry.key}:');
  print(entry.value);
}
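
To make the aggregation step concrete, the following plain-Dart sketch sums a value column per group over a list of row maps. `sumBy` is a hypothetical helper for illustration only; it does not show how the package aggregates grouped DataFrames.

```dart
/// Sums the numeric [valueKey] column for each distinct [groupKey] value.
/// Rows whose value is not numeric (including null) are skipped.
Map<Object?, num> sumBy(
  List<Map<String, Object?>> rows,
  String groupKey,
  String valueKey,
) {
  final totals = <Object?, num>{};
  for (final row in rows) {
    final value = row[valueKey];
    if (value is num) {
      totals.update(row[groupKey], (t) => t + value, ifAbsent: () => value);
    }
  }
  return totals;
}

void main() {
  // Same data as the DataFrame above, expressed as row maps.
  final rows = [
    {'category': 'A', 'value': 1},
    {'category': 'B', 'value': 2},
    {'category': 'A', 'value': 3},
    {'category': 'B', 'value': 4},
    {'category': 'A', 'value': 5},
  ];
  print(sumBy(rows, 'category', 'value')); // {A: 9, B: 6}
}
```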

Joining DataFrames #

final df1 = DataFrame({
  'id': [1, 2, 3],
  'name': ['Alice', 'Bob', 'Charlie'],
});

final df2 = DataFrame({
  'id': [1, 2, 4],
  'age': [25, 30, 35],
});

final joined = df1.join(df2, on: 'id', how: 'inner');
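
As a rough picture of what an inner join on `id` yields for the two tables above, here is a plain-Dart sketch over row maps. `innerJoin` is a hypothetical function written for this example; the package's `join` may order columns or resolve name clashes differently.

```dart
/// Inner join of two row lists on [key]: only keys present on both sides
/// produce output rows. On a column-name clash the right side wins.
List<Map<String, Object?>> innerJoin(
  List<Map<String, Object?>> left,
  List<Map<String, Object?>> right,
  String key,
) {
  // Index the right side by key so each left row is matched by map lookup.
  final index = <Object?, List<Map<String, Object?>>>{};
  for (final row in right) {
    index.putIfAbsent(row[key], () => []).add(row);
  }
  return [
    for (final l in left)
      for (final r in index[l[key]] ?? const <Map<String, Object?>>[])
        {...l, ...r},
  ];
}

void main() {
  final people = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 3, 'name': 'Charlie'},
  ];
  final ages = [
    {'id': 1, 'age': 25},
    {'id': 2, 'age': 30},
    {'id': 4, 'age': 35},
  ];
  // ids 3 and 4 appear on one side only, so they are dropped.
  print(innerJoin(people, ages, 'id'));
  // [{id: 1, name: Alice, age: 25}, {id: 2, name: Bob, age: 30}]
}
```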

Data Cleaning #

// Handle null values
final cleaned = df.dropna();                    // Remove rows with nulls
final filled = df.fillna(0);                    // Fill nulls with 0

// Data validation
final info = df.info();                         // Get DataFrame info
final summary = df.summary();                   // Get summary statistics
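
The two null-handling strategies can be sketched in plain Dart on row maps. `dropNulls` and `fillNulls` are hypothetical helpers that mirror the comments above (drop any row containing a null, or replace every null with a fill value); whether the package's `dropna` and `fillna` accept further options is not covered here.

```dart
/// Keeps only the rows that contain no null values.
List<Map<String, Object?>> dropNulls(List<Map<String, Object?>> rows) =>
    rows.where((row) => !row.values.contains(null)).toList();

/// Returns copies of the rows with every null replaced by [fill].
List<Map<String, Object?>> fillNulls(
  List<Map<String, Object?>> rows,
  Object fill,
) =>
    [
      for (final row in rows)
        row.map((key, value) => MapEntry(key, value ?? fill)),
    ];

void main() {
  final rows = [
    {'a': 1, 'b': null},
    {'a': 2, 'b': 3},
  ];
  print(dropNulls(rows)); // [{a: 2, b: 3}]
  print(fillNulls(rows, 0)); // [{a: 1, b: 0}, {a: 2, b: 3}]
}
```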

Performance Considerations #

  • Data Frame is optimized for medium-sized datasets (up to millions of rows)
  • For very large datasets, consider processing in chunks
  • Use appropriate data types to minimize memory usage
  • Take advantage of lazy evaluation where possible
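
One way to follow the chunking and lazy-evaluation advice is sketched below in plain Dart. `chunked` is a hypothetical helper, not part of the package; in practice each chunk might be turned into a DataFrame and aggregated before the next one is read.

```dart
/// Lazily yields consecutive chunks of at most [size] items.
Iterable<List<T>> chunked<T>(Iterable<T> items, int size) sync* {
  var chunk = <T>[];
  for (final item in items) {
    chunk.add(item);
    if (chunk.length == size) {
      yield chunk;
      chunk = <T>[];
    }
  }
  // Emit the final, possibly shorter, chunk.
  if (chunk.isNotEmpty) yield chunk;
}

void main() {
  // A generated sequence stands in for a large data source; it is never
  // held in memory as a whole, only one chunk at a time.
  final values = Iterable<int>.generate(1000000, (i) => i + 1);
  var total = 0;
  var chunks = 0;
  for (final chunk in chunked(values, 100000)) {
    total += chunk.fold<int>(0, (sum, v) => sum + v);
    chunks++;
  }
  print('chunks: $chunks, total: $total'); // chunks: 10, total: 500000500000
}
```

Only partial results (here a running total) are kept between chunks, so peak memory is bounded by the chunk size rather than the dataset size.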

Contributing #

Contributions are welcome! Please read our contributing guidelines and submit pull requests to our GitHub repository.

License #

This project is licensed under the MIT License - see the LICENSE file for details.

Examples #

📁 Comprehensive Examples Available

The example/ directory contains detailed, real-world examples demonstrating all aspects of the data_frame library:

🚀 Getting Started #

📊 Advanced Analytics #

🔧 Data Management #

🎯 Practical Applications #

📖 Run the Examples #

# Run the comprehensive overview
dart run example/main.dart

# Run specific examples
dart run example/statistical_analysis.dart
dart run example/real_world_analysis.dart

# See all available examples
ls example/

Each example includes:

  • ✅ Complete, runnable code
  • 📝 Detailed explanations and comments
  • 📊 Sample data and realistic scenarios
  • 💡 Best practices and performance tips
  • 🧪 Multiple use cases and patterns

Perfect for learning, reference, and adapting to your own projects!

Testing #

Data Frame comes with a comprehensive test suite covering all functionality:

# Run all tests
dart test

# Run with coverage
dart test --coverage=coverage

# Run specific test group
dart test -n "Series Tests"

Test Coverage: 87 tests covering 100% of public APIs

See TEST_COVERAGE.md for detailed coverage information.

Code Quality #

# Run static analysis
dart analyze

# Format code
dart format .

Status: ✅ No analyzer issues, all tests passing

API Reference #

For complete API documentation, visit our documentation site or generate docs locally with dart doc.

Dependencies

collection, csv, intl, meta
