data_frame 1.2.0
A comprehensive data manipulation and analysis library for Dart, inspired by pandas. Features DataFrame, Series, statistical analysis, and data I/O operations.
Data Frame - Data Analysis Library for Dart #
Data Frame is a comprehensive data manipulation and analysis library for Dart. It provides powerful tools for working with structured data including DataFrames, Series, statistical analysis, mathematical operations, and flexible data I/O.
What's New in v1.2.0 #
Maintenance release with fixes and improvements to examples and docs:
- All example programs verified to run end-to-end without errors
- Example fixes: numeric-safe aggregations, joins, cumulative operations, URL/CSV handling
- Improved stability of statistical and math examples
- README and CHANGELOG updated
Features #
Core Data Structures #
- Series: One-dimensional labeled data structure for storing and manipulating arrays
- DataFrame: Two-dimensional labeled data structure for working with tabular data
Data Manipulation #
- Filtering, sorting, grouping, and transformation
- Joining and merging operations
- Null value handling
- Column and row operations
Statistical Analysis #
- Descriptive statistics
- Hypothesis testing (t-tests, chi-square, ANOVA)
- Correlation and regression analysis
- Confidence intervals
- Normality tests
Mathematical Operations #
- Element-wise operations
- Aggregation functions
- Rolling statistics
- Cumulative operations
- Mathematical functions (abs, sqrt, log, exp, trigonometric)
Data I/O #
- CSV file reading and writing
- JSON file reading and writing
- URL data fetching
- Sample data generation
Installation #
Add this to your package's pubspec.yaml file:
dependencies:
  data_frame: ^1.2.0
Or use the command line:
dart pub add data_frame
Quick Start #
import 'package:data_frame/data_frame.dart';

Future<void> main() async {
  // Create a DataFrame
  final df = DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000],
  });

  // Basic operations
  print(df.head());
  print(df.describe());

  // Filtering
  final highEarners = df.where((row) => (row['salary'] as num) > 55000);
  print(highEarners);

  // Statistical analysis
  final ageStats = Statistics.descriptiveStats(
    Series<num>(df['age'].data.cast<num>()),
  );
  print('Age statistics: $ageStats');

  // I/O operations
  await DataIO.toCsv(df, 'output.csv');
  final loaded = await DataIO.readCsv('output.csv');
  print('Loaded ${loaded.length} rows');
}
Core Classes #
Series #
A one-dimensional labeled array capable of holding any data type.
// Create from list
final series = Series<int>([1, 2, 3, 4, 5]);
// Create from map
final series2 = Series<String>.fromMap({'a': 'apple', 'b': 'banana'});
// Basic operations
print(series.sum()); // 15
print(series.mean()); // 3.0
print(series.max()); // 5
// Filtering and transformation
final filtered = series.where((x) => x > 3); // [4, 5]
final doubled = series.map<int>((x) => x * 2); // [2, 4, 6, 8, 10]
DataFrame #
A two-dimensional labeled data structure with columns of potentially different types.
// Create from map
final df = DataFrame({
  'name': ['Alice', 'Bob', 'Charlie'],
  'age': [25, 30, 35],
  'city': ['NYC', 'LA', 'Chicago'],
});
// Create from records
final records = [
  {'name': 'Alice', 'age': 25},
  {'name': 'Bob', 'age': 30},
];
final df2 = DataFrame.fromRecords(records);
// Access data
print(df['name'][0]); // 'Alice'
print(df.iloc(1)); // {'name': 'Bob', 'age': 30, 'city': 'LA'}
print(df.shape); // [3, 3]
// Operations
final adults = df.where((row) => (row['age'] as num) >= 30);
final sorted = df.sortBy(['age']);
final grouped = df.groupBy(['city']);
Statistical Functions #
// Descriptive statistics
final data = Series<num>([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
final stats = Statistics.descriptiveStats(data);
print(stats); // count, mean, std, min, max, quartiles, etc.
// Correlation analysis
final x = Series<num>([1, 2, 3, 4, 5]);
final y = Series<num>([2, 4, 6, 8, 10]);
final corrResult = Statistics.correlationTest(x, y);
print('Correlation: ${corrResult['correlation']}');
// Linear regression
final regression = Statistics.linearRegression(x, y);
print('Slope: ${regression['slope']}');
print('R²: ${regression['r_squared']}');
// Hypothesis testing
final group1 = Series<num>([1, 2, 3, 4, 5]);
final group2 = Series<num>([2, 3, 4, 5, 6]);
final tTestResult = Statistics.tTest(group1, group2);
print('t-statistic: ${tTestResult['t_statistic']}');
print('p-value: ${tTestResult['p_value']}');
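To make the correlation and regression numbers above easier to check by hand, here is a minimal plain-Dart sketch of an ordinary least-squares fit and Pearson correlation. It uses only `dart:math`; the names `mean` and `simpleLinearFit` are hypothetical helpers for illustration and are not part of the data_frame API, whose internal implementation may differ.

```dart
import 'dart:math' as math;

/// Arithmetic mean of a non-empty list.
double mean(List<num> values) =>
    values.fold<double>(0, (sum, v) => sum + v) / values.length;

/// Least-squares fit of y = slope * x + intercept, plus Pearson correlation.
/// Hypothetical helper for illustration only (not a data_frame function).
Map<String, double> simpleLinearFit(List<num> x, List<num> y) {
  if (x.length != y.length || x.length < 2) {
    throw ArgumentError('x and y must have the same length (at least 2).');
  }
  final mx = mean(x);
  final my = mean(y);
  var sxy = 0.0, sxx = 0.0, syy = 0.0;
  for (var i = 0; i < x.length; i++) {
    final dx = x[i].toDouble() - mx;
    final dy = y[i].toDouble() - my;
    sxy += dx * dy; // co-deviation of x and y
    sxx += dx * dx; // squared deviation of x
    syy += dy * dy; // squared deviation of y
  }
  final slope = sxy / sxx;
  return {
    'slope': slope,
    'intercept': my - slope * mx,
    'correlation': sxy / math.sqrt(sxx * syy),
  };
}

void main() {
  // Same x and y values as the correlation example above.
  final fit = simpleLinearFit([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
  print(fit); // {slope: 2.0, intercept: 0.0, correlation: 1.0}
}
```

Because y is exactly 2x in this data, the slope is 2 and the correlation is 1, which is a quick sanity check for whatever values the library reports.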
Data I/O Operations #
// CSV operations
await DataIO.toCsv(df, 'data.csv');
final loaded = await DataIO.readCsv('data.csv');
// JSON operations
await DataIO.toJson(df, 'data.json');
final jsonData = await DataIO.readJson('data.json');
// URL data fetching
final urlData = await DataIO.readUrl(
  'https://example.com/data.csv',
  format: 'csv',
);
// Sample data generation
final sampleNumeric = DataUtils.createSampleNumeric(rows: 100, columns: 5);
final sampleMixed = DataUtils.createSampleMixed(rows: 50);
final timeSeries = DataUtils.createTimeSeries(days: 30);
Utility Functions #
The DD class provides convenient factory methods:
// Create Series
final range = DD.range(0, 10, step: 2); // [0, 2, 4, 6, 8]
final zeros = DD.zeros(5); // [0, 0, 0, 0, 0]
final ones = DD.ones(3); // [1, 1, 1]
final random = DD.randn(100); // Random normal distribution
// Create date ranges
final dates = DD.dateRange('2024-01-01', '2024-01-07');
// Data concatenation
// (series1, series2, df1, and df2 stand for Series and DataFrames created earlier)
final concatenated = DD.concat([series1, series2]);
final mergedDf = DD.merge(df1, df2, on: 'id');
Mathematical Operations #
// Element-wise operations
final df1 = DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]});
final added = MathOps.add(df1, 10); // Add 10 to all values
final multiplied = MathOps.multiply(df1, 2); // Multiply all by 2
// Mathematical functions
final sqrted = MathOps.sqrt(df1);
final logged = MathOps.log(df1);
final absolute = MathOps.abs(df1);
// Rolling statistics
final rollingMean = MathOps.rollingMean(df1, 3);
final cumSum = MathOps.cumSum(df1);
// Correlation matrix
final corrMatrix = MathOps.corr(df1);
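For readers unfamiliar with rolling and cumulative statistics, the following plain-Dart sketch shows what the two operations compute on a single column. The functions `rollingMean` and `cumulativeSum` here are hypothetical stand-alone helpers, not the `MathOps` implementations; in particular, padding incomplete windows with `null` is an assumption, and the library may handle those positions differently.

```dart
/// Rolling mean over a fixed window. Positions that do not yet have a full
/// window are reported as null (an assumption made for this sketch).
List<double?> rollingMean(List<num> values, int window) {
  if (window < 1) throw ArgumentError('window must be at least 1.');
  final result = <double?>[];
  var windowSum = 0.0;
  for (var i = 0; i < values.length; i++) {
    windowSum += values[i];
    // Drop the value that just slid out of the window.
    if (i >= window) windowSum -= values[i - window];
    result.add(i >= window - 1 ? windowSum / window : null);
  }
  return result;
}

/// Running total of the values seen so far.
List<num> cumulativeSum(List<num> values) {
  num total = 0;
  return [for (final v in values) total += v];
}

void main() {
  print(rollingMean([1, 2, 3, 4, 5], 3)); // [null, null, 2.0, 3.0, 4.0]
  print(cumulativeSum([1, 2, 3])); // [1, 3, 6]
}
```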
Advanced Features #
Grouping and Aggregation #
final df = DataFrame({
  'category': ['A', 'B', 'A', 'B', 'A'],
  'value': [1, 2, 3, 4, 5],
});
final groups = df.groupBy(['category']);
for (final entry in groups.entries) {
  print('Group ${entry.key}:');
  print(entry.value);
}
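The grouping example stops at printing each group. As a rough illustration of the aggregation step that usually follows, this plain-Dart sketch sums a numeric column per group over a list of row maps. `sumByGroup` is a hypothetical helper written for this README, not a data_frame method, and non-numeric values are simply skipped (an assumption that mirrors the "numeric-safe aggregations" mentioned in the release notes).

```dart
/// Sums the numeric values under [valueColumn] for each distinct [key] value.
/// Hypothetical helper for illustration; rows with non-numeric values are skipped.
Map<Object?, num> sumByGroup(
  List<Map<String, Object?>> rows,
  String key,
  String valueColumn,
) {
  final totals = <Object?, num>{};
  for (final row in rows) {
    final value = row[valueColumn];
    if (value is num) {
      totals.update(row[key], (sum) => sum + value, ifAbsent: () => value);
    }
  }
  return totals;
}

void main() {
  // Same category/value data as the groupBy example above, as row maps.
  final rows = [
    {'category': 'A', 'value': 1},
    {'category': 'B', 'value': 2},
    {'category': 'A', 'value': 3},
    {'category': 'B', 'value': 4},
    {'category': 'A', 'value': 5},
  ];
  print(sumByGroup(rows, 'category', 'value')); // {A: 9, B: 6}
}
```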
Joining DataFrames #
final df1 = DataFrame({
  'id': [1, 2, 3],
  'name': ['Alice', 'Bob', 'Charlie'],
});
final df2 = DataFrame({
  'id': [1, 2, 4],
  'id_age_placeholder_removed': null,
});
final joined = df1.join(df2, on: 'id', how: 'inner');
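An inner join keeps only the rows whose key appears on both sides. The sketch below shows that rule in plain Dart on lists of row maps, using the same ids as the example above. `innerJoin` is a hypothetical helper for illustration and says nothing about how `join` is implemented in the library or how it names overlapping columns.

```dart
/// Inner join of two row lists on the column named [on].
/// Hypothetical helper for illustration (not the data_frame join method).
List<Map<String, Object?>> innerJoin(
  List<Map<String, Object?>> left,
  List<Map<String, Object?>> right,
  String on,
) {
  // Index the right-hand rows by key so each left row needs one lookup.
  final index = <Object?, List<Map<String, Object?>>>{};
  for (final row in right) {
    index.putIfAbsent(row[on], () => []).add(row);
  }
  return [
    for (final l in left)
      for (final r in index[l[on]] ?? const <Map<String, Object?>>[])
        {...l, ...r}, // on a column clash, the right-hand value wins
  ];
}

void main() {
  final left = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 3, 'name': 'Charlie'},
  ];
  final right = [
    {'id': 1, 'age': 25},
    {'id': 2, 'age': 30},
    {'id': 4, 'age': 35},
  ];
  // Only ids 1 and 2 appear on both sides, so two rows survive.
  for (final row in innerJoin(left, right, 'id')) {
    print(row);
  }
  // {id: 1, name: Alice, age: 25}
  // {id: 2, name: Bob, age: 30}
}
```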
Data Cleaning #
// Handle null values
final cleaned = df.dropna(); // Remove rows with nulls
final filled = df.fillna(0); // Fill nulls with 0
// Data validation
final info = df.info(); // Get DataFrame info
final summary = df.summary(); // Get summary statistics
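As a rough picture of what dropping and filling nulls means at the row level, here is a plain-Dart sketch over a list of row maps. `dropNullRows` and `fillNulls` are hypothetical helpers for illustration only; the library's `dropna` and `fillna` may offer more options or behave differently at the edges.

```dart
/// Keeps only the rows in which every value is non-null.
List<Map<String, Object?>> dropNullRows(List<Map<String, Object?>> rows) =>
    rows.where((row) => !row.values.contains(null)).toList();

/// Returns copies of the rows with every null value replaced by [fill].
List<Map<String, Object?>> fillNulls(
  List<Map<String, Object?>> rows,
  Object fill,
) =>
    [
      for (final row in rows)
        {for (final e in row.entries) e.key: e.value ?? fill},
    ];

void main() {
  final rows = <Map<String, Object?>>[
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': null},
  ];
  print(dropNullRows(rows)); // [{name: Alice, age: 25}]
  print(fillNulls(rows, 0)); // [{name: Alice, age: 25}, {name: Bob, age: 0}]
}
```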
Performance Considerations #
- Data Frame is optimized for medium-sized datasets (up to millions of rows)
- For very large datasets, consider processing in chunks
- Use appropriate data types to minimize memory usage
- Take advantage of lazy evaluation where possible
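The advice about processing in chunks can be sketched in plain Dart as follows: split the input into fixed-size slices and keep only a running sum and count between slices. The helpers `chunks` and `chunkedMean` are hypothetical and not part of data_frame. For brevity the sketch slices an in-memory list; in practice each chunk would more likely come from reading part of a file.

```dart
/// Lazily yields consecutive slices of [items] with at most [chunkSize] elements.
Iterable<List<T>> chunks<T>(List<T> items, int chunkSize) sync* {
  if (chunkSize < 1) throw ArgumentError('chunkSize must be at least 1.');
  for (var start = 0; start < items.length; start += chunkSize) {
    final end =
        start + chunkSize < items.length ? start + chunkSize : items.length;
    yield items.sublist(start, end);
  }
}

/// Computes a mean one chunk at a time, carrying only a running sum and count.
double chunkedMean(List<num> values, int chunkSize) {
  var sum = 0.0;
  var count = 0;
  for (final chunk in chunks(values, chunkSize)) {
    for (final v in chunk) {
      sum += v;
    }
    count += chunk.length;
  }
  return count == 0 ? double.nan : sum / count;
}

void main() {
  final values = List<num>.generate(10, (i) => i + 1); // 1..10
  print(chunks(values, 3).map((c) => c.length).toList()); // [3, 3, 3, 1]
  print(chunkedMean(values, 3)); // 5.5
}
```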
Contributing #
Contributions are welcome! Please read our contributing guidelines and submit pull requests to our GitHub repository.
License #
This project is licensed under the MIT License - see the LICENSE file for details.
Examples #
Comprehensive Examples Available
The example/ directory contains detailed, real-world examples demonstrating all aspects of the data_frame library:
Getting Started #
- main.dart - Complete overview with all major features
- basic_operations.dart - DataFrame and Series fundamentals
Advanced Analytics #
- statistical_analysis.dart - Statistical tests, correlation, regression
- mathematical_operations.dart - Math functions, rolling statistics, financial calculations
Data Management #
- data_io_operations.dart - CSV, JSON, URL data loading with error handling
- advanced_data_manipulation.dart - Grouping, joining, cleaning, time series
Practical Applications #
- sample_data_generation.dart - Generate test data, simulations, performance testing
- real_world_analysis.dart - Complete business intelligence workflow
Run the Examples #
# Run the comprehensive overview
dart run example/main.dart
# Run specific examples
dart run example/statistical_analysis.dart
dart run example/real_world_analysis.dart
# See all available examples
ls example/
Each example includes:
- Complete, runnable code
- Detailed explanations and comments
- Sample data and realistic scenarios
- Best practices and performance tips
- Multiple use cases and patterns
Perfect for learning, reference, and adapting to your own projects!
Testing #
Data Frame comes with a comprehensive test suite covering all functionality:
# Run all tests
dart test
# Run with coverage
dart test --coverage=coverage
# Run specific test group
dart test -n "Series Tests"
Test Coverage: 87 tests covering 100% of public APIs
See TEST_COVERAGE.md for detailed coverage information.
Code Quality #
# Run static analysis
dart analyze
# Format code
dart format .
Status: No analyzer issues, all tests passing
API Reference #
For complete API documentation, visit our documentation site or generate docs locally with dart doc.