Word Rate Feature Tutorial
This tutorial shows how to train encoding models using word rate features with the LeBel assembly. Word rate features are simple but effective baselines that measure the rate of word presentation.
Overview
Word rate features capture the temporal dynamics of language presentation by measuring how many words are presented per time unit. They are among the simplest yet most effective features for brain encoding models.
Key Components
Assembly: Pre-packaged LeBel assembly containing brain data and stimuli
Feature Extractor: WordRateFeatureExtractor for computing word presentation rates
Downsampler: Aligns word-level features with brain data timing
Model: Ridge regression with nested cross-validation
Trainer: AbstractTrainer orchestrates the entire pipeline
Step-by-Step Tutorial
Load the Assembly
```python
from encoding.assembly.assembly_loader import load_assembly

# Load the pre-packaged LeBel assembly
assembly = load_assembly("assembly_lebel_uts03.pkl")
```
Create Word Rate Feature Extractor
```python
from encoding.features.factory import FeatureExtractorFactory

extractor = FeatureExtractorFactory.create_extractor(
    modality="wordrate",
    model_name="wordrate",
    config={},
    cache_dir="cache",
)
```
Set Up Downsampler and Model
```python
from encoding.downsample.downsampling import Downsampler
from encoding.models.nested_cv import NestedCVModel

downsampler = Downsampler()
model = NestedCVModel(model_name="ridge_regression")
```
Configure Training Parameters
```python
# FIR delays for hemodynamic response modeling
fir_delays = [1, 2, 3, 4]

# Trimming configuration for LeBel dataset
trimming_config = {
    "train_features_start": 10,
    "train_features_end": -5,
    "train_targets_start": 0,
    "train_targets_end": None,
    "test_features_start": 50,
    "test_features_end": -5,
    "test_targets_start": 40,
    "test_targets_end": None,
}

downsample_config = {}
```
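The trimming values are offsets applied along the time axis to discard unstable boundary TRs. As a rough intuition, applying them amounts to ordinary Python slicing; the sketch below is illustrative only (the actual trimming happens inside the trainer, and the array names here are hypothetical):

```python
import numpy as np

# Stand-in arrays shaped (time x features) and (time x voxels)
train_features = np.arange(300).reshape(100, 3)  # 100 TRs, 3 features
train_targets = np.arange(500).reshape(100, 5)   # 100 TRs, 5 voxels

# start=10, end=-5 drops the first 10 and last 5 TRs, keeping 85 rows
trimmed_features = train_features[10:-5]
# start=0, end=None keeps everything
trimmed_targets = train_targets[0:None]

print(trimmed_features.shape)  # (85, 3)
print(trimmed_targets.shape)   # (100, 5)
```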
Create and Run Trainer
```python
from encoding.trainer import AbstractTrainer

trainer = AbstractTrainer(
    assembly=assembly,
    feature_extractors=[extractor],
    downsampler=downsampler,
    model=model,
    fir_delays=fir_delays,
    trimming_config=trimming_config,
    use_train_test_split=True,
    logger_backend="wandb",
    wandb_project_name="lebel-wordrate",
    dataset_type="lebel",
    results_dir="results",
    downsample_config=downsample_config,
)

metrics = trainer.train()
print(f"Median correlation: {metrics.get('median_score', float('nan')):.4f}")
```
Understanding Word Rate Features
Word rate features work as follows:
Counting words per TR: The assembly pre-computes word rates for each TR
No additional processing needed: Word rates are already aligned with brain data
Simple but effective: Captures temporal dynamics of language presentation
The word rate extractor simply returns the pre-computed word rates from the assembly, making it the fastest feature type to compute.
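For intuition only, a word rate per TR can be derived from word onset times by counting the words whose onsets fall inside each TR window. The sketch below is a hypothetical reconstruction of that idea, not part of the pipeline (the LeBel assembly ships these values pre-computed):

```python
import numpy as np

def word_rate_per_tr(word_onsets, n_trs, tr=2.0):
    """Count words whose onset (in seconds) falls inside each TR window."""
    rates = np.zeros(n_trs)
    for onset in word_onsets:
        idx = int(onset // tr)  # which TR window this word lands in
        if 0 <= idx < n_trs:
            rates[idx] += 1
    return rates

# Example: six word onsets over four 2-second TRs
onsets = [0.3, 0.9, 2.1, 3.5, 3.9, 6.2]
print(word_rate_per_tr(onsets, n_trs=4))  # [2. 3. 0. 1.]
```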
Key Parameters
modality: "wordrate" - specifies the feature type
model_name: "wordrate" - identifier for the extractor
config: {} - no additional configuration needed
cache_dir: "cache" - directory for caching (though word rates don't need caching)
Training Configuration
fir_delays: [1, 2, 3, 4] - temporal delays to account for hemodynamic response
trimming_config: LeBel-specific trimming to avoid boundary effects
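Conceptually, FIR delays stack time-lagged copies of the feature matrix so the ridge regression can fit a hemodynamic response shape per feature. The following is a minimal sketch of that expansion under that assumption; LITcoder's internal implementation may differ:

```python
import numpy as np

def make_fir_features(X, delays):
    """Stack copies of X shifted forward by each delay (in TRs), zero-padded."""
    n_trs, n_feats = X.shape
    lagged = np.zeros((n_trs, n_feats * len(delays)))
    for i, d in enumerate(delays):
        # Column block i holds X delayed by d TRs
        lagged[d:, i * n_feats:(i + 1) * n_feats] = X[: n_trs - d]
    return lagged

X = np.arange(6, dtype=float).reshape(6, 1)  # one feature over 6 TRs
X_fir = make_fir_features(X, delays=[1, 2, 3, 4])
print(X_fir.shape)  # (6, 4): one column per delay
```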
Word rate features provide an excellent foundation for understanding the LITcoder pipeline before moving to more complex feature types.