Word Rate Feature Tutorial

This tutorial shows how to train encoding models using word rate features with the LeBel assembly. Word rate features are simple but effective baselines that measure the rate of word presentation.

Overview

Word rate features capture the temporal dynamics of language presentation by measuring how many words are presented per time unit. This is one of the simplest but an effective feature for brain encoding models.

Key Components

  • Assembly: Pre-packaged LeBel assembly containing brain data and stimuli

  • Feature Extractor: WordRateFeatureExtractor for computing word presentation rates

  • Downsampler: Aligns word-level features with brain data timing

  • Model: Ridge regression with nested cross-validation

  • Trainer: AbstractTrainer orchestrates the entire pipeline

Step-by-Step Tutorial

  1. Load the Assembly

    from encoding.assembly.assembly_loader import load_assembly
    
    # Load the pre-packaged LeBel assembly
    assembly = load_assembly("assembly_lebel_uts03.pkl")
    
  2. Create Word Rate Feature Extractor

    from encoding.features.factory import FeatureExtractorFactory
    
    extractor = FeatureExtractorFactory.create_extractor(
        modality="wordrate",
        model_name="wordrate",
        config={},
        cache_dir="cache",
    )
    
  3. Set Up Downsampler and Model

    from encoding.downsample.downsampling import Downsampler
    from encoding.models.nested_cv import NestedCVModel
    
    downsampler = Downsampler()
    model = NestedCVModel(model_name="ridge_regression")
    
  4. Configure Training Parameters

    # FIR delays for hemodynamic response modeling
    fir_delays = [1, 2, 3, 4]
    
    # Trimming configuration for LeBel dataset
    trimming_config = {
        "train_features_start": 10,
        "train_features_end": -5,
        "train_targets_start": 0,
        "train_targets_end": None,
        "test_features_start": 50,
        "test_features_end": -5,
        "test_targets_start": 40,
        "test_targets_end": None,
    }
    
    downsample_config = {}
    
  5. Create and Run Trainer

    from encoding.trainer import AbstractTrainer
    
    trainer = AbstractTrainer(
        assembly=assembly,
        feature_extractors=[extractor],
        downsampler=downsampler,
        model=model,
        fir_delays=fir_delays,
        trimming_config=trimming_config,
        use_train_test_split=True,
        logger_backend="wandb",
        wandb_project_name="lebel-wordrate",
        dataset_type="lebel",
        results_dir="results",
        downsample_config=downsample_config,
    )
    
    metrics = trainer.train()
    print(f"Median correlation: {metrics.get('median_score', float('nan')):.4f}")
    

Understanding Word Rate Features

Word rate features are computed by:

  1. Counting words per TR: The assembly pre-computes word rates for each TR

  2. No additional processing needed: Word rates are already aligned with brain data

  3. Simple but effective: Captures temporal dynamics of language presentation

The word rate extractor simply returns the pre-computed word rates from the assembly, making it the fastest feature type to compute.

Key Parameters

  • modality: “wordrate” - specifies the feature type

  • model_name: “wordrate” - identifier for the extractor

  • config: {} - no additional configuration needed

  • cache_dir: “cache” - directory for caching (though word rates don’t need caching)

Training Configuration

  • fir_delays: [1, 2, 3, 4] - temporal delays to account for hemodynamic response

  • trimming_config: LeBel-specific trimming to avoid boundary effects

Word rate features provide an excellent foundation for understanding the LITcoder pipeline before moving to more complex feature types.