Word Rate Feature Tutorial
=========================

This tutorial shows how to train encoding models using word rate features with the LeBel assembly. Word rate features are simple but effective baselines that measure the rate of word presentation.

Overview
--------

Word rate features capture the temporal dynamics of language presentation by measuring how many words are presented per time unit. This is one of the simplest but an effective feature for brain encoding models.

Key Components
--------------

- **Assembly**: Pre-packaged LeBel assembly containing brain data and stimuli
- **Feature Extractor**: WordRateFeatureExtractor for computing word presentation rates
- **Downsampler**: Aligns word-level features with brain data timing
- **Model**: Ridge regression with nested cross-validation
- **Trainer**: AbstractTrainer orchestrates the entire pipeline

Step-by-Step Tutorial
---------------------

1. **Load the Assembly**

   .. code-block:: python

      from encoding.assembly.assembly_loader import load_assembly
      
      # Load the pre-packaged LeBel assembly
      assembly = load_assembly("assembly_lebel_uts03.pkl")

2. **Create Word Rate Feature Extractor**

   .. code-block:: python

      from encoding.features.factory import FeatureExtractorFactory
      
      extractor = FeatureExtractorFactory.create_extractor(
          modality="wordrate",
          model_name="wordrate",
          config={},
          cache_dir="cache",
      )

3. **Set Up Downsampler and Model**

   .. code-block:: python

      from encoding.downsample.downsampling import Downsampler
      from encoding.models.nested_cv import NestedCVModel
      
      downsampler = Downsampler()
      model = NestedCVModel(model_name="ridge_regression")

4. **Configure Training Parameters**

   .. code-block:: python

      # FIR delays for hemodynamic response modeling
      fir_delays = [1, 2, 3, 4]
      
      # Trimming configuration for LeBel dataset
      trimming_config = {
          "train_features_start": 10,
          "train_features_end": -5,
          "train_targets_start": 0,
          "train_targets_end": None,
          "test_features_start": 50,
          "test_features_end": -5,
          "test_targets_start": 40,
          "test_targets_end": None,
      }
      
      downsample_config = {}

5. **Create and Run Trainer**

   .. code-block:: python

      from encoding.trainer import AbstractTrainer
      
      trainer = AbstractTrainer(
          assembly=assembly,
          feature_extractors=[extractor],
          downsampler=downsampler,
          model=model,
          fir_delays=fir_delays,
          trimming_config=trimming_config,
          use_train_test_split=True,
          logger_backend="wandb",
          wandb_project_name="lebel-wordrate",
          dataset_type="lebel",
          results_dir="results",
          downsample_config=downsample_config,
      )
      
      metrics = trainer.train()
      print(f"Median correlation: {metrics.get('median_score', float('nan')):.4f}")

Understanding Word Rate Features
--------------------------------

Word rate features are computed by:

1. **Counting words per TR**: The assembly pre-computes word rates for each TR
2. **No additional processing needed**: Word rates are already aligned with brain data
3. **Simple but effective**: Captures temporal dynamics of language presentation

The word rate extractor simply returns the pre-computed word rates from the assembly, making it the fastest feature type to compute.

Key Parameters
--------------

- **modality**: "wordrate" - specifies the feature type
- **model_name**: "wordrate" - identifier for the extractor
- **config**: {} - no additional configuration needed
- **cache_dir**: "cache" - directory for caching (though word rates don't need caching)

Training Configuration
----------------------

- **fir_delays**: [1, 2, 3, 4] - temporal delays to account for hemodynamic response
- **trimming_config**: LeBel-specific trimming to avoid boundary effects


Word rate features provide an excellent foundation for understanding the LITcoder pipeline before moving to more complex feature types.