Language Model Features Tutorial

This tutorial shows how to train encoding models on the LeBel assembly using language model features. Language model features capture rich semantic representations from transformer models.

Overview

Language model features extract high-dimensional representations from transformer models like GPT-2. These features capture semantic, syntactic, and contextual information that can be highly predictive of brain activity.

Key Components

  • Assembly: Pre-packaged LeBel assembly containing brain data and stimuli

  • Feature Extractor: LanguageModelFeatureExtractor using transformer models

  • Caching: Multi-layer activation caching for efficient training

  • Downsampler: Aligns word-level features with brain data timing

  • Model: Ridge regression with nested cross-validation

  • Trainer: AbstractTrainer orchestrates the entire pipeline

Step-by-Step Tutorial

  1. Load the Assembly

    from encoding.assembly.assembly_loader import load_assembly
    
    # Load the pre-packaged LeBel assembly
    assembly = load_assembly("assembly_lebel_uts03.pkl")
    
  2. Create Language Model Feature Extractor

    from encoding.features.factory import FeatureExtractorFactory
    
    extractor = FeatureExtractorFactory.create_extractor(
        modality="language_model",
        model_name="gpt2-small",  # Can be changed to other models
        config={
            "model_name": "gpt2-small",
            "layer_idx": 9,  # Layer to extract features from
            "last_token": True,  # Use last token only
            "lookback": 256,  # Context lookback
            "context_type": "fullcontext",
        },
        cache_dir="cache_language_model",
    )
    
  3. Set Up Downsampler and Model

    from encoding.downsample.downsampling import Downsampler
    from encoding.models.nested_cv import NestedCVModel
    
    downsampler = Downsampler()
    model = NestedCVModel(model_name="ridge_regression")
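
NestedCVModel's "nested cross-validation" means that hyperparameter selection and evaluation use separate loops: an inner loop chooses the ridge regularization strength, and an outer loop scores the selected model on held-out data. The scikit-learn sketch below, on toy data, is purely illustrative and is not NestedCVModel's implementation.

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import KFold

    # Toy data standing in for (downsampled features, voxel responses)
    X, Y = np.random.randn(200, 50), np.random.randn(200, 10)

    outer_scores = []
    for train_idx, test_idx in KFold(n_splits=5).split(X):
        # Inner loop: RidgeCV picks the ridge penalty by cross-validation within the training fold
        ridge = RidgeCV(alphas=np.logspace(-2, 4, 7))
        ridge.fit(X[train_idx], Y[train_idx])
        # Outer loop: score the selected model on the untouched test fold
        outer_scores.append(ridge.score(X[test_idx], Y[test_idx]))

    print(np.mean(outer_scores))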
    
  4. Configure Training Parameters

    # FIR delays for hemodynamic response modeling
    fir_delays = [1, 2, 3, 4]
    
    # Trimming configuration for LeBel dataset
    trimming_config = {
        "train_features_start": 10,
        "train_features_end": -5,
        "train_targets_start": 0,
        "train_targets_end": None,
        "test_features_start": 50,
        "test_features_end": -5,
        "test_targets_start": 40,
        "test_targets_end": None,
    }
    
    downsample_config = {}
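
The trimming values above drop samples at the beginning and end of each time series to avoid boundary effects (for example, the first 10 and last 5 training feature samples here). Assuming they behave like Python slice bounds, they correspond roughly to the slicing sketched below; the trainer applies the configured trimming internally.

    import numpy as np

    # Toy time series standing in for features and targets (time x dimensions)
    train_features = np.random.randn(300, 768)
    test_features = np.random.randn(120, 768)

    # Illustrative only: how the trimming values above translate to array slices
    train_features = train_features[10:-5]   # train_features_start=10, train_features_end=-5
    test_features = test_features[50:-5]     # test_features_start=50, test_features_end=-5
    # targets with start=0 / end=None are left untouched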
    
  5. Create and Run Trainer

    from encoding.trainer import AbstractTrainer
    
    trainer = AbstractTrainer(
        assembly=assembly,
        feature_extractors=[extractor],
        downsampler=downsampler,
        model=model,
        fir_delays=fir_delays,
        trimming_config=trimming_config,
        use_train_test_split=True,
        logger_backend="wandb",
        wandb_project_name="lebel-language-model",
        dataset_type="lebel",
        results_dir="results",
        layer_idx=9,  # Pass layer_idx to trainer
        lookback=256,  # Pass lookback to trainer
    )
    
    metrics = trainer.train()
    print(f"Median correlation: {metrics.get('median_score', float('nan')):.4f}")
    

Understanding Language Model Features

Language model features are extracted in the following stages (a code sketch follows the list):

  1. Text Processing: Each stimulus text is tokenized and processed

  2. Transformer Forward Pass: The model processes the text through all layers

  3. Feature Extraction: Features are extracted from the specified layer

  4. Caching: Multi-layer activations are cached for efficiency

  5. Downsampling: Features are aligned with brain data timing
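
To make the first three stages concrete, here is a minimal sketch using TransformerLens directly, which the extractor appears to build on for the models listed under Model Options. The hook point (resid_post) and text handling are illustrative choices, not necessarily what LanguageModelFeatureExtractor does internally; in practice the extractor handles tokenization, context windows, and caching for you.

    from transformer_lens import HookedTransformer

    model = HookedTransformer.from_pretrained("gpt2-small")
    tokens = model.to_tokens("The quick brown fox jumps over the lazy dog")

    # Forward pass with all intermediate activations cached
    _, cache = model.run_with_cache(tokens)

    # Residual-stream activations after block 9: shape [batch, seq_len, d_model]
    layer_acts = cache["resid_post", 9]

    # With last_token=True, only the final position's vector represents this context
    features = layer_acts[0, -1, :]
    print(features.shape)  # 768-dimensional for gpt2-small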

Key Parameters

  • modality: “language_model” - specifies the feature type

  • model_name: “gpt2-small” - transformer model to use

  • layer_idx: 9 - which layer to extract features from

  • last_token: True - use only the last token’s features (we recommend using this)

  • lookback: 256 - context window size

  • context_type: “fullcontext” - how to handle context

  • cache_dir: “cache_language_model” - directory for caching

Model Options

Supported models include:

  • gpt2-small: Fast, good baseline

  • gpt2-medium: Better performance, slower

  • facebook/opt-125m: Alternative architecture

  • Other TransformerLens models: Any compatible model from the TransformerLens model properties table
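
Switching models is just a matter of changing the model name in the factory call from step 2; the rest of the configuration can stay the same (the separate cache directory below is an optional precaution to keep per-model caches apart, not a requirement of the API):

    from encoding.features.factory import FeatureExtractorFactory

    # Same call as step 2, but with a larger GPT-2 variant
    extractor = FeatureExtractorFactory.create_extractor(
        modality="language_model",
        model_name="gpt2-medium",  # or "facebook/opt-125m", etc.
        config={
            "model_name": "gpt2-medium",
            "layer_idx": 9,
            "last_token": True,
            "lookback": 256,
            "context_type": "fullcontext",
        },
        cache_dir="cache_language_model_gpt2_medium",
    )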

Caching System

The language model extractor caches transformer activations so they only need to be computed once:

  1. Multi-layer caching: All layers are cached together

  2. Lazy loading: Layers are loaded on-demand

  3. Efficient storage: Compressed storage of activations

  4. Cache validation: Ensures cached data matches parameters

This makes it efficient to experiment with different layers without recomputing features.
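
For example, a layer sweep only pays the cost of the transformer forward passes once: the first extractor populates the multi-layer cache and later ones read from it. Below is a sketch using the same factory call as step 2 (the exact cache-reuse behavior depends on the cache validation described above):

    from encoding.features.factory import FeatureExtractorFactory

    # Compare several layers; activations are computed once and then reloaded from cache
    extractors_by_layer = {}
    for layer_idx in [3, 6, 9]:
        extractors_by_layer[layer_idx] = FeatureExtractorFactory.create_extractor(
            modality="language_model",
            model_name="gpt2-small",
            config={
                "model_name": "gpt2-small",
                "layer_idx": layer_idx,
                "last_token": True,
                "lookback": 256,
                "context_type": "fullcontext",
            },
            cache_dir="cache_language_model",  # shared cache across layers
        )
    # Each extractor can then be passed to an AbstractTrainer as in step 5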

Training Configuration

  • fir_delays: [1, 2, 3, 4] - temporal delays for hemodynamic response modeling (illustrated in the sketch below)

  • trimming_config: LeBel-specific trimming to avoid boundary effects

  • layer_idx: 9 - which layer to use for training

  • lookback: 256 - context window size
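
For intuition about the FIR delays: a common way to model the hemodynamic lag is to horizontally stack time-shifted copies of the downsampled feature matrix, one copy per delay, so the regression can assign a separate weight to each lag. The sketch below shows that idea in NumPy; it is a conceptual illustration of FIR delay expansion, not necessarily the trainer's exact implementation.

    import numpy as np

    def make_delayed(features, delays):
        """Horizontally stack copies of `features` shifted by each delay (in TRs)."""
        n_timepoints, n_features = features.shape
        delayed = []
        for d in delays:
            shifted = np.zeros((n_timepoints, n_features))
            shifted[d:] = features[: n_timepoints - d]  # shift forward in time, zero-pad the start
            delayed.append(shifted)
        return np.hstack(delayed)

    X = np.random.randn(100, 768)          # 100 TRs of downsampled features
    X_fir = make_delayed(X, [1, 2, 3, 4])  # shape (100, 768 * 4)
    print(X_fir.shape)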