Language Model Features Tutorial
This tutorial shows how to train encoding models using language model features with the LeBel assembly. Language model features capture rich semantic representations from transformer models.
Overview
Language model features are high-dimensional representations extracted from the hidden layers of transformer models such as GPT-2. They capture semantic, syntactic, and contextual information that can be highly predictive of brain activity.
Key Components
Assembly: Pre-packaged LeBel assembly containing brain data and stimuli
Feature Extractor: LanguageModelFeatureExtractor using transformer models
Caching: Multi-layer activation caching for efficient training
Downsampler: Aligns word-level features with brain data timing
Model: Ridge regression with nested cross-validation
Trainer: AbstractTrainer orchestrates the entire pipeline
Step-by-Step Tutorial
Load the Assembly
from encoding.assembly.assembly_loader import load_assembly

# Load the pre-packaged LeBel assembly
assembly = load_assembly("assembly_lebel_uts03.pkl")
Create Language Model Feature Extractor
from encoding.features.factory import FeatureExtractorFactory

extractor = FeatureExtractorFactory.create_extractor(
    modality="language_model",
    model_name="gpt2-small",  # can be changed to other models
    config={
        "model_name": "gpt2-small",
        "layer_idx": 9,        # layer to extract features from
        "last_token": True,    # use the last token only
        "lookback": 256,       # context lookback
        "context_type": "fullcontext",
    },
    cache_dir="cache_language_model",
)
Set Up Downsampler and Model
from encoding.downsample.downsampling import Downsampler
from encoding.models.nested_cv import NestedCVModel

downsampler = Downsampler()
model = NestedCVModel(model_name="ridge_regression")
Configure Training Parameters
# FIR delays for hemodynamic response modeling
fir_delays = [1, 2, 3, 4]

# Trimming configuration for the LeBel dataset
trimming_config = {
    "train_features_start": 10,
    "train_features_end": -5,
    "train_targets_start": 0,
    "train_targets_end": None,
    "test_features_start": 50,
    "test_features_end": -5,
    "test_targets_start": 40,
    "test_targets_end": None,
}

downsample_config = {}
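The trimming values are start/end indices applied to the feature and target time series before model fitting. The sketch below is illustrative only: it assumes the trainer applies these values as ordinary Python/NumPy slices, and the array shape is made up.

import numpy as np

# Illustration of the assumed trimming behavior (not the trainer's actual code).
train_features = np.random.randn(300, 768)  # hypothetical array: time points x feature dimensions

# "train_features_start": 10, "train_features_end": -5
trimmed = train_features[10:-5]  # drop the first 10 and the last 5 time points
print(trimmed.shape)             # (285, 768)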
Create and Run Trainer
from encoding.trainer import AbstractTrainer

trainer = AbstractTrainer(
    assembly=assembly,
    feature_extractors=[extractor],
    downsampler=downsampler,
    model=model,
    fir_delays=fir_delays,
    trimming_config=trimming_config,
    use_train_test_split=True,
    logger_backend="wandb",
    wandb_project_name="lebel-language-model",
    dataset_type="lebel",
    results_dir="results",
    layer_idx=9,   # pass layer_idx to the trainer
    lookback=256,  # pass lookback to the trainer
)

metrics = trainer.train()
print(f"Median correlation: {metrics.get('median_score', float('nan')):.4f}")
Understanding Language Model Features
Language model features are extracted in the following steps (a sketch follows the list):
Text Processing: Each stimulus text is tokenized and processed
Transformer Forward Pass: The model processes the text through all layers
Feature Extraction: Features are extracted from the specified layer
Caching: Multi-layer activations are cached for efficiency
Downsampling: Features are aligned with brain data timing
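The sketch below illustrates roughly what the tokenization, forward pass, and layer-extraction steps look like using TransformerLens (the library the supported models come from; see Model Options below). It is an illustration, not the extractor's actual implementation; the example sentence and the choice of the residual-stream hook are assumptions.

import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")
tokens = model.to_tokens("the quick brown fox")  # tokenize the stimulus text

with torch.no_grad():
    _, cache = model.run_with_cache(tokens)      # forward pass, caching every layer

# Residual-stream activations after block 9; shape: [1, seq_len, d_model]
layer_9 = cache["blocks.9.hook_resid_post"]

# With last_token=True, only the final token's vector represents this text
features = layer_9[0, -1, :]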
Key Parameters
modality: "language_model" - specifies the feature type
model_name: "gpt2-small" - transformer model to use
layer_idx: 9 - which layer to extract features from
last_token: True - use only the last token's features (recommended)
lookback: 256 - context window size
context_type: "fullcontext" - how to handle context
cache_dir: "cache_language_model" - directory for caching
Model Options
Supported models include:
- gpt2-small: fast, good baseline
- gpt2-medium: better performance, slower
- facebook/opt-125m: alternative architecture
- Other TransformerLens models: any compatible model from the TransformerLens model properties table
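Switching models is a matter of changing the model name. The following sketch assumes the same factory call shown in the tutorial; the separate cache directory is an assumption to keep activations from different models apart.

from encoding.features.factory import FeatureExtractorFactory

# Same factory call as before, pointing at a different TransformerLens model.
# Assumption: a separate cache_dir keeps this model's activations apart from GPT-2's.
extractor_opt = FeatureExtractorFactory.create_extractor(
    modality="language_model",
    model_name="facebook/opt-125m",
    config={
        "model_name": "facebook/opt-125m",
        "layer_idx": 9,
        "last_token": True,
        "lookback": 256,
        "context_type": "fullcontext",
    },
    cache_dir="cache_language_model_opt125m",  # hypothetical cache directory
)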
Caching System
The language model extractor uses a sophisticated caching system:
Multi-layer caching: All layers are cached together
Lazy loading: Layers are loaded on-demand
Efficient storage: Compressed storage of activations
Cache validation: Ensures cached data matches parameters
This makes it efficient to experiment with different layers without recomputing features.
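For example, to compare layers you can create a second extractor that differs only in layer_idx and points at the same cache directory; because all layers are cached together, this should reuse the stored activations rather than re-running the transformer. A sketch, reusing the factory call from the tutorial (layer 6 is a hypothetical choice):

from encoding.features.factory import FeatureExtractorFactory

# A second extractor that differs only in the layer index.
# Assumption: since all layers are cached together, this hits the existing cache
# in cache_language_model instead of recomputing GPT-2 activations.
extractor_layer6 = FeatureExtractorFactory.create_extractor(
    modality="language_model",
    model_name="gpt2-small",
    config={
        "model_name": "gpt2-small",
        "layer_idx": 6,  # hypothetical alternative layer
        "last_token": True,
        "lookback": 256,
        "context_type": "fullcontext",
    },
    cache_dir="cache_language_model",  # same cache directory as before
)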
Training Configuration
fir_delays: [1, 2, 3, 4] - temporal delays for hemodynamic response modeling (see the sketch after this list)
trimming_config: LeBel-specific trimming to avoid boundary effects
layer_idx: 9 - which layer to use for training
lookback: 256 - context window size
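The FIR delays account for the slow hemodynamic response by regressing brain activity on time-shifted copies of the features. The sketch below shows the general idea only; it assumes the delays are measured in downsampled time samples and that the trainer builds the delayed copies internally. The make_delayed helper and the array shapes are hypothetical.

import numpy as np

def make_delayed(features: np.ndarray, delays: list[int]) -> np.ndarray:
    """Stack copies of `features`, each shifted forward in time by one of the delays."""
    n_samples, n_dims = features.shape
    delayed = np.zeros((n_samples, n_dims * len(delays)))
    for i, d in enumerate(delays):
        delayed[d:, i * n_dims:(i + 1) * n_dims] = features[: n_samples - d]
    return delayed

X = np.random.randn(200, 768)            # hypothetical downsampled features
X_delayed = make_delayed(X, [1, 2, 3, 4])
print(X_delayed.shape)                   # (200, 3072): one shifted copy per delay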