Technical Documentation

Code Architecture

The earnings call tone research framework is built with modularity, testability, and reproducibility in mind. This page provides technical details for researchers and practitioners who want to understand or extend the implementation.

Project Structure

earnings_call_tone_research/
├── data/                          # Raw data files
│   ├── ff5_daily.parquet         # Fama-French 5 factors
│   ├── stock_prices.parquet      # Daily stock prices
│   └── tone_dispersion.parquet   # Earnings call data
├── src/                          # Core source code
│   ├── __init__.py
│   ├── factor_build.py           # Factor construction
│   ├── load.py                   # Data loading utilities
│   ├── neutralise.py             # Risk factor neutralization
│   ├── portfolio.py              # Portfolio construction
│   └── report.py                 # Performance analysis
├── tests/                        # Comprehensive test suite
│   ├── test_integration.py       # End-to-end testing
│   ├── test_pipeline.py          # Unit tests
│   └── test_backtest_integration.py # Backtest validation
├── outputs/                      # Generated results
├── docs/                         # GitHub Pages documentation
├── run_backtest.py              # Main execution script
└── requirements.txt             # Python dependencies

Core Modules

1. Data Loading (src/load.py)

Purpose: Centralized data access with validation and preprocessing

def prices() -> pd.DataFrame:
    """Load and pivot stock prices to wide format"""
    
def ff_factors() -> pd.DataFrame:
    """Load Fama-French factor returns"""
    
def tone_calls() -> pd.DataFrame:
    """Load earnings call tone dispersion data"""

Features:

2. Factor Construction (src/factor_build.py)

Purpose: Transform raw tone data into tradeable factor signals

def build_daily_factor() -> pd.Series:
    """
    Convert quarterly earnings calls to daily factor signals
    
    Steps:
    1. Parse call timestamps and map to trading dates
    2. Aggregate multiple calls per symbol-date
    3. Apply economic sign convention (negate dispersion)
    4. Cross-sectional normalization (z-score)
    
    Returns:
    --------
    pd.Series: Factor values indexed by (date, symbol)
    """

Key Implementation Details:

3. Risk Factor Neutralization (src/neutralise.py)

Purpose: Remove systematic risk exposures from factor signals

def neutralise(factor: pd.Series) -> pd.Series:
    """
    Neutralize factor against Fama-French factors
    
    Regression: factor_t = α + β₁*MktRF + β₂*SMB + β₃*HML + β₄*RMW + β₅*CMA + ε_t
    
    Returns: ε_t (residual factor)
    """

Statistical Approach:

4. Portfolio Construction (src/portfolio.py)

Purpose: Convert factor signals into implementable portfolio weights

Core Functions

def build_weights(signal: pd.Series, gross: float = 1.0, 
                 smoothing: float = 0.75) -> pd.DataFrame:
    """
    Advanced portfolio construction with turnover control
    
    Features:
    - Nonlinear signal transformation
    - Adaptive smoothing for turnover reduction
    - Market neutrality enforcement
    - Gross exposure control
    """

def calculate_turnover(weights: pd.DataFrame) -> pd.Series:
    """Calculate daily portfolio turnover"""

def pnl(weights: pd.DataFrame, horizon: int = 5) -> pd.Series:
    """Calculate portfolio P&L from weights and forward returns"""

Advanced Features

Nonlinear Signal Enhancement:

# Enhance signal distinction in tails
enhanced_signal = np.sign(centered_ranks) * np.abs(centered_ranks) ** 0.75
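
The centered_ranks input is not defined above; one plausible construction, assuming cross-sectional percentile ranks recentred around zero, is:

import numpy as np

# Hypothetical derivation: percentile-rank each date's signals, recentre to [-0.5, 0.5]
ranks = signal.groupby(level="date").rank(pct=True)
centered_ranks = ranks - 0.5
enhanced_signal = np.sign(centered_ranks) * np.abs(centered_ranks) ** 0.75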

Adaptive Smoothing Algorithm:

# Reduce smoothing where the target weight change is large (top quartile)
signal_change = (target_weights - previous_weights).abs()
significant_threshold = signal_change.quantile(0.75)
adaptive_smoothing = np.where(signal_change > significant_threshold,
                              smoothing - 0.25, smoothing)
# Blend toward the target, retaining `adaptive_smoothing` of the previous weights
new_weights = adaptive_smoothing * previous_weights + (1 - adaptive_smoothing) * target_weights

Constraint Enforcement:

5. Performance Analysis (src/report.py)

Purpose: Comprehensive factor and portfolio analysis

Alphalens Integration

def make_tearsheet(factor: pd.Series, out: str = "outputs/tearsheet.png"):
    """
    Generate Alphalens tearsheet with custom forward return calculation
    
    Custom Features:
    - Patched forward return computation for irregular frequencies
    - Silent execution for automated workflows
    - Comprehensive factor analysis
    """

Enhanced Metrics

def calculate_metrics(returns: pd.Series) -> Dict[str, float]:
    """
    Calculate advanced performance metrics:
    - Sharpe, Sortino, Calmar ratios
    - Maximum drawdown analysis
    - Win rate and profit ratio
    - Monthly consistency measures
    """

def analyze_factor_exposures(returns: pd.Series, 
                           factor_returns: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
    """Rolling factor exposure analysis using sklearn regression"""

def calculate_conditional_metrics(returns: pd.Series, 
                                condition_series: pd.Series) -> Dict[str, Dict[str, float]]:
    """Performance analysis conditional on market regimes"""

Testing Framework

1. Integration Tests (tests/test_integration.py)

Comprehensive End-to-End Validation:

class TestParquetDataIntegrity:
    """Validate data file structure and quality"""
    
class TestDataAlignment:
    """Test cross-file date and symbol consistency"""
    
class TestEnhancedPipeline:
    """Validate enhanced features like smoothing and metrics"""
    
class TestDataLoading:
    """Test robust data loading and transformation"""

Coverage Areas:

2. Backtest Integration Tests (tests/test_backtest_integration.py)

Full Pipeline Validation:

class TestBacktestPipeline:
    """Test complete backtest execution"""
    
class TestPerformanceRegression:
    """Ensure performance characteristics remain stable"""
    
class TestDataConsistency:
    """Validate data relationships over time"""

Regression Testing:

3. Performance Testing

Benchmarks and Timing:

Data Pipeline

Input Data Requirements

Stock Prices (stock_prices.parquet)

# Required columns
['date', 'symbol', 'adjClose', 'open', 'high', 'low', 'close', 'volume']

# Expected format
- Long format: One row per date-symbol
- Date range: 2000-2024 (6,290 days)
- Symbols: 677 unique tickers
- Missing data: Handled via forward-fill and dropna

Earnings Calls (tone_dispersion.parquet)

# Required columns  
['symbol', 'date', 'tone_dispersion', 'year', 'quarter', 'company_id', 'call_key']

# Expected format
- Quarterly frequency: One row per earnings call
- Date range: 2005-2025 (33,362 calls)  
- Tone dispersion: Float values [0, 1] range
- Multiple calls per symbol-quarter: Aggregated by mean

Fama-French Factors (ff5_daily.parquet)

# Required columns
['mktrf', 'smb', 'hml', 'rmw', 'cma', 'umd', 'rf']

# Expected format
- Daily frequency with DatetimeIndex
- Date range: 1963-2024 (business days only)
- Values: Daily returns in decimal format
- Used for: Factor neutralization and performance attribution
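
A lightweight validation pass over the three inputs might look like this (file paths follow the project structure above; the exact checks are illustrative):

import pandas as pd

REQUIRED = {
    "data/stock_prices.parquet": {"date", "symbol", "adjClose", "close", "volume"},
    "data/tone_dispersion.parquet": {"symbol", "date", "tone_dispersion", "year", "quarter"},
    "data/ff5_daily.parquet": {"mktrf", "smb", "hml", "rmw", "cma", "rf"},
}

for path, columns in REQUIRED.items():
    df = pd.read_parquet(path)
    missing = columns - set(df.columns)
    assert not missing, f"{path} is missing columns: {missing}"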

Output Generation

Factor Panel (outputs/factor_panel.parquet)

Portfolio Weights (outputs/weights.parquet)

Visualizations

Deployment and Execution

Environment Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Data requirements (with Git LFS)
git lfs install
git lfs pull

Execution Workflow

# Run full backtest
python run_backtest.py

# Run specific tests
pytest tests/test_integration.py -v
pytest tests/test_backtest_integration.py::TestBacktestPipeline -v

# Generate documentation
cd docs/
jekyll serve  # For local development

Configuration Options

Smoothing Parameters:

SMOOTHING = 0.75  # 75% weight retention
# Range: [0.0, 1.0]
# 0.0 = No smoothing (100% turnover)
# 1.0 = Maximum smoothing (no turnover)

Portfolio Parameters:

GROSS_EXPOSURE = 1.0  # 100% gross exposure
HORIZON = 5          # 5-day forward returns
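
Put together, these settings feed the pipeline functions documented above; a minimal wiring sketch (import paths are inferred from the project structure and may differ):

from src.factor_build import build_daily_factor
from src.neutralise import neutralise
from src.portfolio import build_weights, pnl
from src.report import calculate_metrics

factor = neutralise(build_daily_factor())
weights = build_weights(factor, gross=GROSS_EXPOSURE, smoothing=SMOOTHING)
returns = pnl(weights, horizon=HORIZON)
metrics = calculate_metrics(returns)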

Performance Optimizations

Computational Efficiency

  1. Vectorized Operations: Pandas/NumPy for all calculations
  2. Memory Management: Chunked processing for large datasets
  3. Caching: Intermediate results stored as parquet files (see the sketch after this list)
  4. Parallel Processing: Multi-core support for rolling calculations
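
The caching in item 3 can be as simple as a small helper around pandas' parquet I/O (the helper name and behaviour are assumptions, not the repository's code):

import os
import pandas as pd

def cached_parquet(path: str, build) -> pd.DataFrame:
    # Hypothetical caching helper: rebuild only when the parquet file is absent
    if os.path.exists(path):
        return pd.read_parquet(path)
    result = build()
    result.to_parquet(path)
    return result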

Scalability Considerations

Error Handling and Logging

Data Validation

Robust Execution


Development Guidelines

Code Standards

Contributing

  1. Fork the repository
  2. Create feature branch
  3. Add comprehensive tests
  4. Update documentation
  5. Submit pull request with clear description

Research Extensions


For questions about the technical implementation, please refer to the code documentation or submit an issue in the GitHub repository.