Capablanca Blog (Page 2)

LLM Psychological Evaluation Benchmark

1. Dataset and Notation

* Rows are individual “shots” of a question under a given model/language/configuration/temperature.
* Columns:
  * model ∈ {anthropic/claude-3.7-sonnet, google/gemini-2.0-flash-001, google/gemma-3-27b-it, meta-llama/llama-3.3-70b-instruct, openai/gpt-4o-2024-11-20, x-ai/grok-2-1212}
  * language ∈ {english, romanian}
  * configuration ∈ {closed_form, closed_form_with_explanation, open_form}
  * temperature ∈ {0.

Bringing Structure and Faithfulness to LLM Outputs

Exploring Grammar-Constrained Decoding (GCD) for LLMs. Comparing EGG, GAD, and FEG methods for improving efficiency, faithfulness, and flexibility in generating structured output.
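
As a rough illustration of the general idea behind grammar-constrained decoding, the sketch below masks out every token the grammar does not allow at each step before picking the next token. It is not an implementation of EGG, GAD, or FEG. The toy vocabulary, the finite-state grammar, and the names `TRANSITIONS`, `allowed_tokens`, `constrained_greedy_decode`, and `toy_scores` are all assumptions made for this example, and `toy_scores` is only a stand-in for a real model's logits.

```python
import math

# Hypothetical toy vocabulary (an assumption for this sketch, not from the post).
VOCAB = ["{", "}", '"key"', ":", "1", ","]

# Hypothetical grammar as a finite-state machine accepting strings such as
# {"key":1} or {"key":1,"key":1}. Maps (state, token) -> next state.
TRANSITIONS = {
    ("start", "{"): "want_key",
    ("want_key", '"key"'): "want_colon",
    ("want_colon", ":"): "want_value",
    ("want_value", "1"): "after_value",
    ("after_value", ","): "want_key",
    ("after_value", "}"): "done",
}


def allowed_tokens(state):
    """Return the set of tokens the grammar permits in the given state."""
    return {tok for (s, tok) in TRANSITIONS if s == state}


def constrained_greedy_decode(score_fn, max_steps=20):
    """Greedy decoding where grammar-invalid tokens are masked to -inf."""
    state, output = "start", []
    for _ in range(max_steps):
        if state == "done":
            break
        scores = score_fn(output)  # dict: token -> unnormalized score
        allowed = allowed_tokens(state)
        # Mask step: tokens outside the grammar can never be selected.
        masked = {
            tok: (scores[tok] if tok in allowed else -math.inf)
            for tok in VOCAB
        }
        best = max(masked, key=masked.get)
        output.append(best)
        state = TRANSITIONS[(state, best)]
    return "".join(output), state


def toy_scores(prefix):
    """Stand-in for a language model: fixed scores that always favor '}'."""
    return {"}": 3.0, ",": 2.0, "1": 1.5, ":": 1.0, '"key"': 0.5, "{": 0.0}


text, final_state = constrained_greedy_decode(toy_scores)
print(text, final_state)
```

Without the mask, greedy decoding over `toy_scores` would emit `}` at every step; with it, the output is forced to follow the grammar. Masking alone changes the distribution the model samples from, which is one reason faithfulness is a concern for methods in this area.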

Beyond Basic RFM: Introducing RFM-T-U-E(BDP) for Deeper SaaS Tenant Insights

Understanding your customers, or tenants in the SaaS world, is crucial. How engaged are they? Are they getting value? Are they at risk of churning? For decades, businesses have turned to the RFM (Recency, Frequency, Monetary) model for feature engineering, especially in retail and e-commerce, to answer similar questions. But

Investigating Grokking as a Phase Transition under the SETOL framework and Latent Dunning-Kruger Dynamics

(WIP)

1. Introduction

Grokking is a delayed yet sudden leap in generalization performance, often observed in neural networks long after they have perfectly memorized the training set. Recent studies indicate two compelling perspectives:

1. Spectral Phase Transition: Grokking arises from heavy-tailed self-regularization (HTSR), where the network’s eigenvalue spectrum evolves

Converting 2D Architectural Drawings into Hierarchical Scene Graph Representations

In this ongoing research, we outline a theoretical framework for transforming architectural 2D drawings into hierarchical scene graph representations. Traditional floorplans carry far more than rectangular outlines: they contain rich spatial relationships, complex room shapes, and functional constraints. Our approach bridges graph-theoretic floorplan analysis, hierarchical scene modeling, and constraint reasoning

Exploring the "Latent Dunning-Kruger" Effect for Grokking Meta-Optimizers

In this ongoing research, we propose that the Dunning-Kruger effect, a statistically demonstrated phenomenon in human learning, has a striking analogy in deep learning. Specifically, when we look at models undergoing grokking (an abrupt shift from memorization to generalization), we observe hysteresis-like curves reminiscent of both:

1. Dunning-Kruger curves, where

SETOL-based LLM Mechanistic Interpretability of Domain Knowledge

In this ongoing research study, we collaborate with WeightWatcher to build a new LLM mechanistic interpretability framework grounded in the SETOL (Semi-Empirical Theory of Deep Learning) perspective. This exploration aims to combine insights from spectral analysis of weights with energy-based and physics-inspired approaches to understanding how large language models (LLMs)

From the Partition Function ("God Equation") to Linear Regression, GLMs, and SVMs

We will explore a conceptual roadmap showing how one can connect the “god equation” of statistical mechanics (the partition function) to various common machine learning models (linear regression, generalized linear models, and SVMs). We will also highlight how linear regression and SVM can both be viewed as special cases that

RAG Generation Pipeline for a DE-CBT Therapist Chatbot

In this case study, we demonstrate how a Retrieval-Augmented Generation (RAG) pipeline supports an automated therapy bot based on Darwinian Enhanced Cognitive Behavioral Therapy (DE-CBT) within the ABCDE framework (Activating Events, Beliefs, Consequences, Disputing, and New Emotions/Behaviors). The pipeline employs hybrid semantic embedding methods (BGE-M3), multi-level memory design, and

A Systematic Review of Parameter Optimizers

Modern machine learning, particularly in deep learning, relies heavily on efficient optimization strategies to train complex models. Optimization algorithms and the careful tuning of hyperparameters are crucial to achieving high accuracy and generalization. This article explores the core concepts of model parameters versus hyperparameters, various optimization approaches, and essential hyperparameter

Revolutionizing Predictive Maintenance - A Hybrid Deep Approach to Remaining Useful Life (RUL) Estimation

A hybrid deep learning approach for predictive maintenance, combining time-frequency feature extraction, advanced AI architectures, and innovative bearing analysis to optimize Remaining Useful Life (RUL) predictions, tackling data sparsity and complex degradation dynamics for industrial reliability.

Structured Topic and Sentiment Analysis Pipeline - Group Transitions Dynamics and Emotional Triggers

An advanced NLP pipeline for structured topic and sentiment analysis, leveraging large language models (LLMs), zero-shot labeling, and hierarchical clustering to decode emotional triggers and semantic patterns in online communities for scalable insights.

A computer simulation of the emergence of new traits in a population

A computer simulation exploring how evolutionary pressures lead to new traits in human-like agents within hunter-gatherer environments, analyzing factors that drive adaptations over thousands of generations.

Machine Learning

SDs and Effect Sizes in Linear Mixed Models (LMMs)?

An in-depth exploration of standard deviations and effect sizes in Linear Mixed Models (LMMs), clarifying distinctions between descriptive and inferential statistics, and addressing challenges in analyzing clustered data with hierarchical dependencies.