The Skin (Spatial Poolers)

Introduction

Deterministic AST parsing handles known source code, and hardcoded network listeners handle known telemetry payloads. But a truly sovereign architecture cannot collapse when presented with an undocumented text protocol, an alien configuration format, or unstructured natural language. It must possess a generic sensory discovery layer that can feel out the boundaries of an unknown structure mathematically.

In the Karyon architecture, this untargeted sensory organ is conceptually modeled as the “Skin.” It is a raw, generic interface that converts unstructured environmental noise into structured topological graph nodes without relying on hardcoded pre-processing.

Theoretical Foundation: Spatial Pooling and Topologies

When human skin touches an unknown object, it does not instantly classify the object; it registers raw tactile inputs—temperature, pressure, texture—that the brain correlates into a physical boundary. Karyon replicates this using Hierarchical Temporal Memory (HTM) and its core mechanism, the Spatial Pooler (SP) [1].

The Neocortical Paradigm of Hierarchical Temporal Memory

The paradigm of artificial intelligence is currently experiencing a structural shift toward decentralized, edge-based execution. This requires an architecture capable of processing continuous, unbounded sensory data without reliance on computationally expensive backpropagation. The HTM Spatial Pooler achieves this by converting arbitrary, high-dimensional input streams into highly robust, noise-resistant Sparse Distributed Representations (SDRs) [1].

Instead of relying on rigid, deterministic regular expressions for parsing unstructured bytes, the Karyon sensory perimeter employs Unicode-based word-encoding mechanisms. These encodings preserve the spatial topology of the input stream, allowing the HTM algorithm to continuously learn the syntax and semantic relationships of unknown text sequences [4]. By tracking structural similarities embedded in these temporal sequences, the generic perception cell dynamically delineates entities within the chaotic noise.

Hebbian Mechanics and Spatial Binding

The primary learning mechanism governing the adaptation of the Spatial Pooler is competitive Hebbian learning, formulated on the biological axiom that “cells that fire together, wire together.” Within the mathematical formalization of the SP, this is executed linearly through local synaptic permanence value updates [2].

As the continuous data stream flows through the perception cell, active structural patterns trigger specific mini-columns. Synapses connected to co-active input bits are mathematically increased (Long-Term Potentiation), while synapses connected to inactive bits are depressed (Long-Term Depression) [1]. Over time, if “String A” frequently co-occurs in close structural proximity to “String B”, the spatial pooler organically wires a representation binding them together. To prevent neural dominance and ensure the sensory stream maintains high entropy, the system strictly applies a homeostatic boosting factor to dynamically regulate cell excitability [1].

Technical Implementation: The Sensory Perimeter

HTM spatial pooling provides an exceptionally fast structural filter, but extracting complex relational semantics from an unstructured data stream occasionally necessitates the zero-shot reasoning capabilities inherent to transformer models. To remain entirely sovereign and localized, Karyon must orchestrate these models within severe hardware limits.

The Digitized Retina and Quantized SLMs

The perception cell spins up Small Language Models (SLMs) in the 1-billion to 3-billion parameter range (e.g., Llama 3.2 1B/3B, Qwen) strictly running on the CPU via robust C++ frameworks like llama.cpp [5], [13]. This model acts purely as a transient sensory boundary and is never utilized for Karyon’s internal reasoning or executive logic.

To circumvent severe memory bandwidth constraints on the CPU, the implementation relies heavily on sub-4-bit quantization (specifically the GGUF Q4_0 or Q4_K_M formats) [6]. By converting floating-point weights to integers, the physical size of a 3-billion parameter model is reduced to under 2 gigabytes. This radically reduces the necessary payload crossing the memory bus with every generated token, multiplying the theoretical token generation speed while retaining 99% of its baseline reasoning accuracy [6]. The role of this heavily quantized SLM is purely translational: prompt constraints force the SLM to parse the unstructured text output from the HTM layer and output highly structured relational tuples (e.g., [Entity_A] -> <Relationship> -> [Entity_B]) [7].

Topological Forging in the Rhizome

Once the SLM sensory filter translates environmental noise into a structured relationship, the Elixir Actor process pushes these tuples into Karyon’s core graph database, the Rhizome. Translating transient text into a dynamic graph topology effectively shifts the burden of multi-hop reasoning from the compute-bound SLM to the memory-bound database algorithms [8].

This topology is structured dynamically using algorithmic implementations of Hebbian learning. As the perception cell continuously observes and extracts identical relationships, the database incrementally increases the mathematical weight of the corresponding edge (Long-Term Potentiation) [9]. Conversely, edges that are infrequently observed naturally decay, mirroring biological synaptic depression [10]. Over time, unsupervised pruning protocols eliminate low-utility networks [11], optimizing the database into a “rich club” network architecture containing exclusively densely connected, highly relevant nodes [12].

The Engineering Reality: Hardware and Bottlenecks

The implementation of continuous, generic spatial poolers exposes the brutal reality of localized, bare-metal computing. While traversing the Rhizome graph is memory-bandwidth-bound, evaluating the unstructured input stream via SLMs is severely compute-bound.

The Memory Bandwidth Wall and Thermal Diagnostics

Processing sensory streams through a transformer architecture consists of two phases: the prefill (processing the input sequence) and the decode (generating the token). The prefill is deeply compute-bound, saturating the Arithmetic Logic Units (ALUs) and vector extensions across all CPU cores [13]. Generating tokens autonomously, however, requires shuttling the entire model’s parameters across the DDR5 memory bus for each execution step, instantly hitting the absolute physical ceiling of RAM data transfer rates [6]. Multicore CPU systems are often preferred for this task over external GPUs explicitly due to the severe PCIe memory transfer overheads at the edge [3].

Unlike user-facing chatbots that run in discrete bursts, a sensory perimeter must evaluate ambient streams perpetually. Maintaining continuous 100% CPU utilization rapidly exhausts the silicon’s Thermal Design Power (TDP) [15]. Once maximum thermal capacity is reached, the operating system aggressively throttles clock frequencies, resulting in catastrophic latency spikes, erratic Inter-Token Latency (ITL), and complete collapse of the input stream [14], [16]. Furthermore, continuous SLM inference severely pollutes the L2 and L3 caches, displacing Karyon’s core Elixir processes and causing operating system context switching overhead [17].

Core-Pinning and Metabolic Capping

These physical constraints necessitate draconian systems engineering. If generic perception cells ingest data faster than the hardware can calculate relational overlaps, the entire active inference loop halts.

To ensure continuous sensory evaluation without cannibalizing executive reasoning resources, Karyon implements strict CPU core-pinning, or CPU affinity. By explicitly locking the llama.cpp inference threads to a segregated subset of processing cores (often efficient E-cores), Karyon completely bypasses the Linux Completely Fair Scheduler (CFS) [18]. Core-pinning guarantees that the SLM’s quantized weights remain localized within specific L2 caches (“cache warmth”), eliminating the microsecond latency penalties of thread migration and TLB flushing [18].

This is paired with aggressive “metabolic capping”, placing artificial limits on execution speed to prevent thermal overload. By restricting the thread count below the physical core maximum and actively power gating idle CPU sectors [19], Karyon trades peak theoretical inference speed for a reliable, completely flat latency curve—an absolute necessity for surviving infinite data streams.

Instability Risks and The Academic Counter-Argument

Operating with continuous unsupervised Hebbian updates exposes the architecture to mathematically documented risks. Academic critics consistently note that a pure Hebbian update rule lacks homeostasis. Without complex non-linear normalizations, continuous co-activation can result in “runaway excitation”, where a hyper-connected cluster of nodes completely destroys the sparsity required for efficient graph querying [20]. In addition, while mathematically analogous structures exist between Hebbian rules and stochastic gradient descent via Dale’s backpropagation [21], [22], an unconstrained associative system can still be hijacked by hallucinated data.

To counter this “reasoning drift”, modern autonomous designs deploy “validation-gated” Hebbian mechanisms. Edge strengthening is halted unless the extracted structural relationship can be explicitly validated against Karyon’s known reality [23], guaranteeing that hallucinated noise from the SLM sensory layer does not permanently rewrite the sovereign memory core.

Summary

When encountering unmapped, chaotic environments where deterministic parsing fails, Karyon deploys dynamic “Skin” cells. Utilizing ultra-quantized Small Language Models (llama.cpp) strictly pinned to specific CPU cores, these cells employ continuous Hebbian learning rules to organically detect and bind structural relationships from unstructured noise, building valid topological edges without suffocating Karyon’s core execution loops.

References

Cui, Y., Ahmad, S., Hawkins, J. (2017). The HTM Spatial Pooler—A Neocortical Algorithm for Online Sparse Distributed Coding. Frontiers in Computational Neuroscience. https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2017.00111/full
Mnatzaganian, J., et al. (2016). A Mathematical Formalization of Hierarchical Temporal Memory’s Spatial Pooler. arXiv. https://arxiv.org/abs/1601.06116
Zhang, H., Huang, J. (2025). Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference. arXiv. https://arxiv.org/html/2505.06461v1
(2024). Extracting Geoscientific Dataset Names from the Literature Based on the Hierarchical Temporal Memory Model. MDPI. https://www.mdpi.com/2220-9964/13/7/260
(2024). Accelerating Llama.cpp Performance in Consumer LLM Applications with AMD Ryzen™ AI 300 Series. AMD. https://www.amd.com/en/blogs/2024/accelerating-llama-cpp-performance-in-consumer-llm.html
(2025). Sometimes Painful but Promising: Feasibility and Trade-offs of On-Device Language Model Inference. arXiv. https://arxiv.org/html/2503.09114v2
(2025). Complex System Diagnostics Using a Knowledge Graph-Informed and Large Language Model-Enhanced Framework. MDPI. https://www.mdpi.com/2076-3417/15/17/9428
Fisher, M. (2025). Neural Graph Memory: A Structured Approach to Long-Term Memory in Multimodal Agents. ResearchGate. https://www.researchgate.net/publication/394440420_Neural_Graph_Memory_A_Structured_Approach_to_Long-Term_Memory_in_Multimodal_Agents
(2023). arXiv:2307.02738v3 [cs.AI]. arXiv. https://arxiv.org/pdf/2307.02738
Chechik, G., Meilijson, I., Ruppin, E. (1998). Synaptic pruning in development: a computational account. Neural Computation. https://pubmed.ncbi.nlm.nih.gov/9744896/
(2015). Decreasing-Rate Pruning Optimizes the Construction of Efficient and Robust Distributed Networks. PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC4517947/
(2014). Generative models of rich clubs in Hebbian neuronal networks and large-scale human brain networks. PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC4150306/
(2025). Demystifying Small Language Models for Edge Deployment. ACL Anthology. https://aclanthology.org/2025.acl-long.718.pdf
(2025). vLLM or llama.cpp: Choosing the right LLM inference engine for your use case. Red Hat. https://developers.redhat.com/articles/2025/09/30/vllm-or-llamacpp-choosing-right-llm-inference-engine-your-use-case
(2025). Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment. arXiv. https://arxiv.org/pdf/2501.03265
(2024). Performance of llama.cpp on Snapdragon X Elite/Plus #8273. GitHub. https://github.com/ggml-org/llama.cpp/discussions/8273
(2025). OS-Level Challenges in LLM Inference and Optimizations. eunomia. https://eunomia.dev/blog/2025/02/18/os-level-challenges-in-llm-inference-and-optimizations/
Arya, K. (2024). Optimizing Event Loops with CPU Pinning: Benefits and Tradeoffs. Medium. https://medium.com/@kuldeeparyadotcom/optimizing-event-loops-with-cpu-pinning-benefits-and-tradeoffs-59e7ac80b2cc
Borovica-Gajic, R. (2022). Energy Efficient Computing Systems: Architectures, Abstractions and Modeling to Techniques and Standards. https://renata.borovica-gajic.com/data/2022_csur.pdf
(n.d.). An introduction to Neural Networks. UVa. https://www.infor.uva.es/~teodoro/neuro-intro.pdf
(2025). Spike-timing-dependent Hebbian learning as noisy gradient descent. bioRxiv. https://www.biorxiv.org/content/10.1101/2025.01.09.632231v1.full-text
(2025). Emergence of Hebbian Dynamics in Regularized Non-Local Learners. arXiv. https://arxiv.org/html/2505.18069v1
(2024). Validation-Gated Hebbian Learning for Adaptive Agent Memory. OpenReview. https://openreview.net/pdf?id=EN9VRTnZbK