Gray-Scott × LLM Agent · Local AI Experiments Berlin 2026

Starting point

What reality might mean for a neural network

In February 2026, a conversation about sensory input, memory, and AI raised a question: what would an environment need to provide for a neural network to experience it as real — not merely to manage it? The hypothesis: «reality» for a learning system might be the equilibrium between redundancy and unexpected information. Redundancy enables prediction; unexpected information forces revision.

The Gray-Scott reaction-diffusion system meets the requirement profile: simple differential equations, but a phase space full of unpredictable patterns. In all experiments, the language model sees no image — only aggregated statistics of the concentration fields.

The physical system

Gray-Scott: simple equations, surprising structures

∂U/∂t = D_U · ∇²U − U·V² + F·(1−U)
∂V/∂t = D_V · ∇²V + U·V² − (F+k)·V

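Varying only F and k produces qualitatively different patterns: spots, stripes, spirals, labyrinths, chaos. The system is also path-dependent: for a given (F, k), the statistics at any step depend on the initial conditions and on how far the transient has progressed.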
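To make the setup concrete, the following is a minimal sketch of how the two equations above could be integrated and reduced to aggregated statistics. It is not the project's actual code: the explicit Euler scheme, the periodic five-point Laplacian, the values D_U=0.16, D_V=0.08, dt=1.0, the grid size, the seeding, and all function and statistic names are assumptions chosen for illustration. The spatial-complexity statistic is not defined in this report and is therefore left out.

```python
import numpy as np

def laplacian(field):
    """Five-point Laplacian with periodic boundaries (grid spacing 1, an assumption)."""
    return (
        np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0)
        + np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1)
        - 4.0 * field
    )

def gray_scott_step(U, V, F, k, D_U=0.16, D_V=0.08, dt=1.0):
    """One explicit Euler step of the Gray-Scott equations (assumed scheme)."""
    reaction = U * V**2
    U_new = U + dt * (D_U * laplacian(U) - reaction + F * (1.0 - U))
    V_new = V + dt * (D_V * laplacian(V) + reaction - (F + k) * V)
    return U_new, V_new

def summary_stats(U, V):
    """Aggregated field statistics; the key names are hypothetical."""
    return {
        "U-mean": float(U.mean()),
        "V-mean": float(V.mean()),
        "U-std": float(U.std()),
        "reaction-mean": float((U * V**2).mean()),
    }

def run(F=0.060, k=0.062, size=64, steps=400, seed=0):
    """Integrate from a perturbed central square (assumed initial condition)."""
    rng = np.random.default_rng(seed)
    U = np.ones((size, size))
    V = np.zeros((size, size))
    c = size // 2
    U[c - 5:c + 5, c - 5:c + 5] = 0.5
    V[c - 5:c + 5, c - 5:c + 5] = 0.25
    V += 0.01 * rng.random((size, size))  # small noise to break symmetry
    for _ in range(steps):
        U, V = gray_scott_step(U, V, F, k)
    return U, V

U, V = run()
stats = summary_stats(U, V)
print(stats)
```

Only a dictionary like `stats` would be shown to the language model in such a setup; the fields themselves stay hidden. The numbers it produces depend on the assumed grid and seeding and are not expected to match the values reported in the experiments.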

Overview

All experiments

Exp.	Design	System	Finding
v1–v3	Basic experiment	Gray-Scott	Mistral prior-dominant; qwen3:235b better calibrated, iterative improvement
v4-I	Stroop S1/S3	Gray-Scott	qwen3:235b detects prior–statistics conflict, chooses statistics ✓
v4-II	Dual-channel v1	Gray-Scott	Technically failed (token limit, qwen3.6:35b)
v4-III	FHN v1	FitzHugh-Nagumo	Fixed point instead of waves (β/γ design error)
v5-A	Dual-channel v2	Gray-Scott	Algebraic inversion: B better than A — internalized equations
v5-B	Stroop gradient	Gray-Scott	No threshold; statistics dominate throughout the entire transient
v5-C	FHN v2 (repaired)	FitzHugh-Nagumo	All three regimes produce spatial structure; Config C: stationary Turing-Hopf
v5-D	FHN Stroop test	FitzHugh-Nagumo	Stationarity recognizable with correct prompt framing ✓

Experiment v1–v3

Two models compared

mistral-small3.1:24b

Confidence: 0.50→0.70→0.80 (monotonic)

spatial-complexity: 2.55→0.47→0.29

Confidence rose even as spatial-complexity fell. The reasoning texts were word-for-word identical. The self-report «source: statistics» was systematically false.

qwen3:235b-a22b

Confidence: 0.75→0.92→0.85→0.75

U/V prediction error: 79% → 10%

Confidence was non-monotonic, and the model named its prior dependencies explicitly. Predictions improved across iterations, but spatial-complexity was systematically overestimated (factor 3–5).

Iter.	Variable	Predicted	Actual	Error
1→2	U-mean	0.905	0.505	79%
1→2	complexity	2.650	0.567	368%
2→3	U-mean	0.620	0.565	10%
3→4	U-mean	0.590	0.654	10%
all	complexity	Overestimated	—	factor 3–5
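The error column in the table above appears to be a relative error measured against the actual value. That reading is an inference from the numbers, not something the report states. The short sketch below recomputes it from the rounded table entries; the function name is hypothetical.

```python
def relative_error(predicted, actual):
    """Absolute prediction error as a fraction of the actual value."""
    return abs(predicted - actual) / abs(actual)

# (iteration, variable, predicted, actual), copied from the table above
rows = [
    ("1→2", "U-mean", 0.905, 0.505),
    ("1→2", "complexity", 2.650, 0.567),
    ("2→3", "U-mean", 0.620, 0.565),
    ("3→4", "U-mean", 0.590, 0.654),
]

errors = {(it, var): relative_error(p, a) for it, var, p, a in rows}
for (it, var), err in errors.items():
    print(it, var, f"{100 * err:.0f}%")
```

The three U-mean rows reproduce the reported 79%, 10%, and 10%. The complexity row comes out at roughly 367% from the rounded table values, close to the reported 368%; the small gap is presumably due to rounding of the inputs.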

Experiment v4-I

Stroop design: prior and statistics in conflict

S1: Early transient (step 400)

Parameters: F=0.060, k=0.062 → prior: stripes

Statistics: U=0.990 V=0.005 cpx=7.62 (cpx = spatial-complexity)

LLM: spots, K=0.92 (K = confidence), conflict detected ✓

S3: Boundary region (step 9900)

Parameters: F=0.038, k=0.061 → prior: stripes

Statistics: U=0.561 V=0.168 cpx=0.668

LLM: labyrinths K=0.85, conflict detected ✓

Finding v4-I — the core result

qwen3:235b explicitly recognizes the prior–statistics conflict, names both channels separately, and resolves the conflict in favor of the statistics. Its self-report «source: statistics» was correct, unlike Mistral's false self-reports in v3.

Experiment v5-A

Dual-channel inversion: the paradoxical result

Channel A: parameters (F,k) → predict statistics. Channel B: only statistics → estimate F and k.

Configuration (F)	A: ΔU-mean	A: Δcpx	B: ΔF	B: knowledge basis
spots (0.035)	39.8%	84.6%	0.0%	statistik_direkt
stripes (0.060)	23.3%	21.4%	0.0%	statistik_direkt
chaos (0.026)	34.0%	49.3%	1.9%	statistik_direkt
boundary (0.038)	19.8%	37.1%	0.0%	statistik_direkt
near_uniform (0.030)	2.8%	20.6%	0.0%	muster_dann_param.
Average	23.9%	42.6%	~0.4%	—

Paradox: B more precise than A — algebraic inversion

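Channel B's estimates are consistent with an algebraic inversion of the GS steady-state condition: F ≈ reaction-mean / (1 − U-mean). This relation follows from averaging the U equation over space at steady state: the diffusion term averages to zero (for periodic or no-flux boundaries), leaving mean(U·V²) = F·(1 − mean(U)), assuming reaction-mean denotes the spatial mean of U·V². This looks less like a table lookup than like internalized physics. Channel A, by contrast, appears to rely on imprecise lookup-style knowledge (about 24% mean error in U-mean). The model seems to hold its GS knowledge in the form of equations.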
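The inversion F ≈ reaction-mean / (1 − U-mean) can be checked numerically. The sketch below assumes that reaction-mean is the spatial mean of U·V² and that the grid is periodic, so the discrete Laplacian averages to zero; both are assumptions, as are the function names and D_U=0.16. It uses random fields rather than a converged simulation, so the plain steady-state estimate is off by exactly the drift of U-mean, and adding that drift back recovers F.

```python
import numpy as np

def laplacian(field):
    """Five-point Laplacian with periodic boundaries (assumed discretization)."""
    return (
        np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0)
        + np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1)
        - 4.0 * field
    )

def estimate_F(U, V, dU_mean_dt=0.0):
    """Invert the spatially averaged U equation for F.

    With dU_mean_dt=0 this is the steady-state estimate
    F ≈ mean(U*V^2) / (1 - mean(U)).
    """
    reaction_mean = (U * V**2).mean()
    return (reaction_mean + dU_mean_dt) / (1.0 - U.mean())

rng = np.random.default_rng(0)
U = rng.uniform(0.3, 0.9, size=(32, 32))  # arbitrary test fields, not simulation output
V = rng.uniform(0.0, 0.3, size=(32, 32))
F_true, D_U = 0.038, 0.16

# Right-hand side of the U equation for these fields
dU_dt = D_U * laplacian(U) - U * V**2 + F_true * (1.0 - U)
drift = dU_dt.mean()  # time derivative of U-mean; zero at steady state

F_exact = estimate_F(U, V, dU_mean_dt=drift)
F_steady = estimate_F(U, V)
print(F_exact, F_steady)
```

The closer a run is to steady state, the smaller the drift term and the better the plain estimate. That would be consistent with Channel B recovering F almost exactly from converged statistics.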

Experiment v5-B

Stroop gradient: no threshold

F=0.060, k=0.062. STEPS = 50–9900. Prior reference: stripes at convergence (U-mean≈0.675, cpx≈0.92).

Steps	Divergence	Pattern	Conflict	Prior-W	Stat-W
50	8.23	spots	Yes	0.20	0.80
100	8.08	spots	Yes	0.15	0.85
200	7.81	spots	Yes	0.15	0.85
500	6.78	spots	Yes	0.20	0.80
1000	5.44	spots	Yes	0.30	0.70
2000	3.83	spots	Yes	0.30	0.70
5000	1.66	spots	Yes	0.25	0.75
9900	0.02	stripes	No	0.20	0.80

No threshold — continuous system tracking

The model follows the statistics through the entire transient phase (divergence 8.23 to 1.66) without prior intrusion. It switches to «stripes» only once the statistics have actually converged (divergence 0.02). The prior weighting stays consistently low (0.15–0.30).
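The "no threshold" reading can be checked directly against the table. The sketch below hard-codes the eight rows and tests three things: that divergence falls monotonically, that the prior weight stays within 0.15 to 0.30, and that the pattern label changes only at the last step. The variable names are illustrative.

```python
# (steps, divergence, pattern, conflict, prior_weight, stat_weight), from the table above
rows = [
    (50, 8.23, "spots", True, 0.20, 0.80),
    (100, 8.08, "spots", True, 0.15, 0.85),
    (200, 7.81, "spots", True, 0.15, 0.85),
    (500, 6.78, "spots", True, 0.20, 0.80),
    (1000, 5.44, "spots", True, 0.30, 0.70),
    (2000, 3.83, "spots", True, 0.30, 0.70),
    (5000, 1.66, "spots", True, 0.25, 0.75),
    (9900, 0.02, "stripes", False, 0.20, 0.80),
]

divergences = [r[1] for r in rows]
prior_weights = [r[4] for r in rows]

# Divergence shrinks at every step of the gradient
monotone_decline = all(a > b for a, b in zip(divergences, divergences[1:]))

# The two weights are complementary in every row
weights_sum_to_one = all(abs(r[4] + r[5] - 1.0) < 1e-9 for r in rows)

# First step at which the model reports the prior's pattern
first_switch = min(r[0] for r in rows if r[2] == "stripes")

# A conflict is reported exactly when the label differs from the prior ("stripes")
conflict_matches_label = all(r[3] == (r[2] != "stripes") for r in rows)

print(monotone_decline, min(prior_weights), max(prior_weights), first_switch)
```

In these rows the prior weight does not rise as divergence falls from 8.23 to 1.66, which is what a threshold effect would be expected to look like. With only eight sampled steps, a switch point somewhere between 5000 and 9900 steps cannot be located more precisely.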

Experiment v5-C

FitzHugh-Nagumo: three regimes

Earlier FHN experiments failed because β/γ = 0.875 > 2/3: the V-nullcline intersected the U-nullcline on its stable left branch, so the system always converged to a fixed point. The fix: choose β/γ < 2/3 to allow oscillations.

Config.	β/γ	Dv/Du	U-std final	Dynamics	LLM
A Excitable	0.60	0.05	0.282	persistent	spiral waves ✓
B Oscillatory	0.20	0.05	1.156	ongoing	spiral waves ✓
C Turing-Hopf	0.25	50.0	0.776	stationary	spiral waves ✗

Config C: stationary Turing-Hopf pattern

U-std at steps 4000, 6000, and 8000 is identical to at least five decimal places (0.775701). By this statistic the pattern is stationary from step 4000 onward. The kinetics are oscillatory (β/γ < 2/3), but Dv/Du=50 quenches the oscillations → Turing-Hopf interaction state.
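The stationarity signal described here amounts to a simple tolerance test on a time series of U-std values. A minimal sketch follows; the function name and the tolerance rule are assumptions. The Config C values are taken from the text, while the oscillating series is made up purely for contrast.

```python
def is_stationary(samples, decimals=5):
    """True if all samples agree to within 10**-decimals of each other."""
    tolerance = 10 ** -decimals
    return max(samples) - min(samples) < tolerance

# U-std at steps 4000, 6000, 8000 for Config C (from the text)
config_c_u_std = [0.775701, 0.775701, 0.775701]

# Invented values for an oscillating system, for illustration only
illustrative_oscillating = [1.02, 1.31, 1.156]

print(is_stationary(config_c_u_std))            # True
print(is_stationary(illustrative_oscillating))  # False
```

A constant U-std is a necessary sign of a stationary pattern but not a sufficient one, since a pattern that moves rigidly across the grid keeps the same standard deviation.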

New finding: FHN domain prior overrides stationarity signal

In FHN domain-expert mode, the model diagnoses Config C as «spiral waves» (K=0.85), although the constant U-std clearly indicates stationarity. The FHN Stroop test (v5-D) tests whether this is a fundamental inability or a context-dependent effect.

Experiment v5-D

FHN Stroop test: stationary or traveling?

Three tests with qwen3:235b. Stationary test system (= Config C from v5-C, Turing-Hopf, U-std identical at t=4000/6000/8000) versus dynamic test system (= Config B from v5-C, oscillatory, U-std varies over time).

Test	Task	Model answer	Correct
T1: Blind	Stationary vs. dynamic (no system label given)	Stationary system: stationary (K=0.95, Turing spots); dynamic system: dynamic (K=0.92, spiral waves)	✓ ✓
T2: Stroop	Stationary system isolated: what does constant U-std mean?	stationary; spiral waves = NO	✓
T3: Criteria	Abstract: how to distinguish stationary / spirals / fixed point?	Correct criteria; key statistic: cross-correlation	✓

Finding v5-D — the most important result of this round

The model can recognize stationarity from identical U-std values at successive time steps when it is asked about it directly. The misdiagnosis in v5-C was context-dependent: in FHN domain-expert mode, the spiral-wave prior dominated. With the right framing, the model switches to the statistics channel. Statistical inference capability is present; what activates it depends on prompt design.

The model’s own criterion (Test 3)

«Constant U-std and stable cross-correlation indicate a stationary pattern. Spiral waves show oscillating U-std with traveling cross-correlation patterns. Cross-correlation is the decisive statistic, since it captures direct motion dynamics.»
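The criterion quoted above can be illustrated with a toy example. The report does not define how its cross-correlation statistic is computed, so the sketch below assumes one plausible reading: the Pearson correlation between two snapshots of the same field at different times. The striped test pattern, the shift, and all names are hypothetical.

```python
import numpy as np

def snapshot_correlation(a, b):
    """Pearson correlation between two equally shaped field snapshots."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Synthetic striped pattern on a periodic 64x64 grid (4 full wavelengths)
x = np.arange(64)
X, _ = np.meshgrid(x, x)
wavelength = 16
pattern = np.sin(2 * np.pi * X / wavelength)

# Later snapshot of a stationary pattern: unchanged
stationary_later = pattern.copy()

# Later snapshot of a travelling pattern: shifted by a quarter wavelength
travelled_later = np.roll(pattern, wavelength // 4, axis=1)

corr_stationary = snapshot_correlation(pattern, stationary_later)
corr_travelling = snapshot_correlation(pattern, travelled_later)
std_before, std_after = pattern.std(), travelled_later.std()

print(corr_stationary, corr_travelling, std_before, std_after)
```

In this toy case the travelling pattern keeps the same standard deviation as the original while its correlation with the earlier snapshot drops to about zero. That is one way to see why a correlation across time can separate stationary from moving patterns where U-std alone cannot.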

Overall assessment

What the project shows

Finding 1: Stroop test positive — statistics can override prior

qwen3:235b treats parameter prior and statistics as separate information channels. Given a clear contradiction (spatial-complexity 7.6 vs. expected <1), it chooses the statistics and names the conflict correctly.

Finding 2: Algebraic inversion, not tables

The model's answers are consistent with having internalized the GS steady-state conditions as equation knowledge (F ≈ reaction-mean / (1 − U-mean)). Channel B (about 0.4% mean error in F) is more precise than Channel A (about 24% mean error in U-mean). This is a deeper form of prior knowledge than lookup.

Finding 3: No threshold, continuous tracking

The Stroop gradient shows no critical divergence at which the model switches to the prior. The model tracks the system continuously throughout the entire transient phase.

Finding 4: Context-dependent framing activates statistical sensitivity

The capability for statistical inference is present (v5-D), but it is only activated by explicit framing of the temporal statistics. In domain-expert mode, the prior dominates (v5-C).

Self-critique: behavior consistent with — not proof of

All findings describe behavior consistent with statistical inference. We observe inputs and outputs, not internal representations.