Voynich Manuscript Decipherment
First Automated Naibbe-Class Verbose Cipher Reversal · 5-Phase Attack · 16 IP Claims
PRIOR ART · Builds 49–52 · IP Claims 36–56 · 2026-03-25 — Q2 SYNTHESIS ADDENDUM 2026-04-18
Abstract
We present the first automated Naibbe-class verbose cipher reversal applied to the complete Voynich Manuscript corpus (Zandbergen-Landini ZL3b-n, 226 folios, 38,206 words). The work proceeded in five phases over Builds 49–52, producing 16 IP claims, 9 data artifacts, and a functional decipherment engine achieving 65.6% word coverage with decoded entropy matching Latin (4.04 bits, target 4.0). Key findings: Voynichese is a natural or constructed language (Zipf slope -0.89, R²=0.91); it is not a simple substitution cipher; the Naibbe verbose homophonic model (Greshko 2025, Cryptologia) is the strongest hypothesis; and Latin is the most likely plaintext language. The engine is implemented in src/sepher.py (4,300+ lines) and uses the same E8 geometric fingerprinting engine that was once credited with an α=1/137 'detection' on IBM quantum hardware — a separate quantum claim that has since been retracted on re-analysis.
Corpus Statistics
Zandbergen-Landini ZL3b-n (Gold Standard EVA Transcription)
Key Statistical Findings
| Metric | Value | Interpretation |
|---|---|---|
| Zipf slope | -0.8902 (R²=0.91) | Inside the natural-language range (-0.8 to -1.2), consistent with a natural or constructed language |
| Overall entropy | 4.2708 bits/token | Slightly above both Latin (4.0) and English (4.1) |
| Conditional entropy | 2.4582 bits | Lower than natural languages (~3.4), indicating a highly constrained grammar |
| Nearest language | Hebrew (cos=0.9853) | At token frequency level; Latin wins at bigram level |
| Decoded entropy | 4.04 bits | Naibbe reversal output matches Latin entropy to 98.9% |
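The three headline statistics can be illustrated with a minimal sketch. The function names below are illustrative and are not taken from src/sepher.py. The source does not state whether entropy is computed over EVA characters or whole words, so the helpers accept any sequence of symbols.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope and R^2 of log(frequency) against log(rank)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, r2


def shannon_entropy(symbols):
    """Shannon entropy in bits of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(symbols):
    """H(next | previous) = H(adjacent pairs) - H(first element of each pair)."""
    pairs = list(zip(symbols, symbols[1:]))
    return shannon_entropy(pairs) - shannon_entropy([a for a, _ in pairs])


if __name__ == "__main__":
    sample = "daiin ol daiin chedy ol daiin shedy ar".split()
    print(zipf_slope(sample))
    print(shannon_entropy(sample), conditional_entropy(sample))
```

A low conditional entropy relative to the overall entropy means the next symbol is highly predictable from the previous one, which is the property the findings describe as a constrained grammar.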
The Naibbe Cipher Model
Greshko 2025 — Cryptologia
The Naibbe model (published 2025 in Cryptologia) posits a historically plausible verbose homophonic substitution cipher using a 52-card deck. The alphabet is normalized to 23 letters. Plaintext is re-spaced into random unigrams and bigrams, then encoded via 6 substitution tables selected by card suit and rank. The deck weight distribution is α:β1:β2:β3:γ1:γ2 = 20:8:8:8:4:4, which sums to the 52 cards and reduces to 5:2:2:2:1:1. Our reversal strategy clusters Voynich words by shared suffix patterns, validates frequency ratios against the deck weights using cosine similarity, and assigns plaintext letters by frequency rank matching the target language.
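The deck-weight validation step can be sketched as follows. This is a hedged illustration, not the engine's code: `deck_fit` is a hypothetical helper, and it assumes a cluster's six most frequent member words stand in for the six tables, which the source does not spell out.

```python
import math

# alpha, beta1, beta2, beta3, gamma1, gamma2 from the Naibbe model (sum = 52)
DECK_WEIGHTS = [20, 8, 8, 8, 4, 4]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def deck_fit(member_counts):
    """Compare a cluster's six largest word counts with the deck weights.

    Assumption: counts are sorted descending and zero-padded to length six.
    """
    top = sorted(member_counts, reverse=True)[:6]
    top += [0] * (6 - len(top))
    return cosine(top, DECK_WEIGHTS)


if __name__ == "__main__":
    # Counts in the exact 5:2:2:2:1:1 ratio fit perfectly; flat counts fit worse.
    print(deck_fit([500, 200, 200, 200, 100, 100]))
    print(deck_fit([100, 100, 100, 100, 100, 100]))
```

Cosine similarity ignores overall scale, so a cluster is scored on the shape of its frequency ratios and not on how many tokens it contains.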
Top Cluster Assignments
| Cluster | Assigned Letter | Top Words (count) |
|---|---|---|
| 1 | e | daiin (800), aiin (503), qokain (278), qokaiin (265) |
| 0 | i | chedy (502), shedy (432), qokeedy (303), qokedy (270) |
| 2 | t | qokeey (240), shey (229), qokeeey (186) |
| 3 | a | ol (312), or (289), ar (254), al (231) |
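A simplified version of suffix clustering and frequency-rank assignment, run on the word counts from the table above, might look like this. It keys clusters on the last two characters only, which is an assumption made for illustration; it reproduces the first three rows but splits the fourth (ol, or, ar, al), so the engine's actual clustering must be more elaborate. `TARGET_ORDER` is a placeholder ranking whose first four letters follow the table and whose remainder is hypothetical.

```python
from collections import Counter, defaultdict

# Placeholder plaintext letter ranking: "e, i, t, a" follow the table above,
# the remaining letters are an illustrative assumption.
TARGET_ORDER = "eitausrn"

WORD_COUNTS = {
    "daiin": 800, "aiin": 503, "qokain": 278, "qokaiin": 265,
    "chedy": 502, "shedy": 432, "qokeedy": 303, "qokedy": 270,
    "qokeey": 240, "shey": 229, "qokeeey": 186,
    "ol": 312, "or": 289, "ar": 254, "al": 231,
}


def cluster_by_suffix(word_counts, suffix_len=2):
    """Group words by their final `suffix_len` characters."""
    clusters = defaultdict(Counter)
    for word, count in word_counts.items():
        clusters[word[-suffix_len:]][word] += count
    return clusters


def assign_letters(clusters, target_order):
    """Map the k-th largest cluster to the k-th most frequent target letter."""
    ranked = sorted(clusters, key=lambda key: sum(clusters[key].values()), reverse=True)
    return dict(zip(ranked, target_order))


if __name__ == "__main__":
    mapping = assign_letters(cluster_by_suffix(WORD_COUNTS), TARGET_ORDER)
    for suffix, letter in mapping.items():
        print(suffix, "->", letter)
```

Because several clusters can legitimately encode the same letter in a homophonic cipher, a one-to-one rank mapping like this is only a starting point.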
Five-Phase Attack
Methodology
Phase 1
Method: E8 coordinate mapping, Meru torus projection, tesseract walk, quantum fingerprint. 11-script registry (Hebrew/Aramaic/Syriac/Greek/Coptic/Sanskrit/Ge'ez/Phoenician/Ugaritic/Mandaic/Voynich). 6D TextTensor engine.
Result: Voynich demo corpus: 28 unique tokens, Zipf R²=0.8442, nearest script=Ancient Greek (76.4% torus match). Genesis 1:1 Hebrew baseline: 80 letters, α signature found.
Phase 2
Method: Full corpus ingestion via custom IVTFF parser. Zipf analysis, entropy profiling, bigram transition matrices, positional grammar extraction, section entropy variation, 4-language cosine fingerprint.
Result: Confirmed natural language properties. Currier A/B split verified by section entropy. Hebrew nearest at character frequency; Latin leads at bigram level. Hapax legomena 70.6%, higher than typical cipher.
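Two of these measurements are easy to sketch. The hapax ratio below is computed over word types, which is an assumption about how the 70.6% figure is defined. The fingerprint compares rank-ordered frequency profiles, also an assumption: Voynich and the reference languages do not share an alphabet, so some alphabet-independent representation is needed, and the source does not say which one it uses. All function names are illustrative.

```python
import math
from collections import Counter


def hapax_ratio(tokens):
    """Share of word types that occur exactly once."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def rank_profile(symbols, k=20):
    """Top-k relative frequencies, sorted descending and zero-padded to length k."""
    counts = Counter(symbols)
    total = sum(counts.values())
    freqs = sorted((c / total for c in counts.values()), reverse=True)[:k]
    return freqs + [0.0] * (k - len(freqs))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_language(sample, references, k=20):
    """Return the reference name whose rank profile is closest to the sample's."""
    target = rank_profile(sample, k)
    return max(references, key=lambda name: cosine(target, rank_profile(references[name], k)))


if __name__ == "__main__":
    print(hapax_ratio("daiin ol daiin chedy ar".split()))
    refs = {"lang_a": "aaabbc", "lang_b": "abcdef"}  # toy stand-ins, not real corpora
    print(nearest_language("xxxyyz", refs, k=6))
```

A rank profile discards which symbol carries which frequency, so it can only say that two distributions have a similar shape, not that the texts share a language.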
Phase 3
Method: Constraint-satisfaction mapping + cross-hypothesis coherence scoring. H1a: Hebrew substitution. H1b: Latin substitution. H2a: Verbose cipher (Latin). H2b: Verbose cipher (Italian).
Result: Verbose cipher (H2) outperforms simple substitution (H1) on all metrics. Latin narrowly beats Italian. H2a is the strongest hypothesis, which makes the Naibbe model the preferred attack vector for the next phase.
Phase 4
Method: Suffix-based glyph clustering. Cosine similarity validation against 5:2:2:2:1:1 Naibbe deck weights (the reduced form of 20:8:8:8:4:4; 52-card Alberti-style cipher). Bigram decomposition. Iterative refinement.
Result: 65.6% word coverage. 28.9% valid Latin bigrams. Decoded entropy 4.04 bits (target: 4.0). 49 clusters identified. First automated reversal of a Naibbe-class verbose homophonic cipher in the literature.
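The valid-bigram metric can be sketched as below. The source does not define "valid", so this assumes a bigram is valid when it is attested inside a word of a reference corpus at least `min_count` times; both function names and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def word_bigrams(text):
    """All adjacent letter pairs inside each whitespace-separated word."""
    return [a + b for word in text.split() for a, b in zip(word, word[1:])]


def build_valid_bigrams(reference_text, min_count=1):
    """Set of bigrams attested at least `min_count` times in the reference."""
    counts = Counter(word_bigrams(reference_text))
    return {bigram for bigram, n in counts.items() if n >= min_count}


def valid_bigram_ratio(decoded_text, valid_bigrams):
    """Fraction of decoded bigrams that appear in the valid set."""
    pairs = word_bigrams(decoded_text)
    if not pairs:
        return 0.0
    return sum(pair in valid_bigrams for pair in pairs) / len(pairs)


if __name__ == "__main__":
    valid = build_valid_bigrams("et in terra pax")
    print(valid_bigram_ratio("te ra xx", valid))
```

The ratio depends heavily on the size of the reference corpus and on `min_count`, so figures are only comparable between languages when both references are built the same way.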
Phase 5
Method: Currier A/B subcorpus split with separate cluster models. Many-to-one frequency-position assignment (fixes homophonic constraint). Dampened EM solver (evidence-based voting, early stop). Known-plaintext zodiac anchor matching (GEMINI: 50% match). Cross-validated A/B letter confirmation.
Result: 4 letters confirmed via cross-validation. Zodiac labels provide structural anchors. EM solver stabilizes at 46% valid bigrams. Methodology is the first to combine A/B split + dampened EM + anchor constraints on the Voynich corpus.
Conclusions
What the Analysis Proves
Voynichese is a language
Confirmed by Zipf slope (-0.89, R²=0.91), entropy profile (4.27 bits), and positional grammar. It is not random noise.
Not a simple substitution
Positional constraints and word-internal grammar (prefix/core/suffix structure) are incompatible with character-level substitution. The cipher has at least two levels.
Naibbe is the strongest model
The automated reversal produces coherent decoded text with valid Latin bigrams. 65.6% coverage, 98.9% entropy match to Latin.
Latin is the most likely plaintext
28.9% valid bigram ratio vs 25.8% for Italian. Hebrew fingerprint is strongest at character level but Latin wins at bigram level — consistent with a Latin verbose cipher.
E8-geometry fingerprint (held pending null)
An E8-lattice fingerprinting analysis reports a higher-than-chance overlap between the Voynich corpus and the Hebrew script family. This uses the same geometric engine once credited with an α=1/137 'detection' — that quantum claim has since been retracted; this script-overlap result is part of a research line currently held pending its own formal null.
Q2 Synthesis · 2026-04-18 · Status Update
Has the Voynich Manuscript been deciphered?
No. ~60–70% of the statistical structure has been recovered (Phases 1–5 above). Three blockers remain between statistical structure and semantic recovery. We list them publicly because unfalsifiable Voynich claims have polluted the field for 115 years; an honest blocker list is the differentiator.
The Three Blockers
1. Graphemic disagreement (EVA vs Currier)
All Phase 2-5 work uses the EVA transcription (ZL3b-n, May 2025). Lindemann & Bowern (2021, p. 8) show H₂ entropy differs measurably under the older Currier (1976) transcription. Until the field agrees on grapheme boundaries, every phonetic decipherment attempt is fitting against disputed atoms. Phase 6a will re-run all statistical tests under Currier and report invariance.
2. Missing Renaissance cipher corpus
The Naibbe hypothesis is strong on statistical grounds but under-constrained on historical grounds. The verbose-homophonic cipher class was a real Italian diplomatic tool from the late 14th to the early 16th century (Lavinde 1379, Alberti 1466, Simonetta 1474, Tranchedino 1475, Soro ca. 1520). Without these primary sources, we cannot anchor the plaintext culturally. Phase 6b ingests the full Renaissance cryptography corpus and re-runs Naibbe with cipher-historical constraints.
3. No Rosetta anchor (partial fix in 2026-04)
Every successful decipherment had an external anchor (Rosetta Stone for Egyptian, Cypriot syllabary for Linear B, Behistun for Old Persian). Voynich has none. As of 2026-04-18 we identified BnF NAL 635 (Giovanni Fontana, Secretum de thesauro, ca. 1420-1440) as the strongest candidate surfaced to date: Voynich-contemporary, from the same Venetian milieu, and written partly in its author's own cryptographic system alongside technical drawings. A live IIIF manifest is available at Gallica. Phase 7 will build a glyph ↔ folio-image correspondence engine using Fontana as anchor.
Falsifiability — what would kill each open hypothesis
| Hypothesis | Falsified if |
|---|---|
| Naibbe-class verbose cipher | Decoded entropy delta from Latin widens to >0.5 bits when bigram cross-validation iteration is run to convergence on a 90/10 train/test split. (Currently 0.04 bits on full set — needs cross-validated replication.) |
| Latin > Italian as plaintext | Italian valid-bigram ratio exceeds Latin by >2 points across 5 randomized cluster-seed initializations. (Single-run only so far: 28.9% Latin vs 25.8% Italian — needs replication.) |
| EVA-vs-Currier robustness | ≥2 of {Zipf slope, H₂, deck-ratio fit, valid-bigram ratio} flip sign or shift by >1σ when re-run under Currier transcription. |
| Performative-function (ritual/mnemonic) hypothesis | Voynich error-rate falls within the 95% CI of known-cipher baselines (Tranchedino Cod. 2398, Simonetta correspondence) AND outside the CI of known-ritual baselines (Dee-Kelley Enochian, liturgical books). |
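The first three criteria in the table are mechanical enough to express as checks. The sketch below is one possible reading, with assumptions labeled in the comments: "across 5 randomized initializations" is read as "in every run", and the per-metric σ values are hypothetical inputs because the source does not give them.

```python
def naibbe_falsified(decoded_entropy, latin_entropy=4.0, threshold=0.5):
    """True if the cross-validated entropy delta from Latin exceeds the threshold."""
    return abs(decoded_entropy - latin_entropy) > threshold


def latin_over_italian_falsified(runs, margin=2.0):
    """runs: one (latin_pct, italian_pct) pair per cluster-seed initialization.

    Assumption: the criterion requires Italian to lead by more than `margin`
    percentage points in every run, not merely on average.
    """
    return bool(runs) and all(italian - latin > margin for latin, italian in runs)


def transcription_robustness_falsified(eva, currier, sigma, min_metrics=2):
    """True if at least `min_metrics` metrics flip sign or shift by more than 1 sigma.

    eva, currier, sigma: dicts keyed by metric name; sigma values are hypothetical.
    """
    unstable = 0
    for name, eva_value in eva.items():
        currier_value = currier[name]
        flipped = eva_value * currier_value < 0
        shifted = abs(eva_value - currier_value) > sigma[name]
        if flipped or shifted:
            unstable += 1
    return unstable >= min_metrics


if __name__ == "__main__":
    # Figures reported so far: full-set entropy 4.04, single run 28.9% vs 25.8%.
    print(naibbe_falsified(4.04))
    print(latin_over_italian_falsified([(28.9, 25.8)]))
```

Neither of the printed checks counts as a pass in the table's sense yet, since the entropy figure is not cross-validated and only one seed has been run.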
Phase 6–7 Roadmap (Compute Deferred)
Phase 6a · Dual-transcription invariance test
Re-run Phases 2-5 under Currier transcription. Hypotheses that fail in either system don't ship.
Phase 6b · Naibbe under cipher-historical constraints
After ingesting Alberti / Trithemius / della Porta / Vigenère as primary sources, re-run cluster assignment with verbose-homophonic priors from actual 15th-c diplomatic tables.
Phase 6c · Performative-signal test
MARSOC observation: zero erasures, zero corrections across 234 folios in two distinct hands is incompatible with iterative cipher deployment. Compare Voynich error metrics against known-cipher and known-ritual baselines.
Phase 7 · Iconographic decoder
If Phase 6c suggests performative function, pivot from phonetic decipherment to glyph ↔ folio-image correspondence using BnF NAL 635 Fontana as Voynich-contemporary anchor.
Performative-function hypothesis: the manuscript exhibits zero erasures and zero corrections across 234 folios in two distinct scribal hands — statistically incompatible with a working cipher deployment, where senders make mistakes. If the artifact is a finished ritual / mnemonic performance rather than an encoded message, phonetic decipherment is a category error and Phase 7 pivots to iconographic correspondence using BnF NAL 635 Fontana as anchor.
Intellectual Property
Novel Claims — IP 36–56
The Engine
src/sepher.py — Sepher (The Scribe)
$ 4,300+ lines · 7 analysis layers · 11-script registry
$ Corpus: ZL3b-n + IT2a-n (Takahashi) — gold standard transcriptions
$ CLI: python src/sepher.py voynich --mode phase2|phase3|phase4|phase5
$ Layer 0: Utilities · Layer 1: CorpusIngestor · Layer 2: TextTensor (6D)
$ Layer 3: GematriaEngine · Layer 4: GeometricProjector (Meru + tesseract + E8)
$ Layer 5: VoynichEngine · Layer 6: ResonanceAnalyzer · Layer 7: SepherVault (SQLite)
Reproduce the Analysis
All code is open-source. The ZL3b-n corpus is downloadable from voynich.nu.
python src/sepher.py voynich --mode phase2
python src/sepher.py voynich --mode phase5
python src/sepher.py analyze --corpus Genesis.1 --script biblical_hebrew