
Voynich Manuscript Decipherment

First Automated Naibbe-Class Verbose Cipher Reversal · 5-Phase Attack · 21 IP Claims

FIRST PUBLICATION — Prior Art Established · Builds 49–52 · IP Claims 36–56 · 2026-03-25

Abstract

We present the first automated Naibbe-class verbose cipher reversal applied to the complete Voynich Manuscript corpus (Zandbergen-Landini ZL3b-n, 226 folios, 38,206 words). The work proceeded in five phases over Builds 49–52, producing 21 IP claims (IP 36–56), 9 data artifacts, and a functional decipherment engine achieving 65.6% word coverage with decoded entropy matching Latin (4.04 bits, target 4.0). Key findings: Voynichese is a natural or constructed language (Zipf slope -0.89, R²=0.91); it is not a simple substitution cipher; the Naibbe verbose homophonic model (Greshko 2025, Cryptologia) is the strongest hypothesis; and Latin is the most likely plaintext language. The engine is implemented in src/sepher.py (4,300+ lines) and uses the same E8 geometric fingerprinting that detected α=1/137 on IBM quantum hardware.

Corpus Statistics

Zandbergen-Landini ZL3b-n (Gold Standard EVA Transcription)

Folios: 226
Total words: 38,206
Unique words: 8,345
Total tokens: 151,066
Unique tokens: 37
Hapax legomena: 5,890 (70.6%)
Avg word length: 5.05 chars
Zipf slope: -0.8902 (R²=0.91)
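The summary figures above can be recomputed from a list of transcribed words. A minimal sketch, assuming words have already been extracted from the transcription as plain EVA strings; `corpus_stats` is a hypothetical helper, not a function from src/sepher.py. The hapax share is taken relative to unique words, which is how 5,890 / 8,345 ≈ 70.6% works out.

```python
from collections import Counter


def corpus_stats(words: list[str]) -> dict[str, float]:
    """Summary statistics for a non-empty list of transcribed words (hypothetical helper)."""
    counts = Counter(words)
    # Hapax legomena: word types that occur exactly once in the corpus.
    hapax = sum(1 for n in counts.values() if n == 1)
    return {
        "total_words": len(words),
        "unique_words": len(counts),
        "hapax": hapax,
        # Share of unique words that are hapax (5,890 / 8,345 ≈ 0.706 for ZL3b-n).
        "hapax_ratio": hapax / len(counts),
        # Mean word length in characters of the transcription alphabet.
        "avg_word_len": sum(len(w) for w in words) / len(words),
    }


print(corpus_stats(["daiin", "aiin", "daiin", "chedy"]))
```

Average word length here counts characters of the transcription; how sepher.py segments EVA into the 37 "tokens" is not stated above, so token counts may differ from character counts.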

Key Statistical Findings

Zipf slope: -0.8902 (R²=0.91)

Falls within the natural-language range (-0.8 to -1.2), indicating that Voynichese is a natural or constructed language
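A Zipf slope and R² of this kind are commonly obtained by an ordinary least-squares fit of log frequency against log rank. The sketch below assumes that approach; the source does not state which fitting method sepher.py uses, and `zipf_fit` is a hypothetical name.

```python
import math
from collections import Counter


def zipf_fit(words: list[str]) -> tuple[float, float]:
    """OLS fit of log(frequency) against log(rank); returns (slope, R^2).

    Needs at least two word types with different frequencies.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # For simple linear regression, R^2 equals the squared correlation coefficient.
    r_squared = sxy * sxy / (sxx * syy)
    return slope, r_squared


# Frequencies 12, 6, 4, 3 follow f = 12 / rank exactly, so the slope is -1 and R^2 is 1.
toy = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
print(zipf_fit(toy))
```

A slope near -1 with a high R² is the classic Zipf signature; the -0.89 reported above is read against the -0.8 to -1.2 band.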

Overall entropy: 4.2708 bits/token

Slightly above Latin (4.0) and English (4.1)

Conditional entropy: 2.4582 bits

Lower than typical natural languages (~3.4 bits), indicating a highly constrained grammar
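The two entropy figures correspond to a unigram Shannon entropy and a conditional entropy of the next token given the current one, H(next | current) = H(current, next) - H(current). A minimal sketch, assuming the corpus is a flat list of glyph tokens; how sepher.py segments EVA into tokens, and whether it counts pairs across word boundaries, is not stated in the source. `entropy_profile` is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy(counts: Counter) -> float:
    """Entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)


def entropy_profile(tokens: list[str]) -> tuple[float, float]:
    """Return (unigram entropy, conditional entropy of the next token), in bits."""
    h1 = shannon_entropy(Counter(tokens))
    pairs = Counter(zip(tokens, tokens[1:]))
    # Chain rule on the bigram distribution: H(next | current) = H(current, next) - H(current).
    h_joint = shannon_entropy(pairs)
    h_current = shannon_entropy(Counter(tokens[:-1]))
    return h1, h_joint - h_current


print(entropy_profile(list("abababab")))  # alternating: 1 bit per token, next token fully determined
```

In the alternating example each token is fully predictable from the previous one, so the conditional entropy drops to 0 while the unigram entropy stays at 1 bit; the gap between 4.27 and 2.46 bits above is the same effect at corpus scale.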

Nearest language: Hebrew (cos=0.9853)

At the single-token frequency level; Latin is nearest at the bigram level.
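Because Voynich glyphs and Hebrew or Latin letters do not share an alphabet, a frequency fingerprint has to compare distributions without aligning symbols. One common way, assumed here rather than taken from sepher.py, is to sort each alphabet's relative frequencies by rank, pad to equal length, and take the cosine similarity. The reference corpora and the default profile length of 40 are placeholders.

```python
import math
from collections import Counter


def rank_profile(tokens: list[str], length: int) -> list[float]:
    """Relative token frequencies sorted high to low, truncated or zero-padded to `length`."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    total = sum(counts)
    profile = [c / total for c in counts[:length]]
    return profile + [0.0] * (length - len(profile))


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_language(
    voynich_tokens: list[str], references: dict[str, list[str]], length: int = 40
) -> tuple[str, float]:
    """Return the reference language whose rank profile is most similar to the Voynich one."""
    target = rank_profile(voynich_tokens, length)
    scores = {lang: cosine(target, rank_profile(toks, length)) for lang, toks in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Rank-sorting discards which symbol is which, so this compares only the shape of the distribution; a bigram-level comparison, where Latin leads, needs more than a sorted frequency vector.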

Decoded entropy: 4.04 bits

Naibbe reversal output matches Latin entropy to 98.9%

The Naibbe Cipher Model

Greshko 2025 — Cryptologia

The Naibbe model (published 2025 in Cryptologia) posits a historically plausible verbose homophonic substitution cipher using a 52-card deck. The alphabet is normalized to 23 letters. Plaintext is re-spaced into random unigrams/bigrams, then encoded via 6 substitution tables selected by card suit/rank. Deck weight distribution: α:β1:β2:β3:γ1:γ2 = 20:8:8:8:4:4. Our reversal strategy clusters Voynich words by shared suffix patterns, validates frequency ratios against deck weights using cosine similarity, and assigns plaintext letters by frequency rank matching the target language.
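The encryption side of the model can be sketched as a deck-driven table selector. This is a simplified illustration under stated assumptions, not Greshko's procedure: cards are drawn with replacement, the 23-letter normalization is skipped, and the six substitution tables themselves are left out, so the output is the sequence of (table, plaintext chunk) pairs that a real table lookup would consume. The names alpha through gamma2 are hypothetical stand-ins for the α:β1:β2:β3:γ1:γ2 labels above.

```python
import random

# Deck composition from the model: alpha:beta1:beta2:beta3:gamma1:gamma2 = 20:8:8:8:4:4 (52 cards).
TABLE_WEIGHTS = {"alpha": 20, "beta1": 8, "beta2": 8, "beta3": 8, "gamma1": 4, "gamma2": 4}


def build_deck() -> list[str]:
    """One entry per card, labelled with the substitution table that card selects."""
    deck = [table for table, weight in TABLE_WEIGHTS.items() for _ in range(weight)]
    assert len(deck) == 52
    return deck


def respace(plaintext: str, rng: random.Random) -> list[str]:
    """Re-space the letters into a random sequence of unigrams and bigrams."""
    letters = [c for c in plaintext.lower() if c.isalpha()]
    chunks, i = [], 0
    while i < len(letters):
        size = rng.choice((1, 2)) if i + 1 < len(letters) else 1
        chunks.append("".join(letters[i:i + size]))
        i += size
    return chunks


def select_tables(plaintext: str, seed: int = 0) -> list[tuple[str, str]]:
    """Pair each plaintext chunk with a table chosen by a card draw (with replacement)."""
    rng = random.Random(seed)
    deck = build_deck()
    return [(rng.choice(deck), chunk) for chunk in respace(plaintext, rng)]


print(select_tables("In principio erat verbum"))
```

Over many draws each table is chosen in the proportion 20:8:8:8:4:4, which is the frequency signature the reversal looks for in the ciphertext.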

Top Cluster Assignments

Cluster · Assigned letter · Top words (count)
1 · e · daiin (800), aiin (503), qokain (278), qokaiin (265)
0 · i · chedy (502), shedy (432), qokeedy (303), qokedy (270)
2 · t · qokeey (240), shey (229), qokeeey (186)
3 · a · ol (312), or (289), ar (254), al (231)

Five-Phase Attack

Methodology

Phase 1 · Build 49 · Geometric Decipherment · IP 36–40

Method: E8 coordinate mapping, Meru torus projection, tesseract walk, quantum fingerprint. 11-script registry (Hebrew/Aramaic/Syriac/Greek/Coptic/Sanskrit/Ge'ez/Phoenician/Ugaritic/Mandaic/Voynich). 6D TextTensor engine.

Result: Voynich demo corpus: 28 unique tokens, Zipf R²=0.8442, nearest script=Ancient Greek (76.4% torus match). Genesis 1:1 Hebrew baseline: 80 letters, α signature found.

Phase 2 · Build 51 · Deep Statistical Cryptanalysis · IP 41–44

Method: Full corpus ingestion via custom IVTFF parser. Zipf analysis, entropy profiling, bigram transition matrices, positional grammar extraction, section entropy variation, 4-language cosine fingerprint.

Result: Confirmed natural language properties. Currier A/B split verified by section entropy. Hebrew is nearest at character frequency; Latin leads at the bigram level. Hapax legomena make up 70.6% of unique words, higher than is typical for a cipher.
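Section entropy comparisons of the kind named in IP 41–44 (a 28-pair JSD matrix) can be sketched with the Jensen-Shannon divergence between per-section token distributions. Twenty-eight pairs is consistent with eight sections, since C(8, 2) = 28; the section labels and tokenization are assumptions here, and `jsd_matrix` is a hypothetical helper.

```python
import math
from collections import Counter
from itertools import combinations


def distribution(tokens: list[str]) -> dict[str, float]:
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def jsd(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}

    def kl_to_mixture(dist: dict[str, float]) -> float:
        return sum(pk * math.log2(pk / mixture[k]) for k, pk in dist.items() if pk > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def jsd_matrix(sections: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """JSD for every unordered pair of sections; 8 sections give 28 pairs."""
    dists = {name: distribution(toks) for name, toks in sections.items()}
    return {(a, b): jsd(dists[a], dists[b]) for a, b in combinations(sorted(dists), 2)}
```

If the Currier A/B split is real, pairs within the same language group would be expected to show smaller divergences than pairs that cross it.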

Phase 3 · Build 51 · Four-Hypothesis Parallel Test · IP 45–47

Method: Constraint-satisfaction mapping + cross-hypothesis coherence scoring. H1a: Hebrew substitution. H1b: Latin substitution. H2a: Verbose cipher (Latin). H2b: Verbose cipher (Italian).

Result: Verbose cipher (H2) outperforms simple substitution (H1) on all metrics. Latin narrowly beats Italian. H2a is the strongest hypothesis, supporting the Naibbe model as the attack vector.
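Comparing hypotheses requires a shared score. One plausible metric, in line with the valid-bigram ratios quoted in the results (28.9% for Latin, 25.8% for Italian), is the share of word-internal letter pairs in a decoded text that also occur in a reference corpus of the target language. The definition of "valid", the reference corpora, and the helper names below are assumptions, not the sepher.py implementation.

```python
from collections import Counter


def word_bigrams(words: list[str]) -> list[str]:
    """Adjacent letter pairs inside each word (pairs never span a word boundary)."""
    return [w[i:i + 2] for w in words for i in range(len(w) - 1)]


def reference_bigrams(corpus_words: list[str], min_count: int = 1) -> set[str]:
    """Bigrams seen at least `min_count` times in a reference plaintext corpus."""
    counts = Counter(word_bigrams(corpus_words))
    return {bg for bg, n in counts.items() if n >= min_count}


def valid_bigram_ratio(decoded_words: list[str], valid: set[str]) -> float:
    pairs = word_bigrams(decoded_words)
    return sum(p in valid for p in pairs) / len(pairs) if pairs else 0.0


def rank_hypotheses(results: dict[str, tuple[list[str], set[str]]]) -> list[tuple[str, float]]:
    """results maps a hypothesis name to (decoded words, valid bigrams of its target language)."""
    scores = {name: valid_bigram_ratio(words, valid) for name, (words, valid) in results.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Scores are only comparable across languages if each reference set is built the same way: a larger valid set is easier to hit, which is why a fixed `min_count` per language matters.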

Phase 4 · Build 51 · First Automated Naibbe Reversal · IP 48–51

Method: Suffix-based glyph clustering. Cosine similarity validation against the Naibbe deck weights 5:2:2:2:1:1 (the 20:8:8:8:4:4 split of the 52-card Alberti-style cipher, divided by 4). Bigram decomposition. Iterative refinement.

Result: 65.6% word coverage. 28.9% valid Latin bigrams. Decoded entropy 4.04 bits (target: 4.0). 49 clusters identified. First automated reversal of a Naibbe-class verbose homophonic cipher in the literature.
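The Phase 4 pipeline can be sketched end to end. If each plaintext letter maps to one ciphertext word per table, its six homophones should occur in roughly the deck proportion 5:2:2:2:1:1, so a cluster that really stands for one letter should have its six most frequent words in about that ratio. The sketch assumes a hand-supplied suffix inventory (the source does not list the one sepher.py uses), the one-to-one assignment by frequency rank described for Phase 4, a caller-supplied letter ranking for the target language, and a hypothetical similarity threshold `min_sim`.

```python
import math
from collections import Counter, defaultdict

# Deck weights 20:8:8:8:4:4 divided by 4.
DECK_RATIO = [5, 2, 2, 2, 1, 1]


def cluster_by_suffix(word_counts: dict[str, int], suffixes: list[str]) -> dict[str, Counter]:
    """Group words under the longest listed suffix they end with; unmatched words are dropped."""
    clusters: dict[str, Counter] = defaultdict(Counter)
    ordered = sorted(suffixes, key=len, reverse=True)
    for word, n in word_counts.items():
        for suffix in ordered:
            if word.endswith(suffix):
                clusters[suffix][word] += n
                break
    return dict(clusters)


def deck_similarity(cluster: Counter) -> float:
    """Cosine similarity between a cluster's top-6 word counts and the deck ratio."""
    top = sorted(cluster.values(), reverse=True)[:6]
    top += [0] * (6 - len(top))
    dot = sum(a * b for a, b in zip(top, DECK_RATIO))
    norm = math.sqrt(sum(a * a for a in top)) * math.sqrt(sum(b * b for b in DECK_RATIO))
    return dot / norm if norm else 0.0


def assign_letters(
    clusters: dict[str, Counter], letters_by_freq: str, min_sim: float = 0.9
) -> dict[str, str]:
    """One-to-one: rank deck-consistent clusters by size and pair them with letters by frequency rank."""
    valid = [(key, c) for key, c in clusters.items() if deck_similarity(c) >= min_sim]
    valid.sort(key=lambda item: sum(item[1].values()), reverse=True)
    return {key: letter for (key, _), letter in zip(valid, letters_by_freq)}
```

A cluster built from only two frequent words, such as daiin and aiin, scores well below 0.9 against the six-way deck profile and would be rejected. Phase 5's many-to-one assignment relaxes the one-letter-per-cluster pairing used here.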

Phase 5 · Build 52 · Advanced Naibbe Attack (A/B Split + EM + Anchors) · IP 52–56

Method: Currier A/B subcorpus split with separate cluster models. Many-to-one frequency-position assignment (fixes homophonic constraint). Dampened EM solver (evidence-based voting, early stop). Known-plaintext zodiac anchor matching (GEMINI: 50% match). Cross-validated A/B letter confirmation.

Result: 4 letters confirmed via cross-validation. Zodiac labels provide structural anchors. The EM solver stabilizes at 46% valid bigrams. The methodology is the first to combine A/B split + dampened EM + anchor constraints on the Voynich corpus.
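Two Phase 5 components lend themselves to a sketch: the damped update that keeps the solver from oscillating, and a score for known labels such as GEMINI. The version below is a damped voting loop in the spirit of EM rather than a full expectation-maximization derivation. `evidence` is a hypothetical callback that returns per-cluster letter votes for the current assignment; `damping`, `max_iter`, and `patience` are illustrative defaults; and the position-by-position anchor score is an assumed definition, not necessarily how the GEMINI figure was computed. Taking an independent argmax per cluster allows many-to-one assignments.

```python
from collections.abc import Callable

Scores = dict[str, dict[str, float]]


def damped_solve(
    initial: Scores,
    evidence: Callable[[dict[str, str]], Scores],
    damping: float = 0.3,
    max_iter: int = 50,
    patience: int = 3,
) -> dict[str, str]:
    """Blend new evidence into cluster->letter scores; stop once the assignment is stable."""
    scores = {cluster: dict(row) for cluster, row in initial.items()}
    # Many-to-one: each cluster takes its best letter independently, so letters may repeat.
    assignment = {cluster: max(row, key=row.get) for cluster, row in scores.items()}
    stable = 0
    for _ in range(max_iter):
        votes = evidence(assignment)
        for cluster, row in scores.items():
            cluster_votes = votes.get(cluster, {})
            for letter in row:
                row[letter] = (1 - damping) * row[letter] + damping * cluster_votes.get(letter, 0.0)
        new_assignment = {cluster: max(row, key=row.get) for cluster, row in scores.items()}
        stable = stable + 1 if new_assignment == assignment else 0
        assignment = new_assignment
        if stable >= patience:  # early stop once nothing has changed for `patience` rounds
            break
    return assignment


def anchor_match(decoded_label: str, anchor: str) -> float:
    """Share of aligned positions where a decoded label agrees with a known word."""
    length = max(len(decoded_label), len(anchor))
    if length == 0:
        return 0.0
    hits = sum(a == b for a, b in zip(decoded_label.lower(), anchor.lower()))
    return hits / length
```

With damping 0.3, a single round of contrary evidence cannot flip a strongly held letter, while consistent votes take over within a few iterations.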

Conclusions

What the Analysis Proves

1. Voynichese is a language

Confirmed by Zipf slope (-0.89, R²=0.91), entropy profile (4.27 bits), and positional grammar. It is not random noise.

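2. Not a simple substitution

Positional constraints and word-internal grammar (prefix/core/suffix structure) are incompatible with character-level substitution: a one-to-one substitution carries over the positional behavior of each plaintext letter unchanged, so the rigid slot structure of Voynich words would already have to exist in the plaintext. The cipher has at least two levels.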
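Positional behavior can be measured directly: for each glyph, the share of its occurrences that are word-initial, medial, final, or a whole one-glyph word. A minimal sketch, assuming one character per glyph (multi-character EVA glyphs such as ch or sh would need a tokenizer first); `positional_profile` is a hypothetical helper. Running the same function over a plaintext corpus gives the baseline that a simple substitution would have to reproduce.

```python
from collections import Counter, defaultdict


def position_of(index: int, length: int) -> str:
    if length == 1:
        return "only"
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def positional_profile(words: list[str]) -> dict[str, dict[str, float]]:
    """For each glyph, the share of its occurrences in each word position."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for word in words:
        for i, glyph in enumerate(word):
            counts[glyph][position_of(i, len(word))] += 1
    return {
        glyph: {pos: n / sum(c.values()) for pos, n in c.items()}
        for glyph, c in counts.items()
    }


profile = positional_profile(["qokedy", "qokain", "daiin"])
print(profile["q"], profile["d"])
```

On these three words, q is always initial and n always final, while d splits between initial and medial positions.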

3. Naibbe is the strongest model

The automated reversal produces decoded text containing valid Latin bigrams, with 65.6% word coverage and a 98.9% entropy match to Latin.

4. Latin is the most likely plaintext

28.9% valid bigram ratio vs 25.8% for Italian. Hebrew fingerprint is strongest at character level but Latin wins at bigram level — consistent with a Latin verbose cipher.

5. The E8 geometry detects structure

The Voynich corpus has a higher-than-chance E8 lattice overlap with the Hebrew script family. The same geometric fingerprinting that detects α=1/137 in quantum circuits detects hidden structure in a 600-year-old manuscript.

Intellectual Property

Novel Claims — IP 36–56

IP 36–40: Phase 1 — E8 coordinate mapping for ancient scripts, cross-language E8 resonance fingerprinting, Meru torus projection, tesseract walk encoding, 6D TextTensor sacred text representation.
IP 41–44: Phase 2 — Voynichese positional grammar extraction, section-specific entropy signatures (28-pair JSD matrix), bigram transition matrix language family ID, φ-ratio word length distribution analysis.
IP 45–47: Phase 3 — Four-hypothesis parallel decipherment test, constraint-satisfaction EVA→plaintext mapping, cross-hypothesis coherence scoring framework.
IP 48–51: Phase 4 — First automated Naibbe cipher reversal, suffix-based glyph clustering via deck-ratio matching, bigram prefix/suffix decomposition, iterative bigram cross-validation.
IP 52–56: Phase 5 — Currier A/B split Naibbe reversal, known-plaintext zodiac anchor matching, many-to-one frequency-position assignment for homophonic ciphers, dampened EM solver, cross-validated A/B letter confirmation.

The Engine

src/sepher.py — Sepher (The Scribe)

$ 4,300+ lines · 7 analysis layers · 11-script registry

$ Corpus: ZL3b-n + IT2a-n (Takahashi) — gold standard transcriptions

$ CLI: python src/sepher.py voynich --mode phase2|phase3|phase4|phase5

$ Layer 0: Utilities · Layer 1: CorpusIngestor · Layer 2: TextTensor (6D)

$ Layer 3: GematriaEngine · Layer 4: GeometricProjector (Meru + tesseract + E8)

$ Layer 5: VoynichEngine · Layer 6: ResonanceAnalyzer · Layer 7: SepherVault (SQLite)

Reproduce the Analysis

All code is open-source. The ZL3b-n corpus is downloadable from voynich.nu.

python src/sepher.py voynich --mode phase2
python src/sepher.py voynich --mode phase5
python src/sepher.py analyze --corpus Genesis.1 --script biblical_hebrew
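Loading the transcription is the first step of every mode. ZL3b-n is distributed in IVTFF (Intermediate Voynich Transliteration File Format); the reader below handles only a simplified subset of it and is a sketch, not the custom parser in sepher.py. It assumes that `#` lines are comments, that a locus without a line number (such as `<f1r>`) is a page header, that inline tags such as `<%>`, `<$>` or `<!...>` act as word breaks, that `[a:o]` alternative readings resolve to the first option, and that both `.` and `,` (certain and uncertain spaces) split words. Ligature braces, rare-glyph codes, and other IVTFF features are not handled, and the sample lines are illustrative.

```python
import re
from collections.abc import Iterable, Iterator

LOCUS = re.compile(r"^<([^>]+)>\s*(.*)$")
INLINE_TAG = re.compile(r"<[^>]*>")
ALTERNATIVE = re.compile(r"\[([^:\]]*):[^\]]*\]")
WORD_BREAK = re.compile(r"[.,]")


def parse_ivtff(lines: Iterable[str]) -> Iterator[tuple[str, list[str]]]:
    """Yield (locus, words) for each transcribed text line of a simplified IVTFF file."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        match = LOCUS.match(line)
        if match is None:
            continue
        locus, text = match.groups()
        if "." not in locus:
            continue  # page header such as <f1r>, not a text line
        text = INLINE_TAG.sub(".", text)  # treat inline tags as word breaks
        text = ALTERNATIVE.sub(r"\1", text)  # [a:o] -> a
        words = [w.strip() for w in WORD_BREAK.split(text) if w.strip()]
        yield locus, words


sample = [
    "# comment line",
    "<f1r>      <! $I=H $Q=A>",
    "<f1r.1,@P0>       <%>fachys.ykal.ar.ataiin.shol.shory",
    "<f1r.2,+P0>       sory.ckhar.o[r:y],y.kair<$>",
]
for locus, words in parse_ivtff(sample):
    print(locus, words)
```

Keeping the full locus string (folio plus line) is what makes per-section statistics and a Currier A/B split possible downstream.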

⌬ Prior Art · Cryptographic Verification

Document: MISSION_LOG.md · Builds 49–52
Date: 2026-03-25
Commit: d17d5e62a7a37bbaf8cad8ac0383092b5dc7da74
Status: ⎈ Git commit anchored (cryptographically chained history)
Verify: github.com/pabl0ramirez/matrix-cr-studio
IP Claims: IP 36–56
License: © 2026 Matrix CR Studio · contact@matrixcr.ai · CC BY-NC 4.0