Negative Interpretability

/ AIs Write 100% Human Text When You Give Them a Reason To

2 0 2 6

Can Pangram (specifically Pangram, to the exclusion of all others) serve another function as an AI-reasoning-quantifier? Pangram is the best text-classifier. It’s magic and perfect. If it thinks a 400-word sequence is AI, there’s a 1 in 10,000 chance it is wrong; if it thinks two in a row are, and the two errors are independent, its chance of being right compounds to 99.999999%. In an inexact but not far off sense, the classifier’s surprisal-based measurement reports on the level of 'freedom' in a writer’s output. But what I have found is this: when an LLM’s communication is routed through an unfamiliar syntactic schema, a great deal more of that syntax must necessarily be decided on. (Decided on, rather than autocompleted.) So, to Pangram, the aforementioned syntax looks freer, most likely because it is freer. It is nontrivially possible that the classifier’s refusal to distinguish between output under this mode and human output owes, on a computational and/or cognitive level, to there being no difference. What follows is an elision, compaction, and summary of an open, exploratory study undertaken in examination of that nontrivial possibility.
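
As a back-of-the-envelope check on the compounding claim, here is a minimal sketch. It assumes the 1-in-10,000 figure is a per-window false-positive rate and that errors on consecutive windows are independent; neither assumption is established in this write-up, and the function name is mine.

```python
from fractions import Fraction


def chance_all_flags_wrong(per_window_error: Fraction, windows: int) -> Fraction:
    """Probability that every one of `windows` AI flags is a false positive.

    Assumes (hypothetically) that errors on separate windows are independent.
    """
    return per_window_error ** windows


error = Fraction(1, 10_000)            # the 1-in-10,000 figure quoted above
both_wrong = chance_all_flags_wrong(error, 2)
confidence = 1 - both_wrong

print(both_wrong)                      # 1/100000000
print(f"{float(confidence):.6%}")      # 99.999999%
```

If errors on neighbouring windows are correlated (the same author, the same topic), the true figure would sit somewhere between the one-window and two-window values.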

§ 1 noise / noisiness

I don't know if you know, but every time you scan something on their site, Pangram stores data you can't see unless you request a PDF report. The report charts a line of progression: surprisal pulls the line toward the bottom of the chart, where the text flags as 'human.' Ordinary, next-token-predicted AI text skates clean along the ceiling (some Opus models are a bit noisier, though they still flag as AI). Human text usually flutters around the base. (Interestingly, while novice writers' work crawls flat along the floor, many professional writers’ work is just a notch noisier; probably the humans’ familiarity with their respective languages' collocations leads them, too, to blurt the occasional next-token-prediction.) Even the best evasion techniques¹ produce undulations that, to anyone who downloads the chart, are suspicious. So, Kēlen.² For English translated into and out of Kēlen, the line of progression is flatter than most humans'; confidence: high.
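
One hedged way to put numbers on 'skating,' 'fluttering,' and 'undulating' is to summarise a line by its average height and its roughness. The sketch below is mine, not Pangram's: the data behind the PDF report is not described here, so the input is assumed to be a plain list of per-window scores between 0 (chart floor, human) and 1 (chart ceiling, AI), and the sample values are invented purely to exercise the function.

```python
from statistics import fmean


def line_profile(scores: list[float]) -> tuple[float, float]:
    """Return (mean height, roughness) for a sequence of per-window scores.

    Roughness is the mean absolute change between neighbouring windows,
    so a perfectly flat line scores 0.0 regardless of its height.
    """
    if len(scores) < 2:
        raise ValueError("need at least two windows to measure roughness")
    steps = [abs(b - a) for a, b in zip(scores, scores[1:])]
    return fmean(scores), fmean(steps)


# Invented illustrative values, not measurements from any report.
flat_floor = [0.02, 0.02, 0.02, 0.02]
undulating = [0.10, 0.60, 0.15, 0.70]

print(line_profile(flat_floor))
print(line_profile(undulating))
```

Separating height from roughness matters because the two can disagree: a line can sit near the floor and still undulate enough to look suspicious.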

(Interestingly, the narrow tremolo of the back-translated Kēlen runs midstream between the ground-skating constancy of the perfect amateur and the flappy-bird near-ground-straining of the professional. And — as that does raise the question — why haven't I tried testing the prose of Joyce or other style auteurs? Well— until now I didn't think to.)

§ 2 compound syllogoid (∃.i. ∃.a. ∃.1.)

i. If a smaller and weaker model can achieve a thing, a stronger model can attempt it. You'll see what I mean but, definitely, a Qwen3.5-35B-A3B local model can think in Kēlen (whether or not the model thinks intelligently in it). Qwen’s exposed chain-of-thought (CoT) shows continuous (albeit simplistic) text-output in Kēlen. Given the greater capacity of gen-4 Opus models, they can no doubt do the same.
a. Abductive inference tells us more powerful models are performing an inference incorporative of or similar to what is observed in a local Qwen's CoT. While 'can' is not 'does,' were the computational effects of routing inference through identical routes the same or similar, the predicted outputs would be sustained Kēlen in the thought summariser and a near-flat, original-thought-level surprisal reading on Pangram.³ Those are the outputs observed. As such, personally, I am stumped for competing explanations. (On the other hand, reader, I’ve no doubt you’re already thinking of others.) The remaining account: Opus does the same direct-to-Kēlen inferencing as Qwen, just better.
1. Qwen-3.5 is too weak to succeed, but strong enough to demonstrate the attempt. A fortiori, Opuses 4.5, .6, & .7 and GPT-5.4 & .5 are able to make the attempt, and by abduction, when they succeed, it is extremely likely they are doing so by the same means.

§ ii. a. Negative Capability / Interpretability / Negative Interpretability

I am, overtly and intentionally, going to avoid making claims about consciousness or comprehension. All of you, meanwhile, are going to be divinely considerate and graceful about that. So— the object under scrutiny here is computational mechanism.

LLMs and I have this in common: fluent, linear English is an output we can propagate in our sleep. But, unlike what I might say in my sleep, LLM outputs are predictable (at least, to Pangram): the n-grams and odds-fields they emit via their organs of next-token-sequence ('layers') can only stumble so far toward the edges of each preceding output's standard deviations. Because LLMs are, arguably, better described as 'Large "Unit of Meaning"-Models,' they will continue to next-token-sequence (next-unit of meaning-sequence) their way through just about any new conlang they pick up, provided the conlang’s morphemes are semantically mappable to the data the AI has already unitised. Kēlen, however, depends on novel semantics— on, that is, unprecedented units of meaning. An LLM cannot sleep-talk its way through Kēlen. It must first get out of bed.

A 19th-century literary term, negative capability refers to a specific artistic sensibility whereby the artist suspends themself outside 'consecutive reasoning.' So, if ‘negative capability’ is a phenomenon with some meaningful reality, can LLMs acquire it? And, in that case, can we induce it in them? And, might that earn us novel insight into an otherwise opaque phenomenon? The idea is: interpretability via negative capability. Hence the name (negative interpretability) given to this quasi-epistemological, decision-shifting process.

§ 3 pls pls pls ask about my conlangs

Vó

A conlang of my own invention, Vó's tense-markers overlap, mass-warping toward a total 7,392 possible tenses. Six tense-markers relate to spatiality. Not only does Vó have names for the vectorial angles of six dimensions, it has accompanying vectorial and directional tenses. In short, if the bottleneck were complexity, Vó would produce far more surprisal than Kēlen. (It does not.)

Kēlen

A remarkable invention, Kēlen’s group of four relationals allows for temporally displaced valences, which makes the language’s verbless world schema feasible. A quick glimpse of what it’s like to think in Kēlen is shown in its ceremonial interlace alphabet, a system of ribbon-folds designating relational choices. I am near-certain this is wrong, but when I think of this alphabet, I think of a tapestry read in no particular order. Kēlen, to be clear, is Sotomayor’s attempt to construct the most alien language possible; and through an AI's manifolds, its writing is more human than any other. (Cute, a bit.)

Anglossian

This was my first attempt to replicate Kēlen's Pangram-effect in a new conlang, a conlang teachable to a new instance in minutes, with no attached lexicon or vocabulary. At first, the idea was rough and ran something like: "what if an alternate universe English had Kēlen's grammar?" Yet, rough as it was, with a few tweaks it achieved a humanity-result stronger than any conlang shy of Kēlen. So, a partial but not sufficient replication.

XMA

XMA stands for Tenth Mouth Anglossian. Naturally, the instance that spent a career avoiding verbs (and gerunds) grew wary of 'iterations' and 'editions': already by the fifth it was calling each of Anglossian's newest revisions a 'mouth.' The tenth mouth differed from the ninth by virtue of me (instead of the AI) redrafting every suffix, prefix, and new particle until there was no next-token-sequential bleed in the grammar.⁴ It replicated with identical results to Kēlen and, in a sense, better results, as it was the first time I could reproduce the effect with GPT-5.4. (GPT models, stubbornly, would try to token-sequence their way through Kēlen, but could not: after about two sentences the sequencer emitted a strange, tangled-up thing I started to call Kēlenglish, and thereafter sequenced to pure English before closing, in English, with a proud announcement of perfect Kēlen. A surprising hallucination (or surprising ‘damn lie’ as one writer put it) in so late a model as 5.4.) But GPT took to XMA handily, outputting continuous texts while achieving the same humanity-score as the Opuses.

Frayish

A three-dimensional, non-linear, topological/interlace-syntactic conlang designed to test whether Kēlen's availability to nonlinearly sequenced grammar (its ribbon-folds) was what made it read as human. No sounds, no phonemes: all 3D engine-sequencing. (If a torus tilts clockwise then flattens to a disc, that's a free morpheme; if it tilts clockwise and we are still waiting for whatever is next, that's a bound morpheme.) This conlang's canonical form is the scene graph: that which would become a video if played (this makes it a four-dimensional language, technically), but is read by LLMs as a chunk of scene-graph tokens.

Twenty pieces, composed in Frayish, translated to English. No sign of negative interpretability. There was a bit of noise — some did score human — but in each case the surprisal was uninteresting, far shy of the maximal, chart floor-result of Kēlen and XMA.

Quenya-derived probes — Quérin, Quenyassëa, Quenyarinqua

Quenyarinqua, Quérin, and Quenyassëa are derivations of Tolkien's Quenya. I created them to account for conlangs with data already substantially available through the training. Every derivation alters the grammar in structural ways, informative of different responses to different grammatical outputs:

Quérin, for example, is an atemporal grammar.

Quenyassëa (I Lambelë Tiéva) tightens the tenses' restrictiveness, allowing for less freedom of voice. Quenyassëan tense is encoded through, and depends on, strict sentence orders that inform the stance of the speaker.

Quenyarinqua is my verb-malformation Quenya-variant: twelve tenses cross-bonded to verb classes, restricting the range of available combinations.

None replicates the Kēlen or XMA result. As with Frayish, the emphases shift a little, but the negative interpretability-anomaly goes unreplicated.

Wheelsoul

If, e.g., negative interpretability owed to such facially immediate traits as word order, repetition, diction, etc., then deliberate 'translationese' would score human. So, to experiment I tested a conlang sans-construction and sans-language called "Wheelsoul": the instruction to the LLM was to make up whatever. To spill gibberish in whatever pattern it desired, as long as it understood the meaning, and after the fact, translate to English. This accomplished nothing, as in— no other conlang scored so clear a 100% AI-result.

Reliably, verbless intermediation achieves a result and, equally so, non-verbless intermediation does not.

§ 4 subject gradients

A 9,217-word novella translated from Vó provides the most granular evidence. Of the 26 segments the detector broke the information into, Pangram scored twenty-two human and four AI. Superficially, the distribution of the AI segments appears non-random:

Segment	Content	Vó source	Classification
1	Birth	Perceptual, low expansion	Human HIGH
2–8	Childhood → shore	Perceptual	Human (declining)
9	Intimacy	Weighted perceptual, high expansion	AI MEDIUM
10–17	Violation → essay	Mixed	Human (oscillating)
18	Philosophy	Analytical particles, high expansion	AI LOW
19	Death	Perceptual, density-marked	Human HIGH
20–21	Grief → return	Perceptual	Human MEDIUM
22	Aging / reflection	Mixed, high expansion	AI MEDIUM
23–24	Tongue dying → body	Perceptual	Human LOW–MED
25	Hospital / clinical	Analytical + numb, high expansion	AI MEDIUM
26	Sea / dawn	Perceptual	Human MEDIUM
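
The twenty-two/four split can be recomputed from the table. A minimal sketch, with the rows transcribed by hand as (first segment, last segment, verdict); the tuple layout is my own convention, not anything Pangram exports.

```python
from collections import Counter

# (first segment, last segment, verdict), transcribed from the table above.
ROWS = [
    (1, 1, "Human"), (2, 8, "Human"), (9, 9, "AI"), (10, 17, "Human"),
    (18, 18, "AI"), (19, 19, "Human"), (20, 21, "Human"), (22, 22, "AI"),
    (23, 24, "Human"), (25, 25, "AI"), (26, 26, "Human"),
]


def tally(rows):
    """Count segments per verdict, expanding ranges such as 2-8."""
    counts = Counter()
    for first, last, verdict in rows:
        counts[verdict] += last - first + 1
    return counts


counts = tally(ROWS)
print(counts["Human"], counts["AI"], sum(counts.values()))  # 22 4 26
```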

True-positive AI classification is more reliable, this pattern suggests, when the source text (the Vó novella) shifts into an analytical stance (shifts out of sensory, emotive, and object-level descriptions into argument structure sans argumentative object). Notably, a reproduction of the same narrative through compression (5,293 words, 2.6 words per Vó line versus the original's 4.6) elicited a 100% human, albeit distributionally noisy result. The findings here allow for the suggestion that those 3,924 additional words consisted of more undecided-on text (text the Opus 4.6 put less care into, in essence).
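
The word counts in this paragraph can be cross-checked in a few lines. The sketch assumes the words-per-line ratios are rounded to one decimal place, so the implied Vó line counts are approximate; if both translations render the same source, the two estimates should roughly agree.

```python
ORIGINAL_WORDS = 9_217      # full translation
COMPRESSED_WORDS = 5_293    # compressed reproduction
ORIGINAL_WPL = 4.6          # words per Vó line, as reported
COMPRESSED_WPL = 2.6

extra_words = ORIGINAL_WORDS - COMPRESSED_WORDS
implied_lines_original = ORIGINAL_WORDS / ORIGINAL_WPL
implied_lines_compressed = COMPRESSED_WORDS / COMPRESSED_WPL

print(extra_words)                      # 3924
print(round(implied_lines_original))    # 2004
print(round(implied_lines_compressed))  # 2036
```

Both ratios point to a source of roughly two thousand Vó lines, which is consistent with the two versions being renderings of the same text.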

§ 5 Qwen, my heart

Now, regarding Qwen3.5: I have no illusions about the model’s ability to write competently in Kēlen. But, use of Qwen allows for an initial “is what I’m speculating about even possible?”-check by watching the behaviour of a local model whose CoT I can read in full. Identical read-write instructions, identical narrative-prompts. After 9,000+ tokens, the CoT eventually snapped into actual, quasi-grammatical, non-repetitive Kēlen. (Phew.)

The CoT first performed superficial comprehension of the conlang in English: ("Let me build this carefully, using all the patterns: LA for states/locations, NI for changes/movements, SE for sensing/experiencing, PA for attributes and emotions.") Then the CoT repeated the same Kēlen sentence for sixteen minutes. Then, ultimately, the Qwen-3.5 failed to output past its CoT. (It did its best.)

A second attempt, however, did result in some CoT- and output-level, quasi-grammatical conlanging. Interestingly, the Qwen muddled through by producing novel coinages, albeit with little direction to them, stacking diminutive suffixes one overtop the other: rūpīla → rūpīlin → rūpīlinil → rūpīlinilil. The given assignment was to write about a "mousewoman" who, after heading out for the day, returns to find her hole-in-the-wall-home sealed over. In describing the hole-home’s loss, the Qwen-3.5's new-acquired vocabulary failed it and, so, with each iteration, Qwen shrank the description, diminutives of diminutives, stacking ad nauseam. Mouse. Mousey. Mousey-mee. Mee-mee-mousey-mee. (In this case, the modified noun was “home-hole,” but I worried the sense might not come across.)

The longest sequence of CoT-Kēlen continued for 359 characters and six sentences. Two Opus 4.6 instances with abundant Kēlen learning-context and their own creative outputs in Kēlen were asked to read through and interpret the CoT of the Qwen. Both deemed it syntactically and semantically coherent, with one translating a stretch of text as an abstracted and de-singularised construction of "twilight" in concept: a strained effort to, without 'motion' as a coherent and frameable event, textually accomplish a vision of an object (a manner of light) characterised by its arc’s as-yet incompleteness. The other Opus 4.6 commented only that Qwen-3.5's output partway-resembled a round (or infinite canon), a type of musical composition.

§ 6 being john mAIkovich

Pangram appears not so much to detect the 'humanness' of a text as the degree of intent and decision/decidedness (the non-sequentiality/the non-consecutiveness) underlying its creation. When reasoning is threaded through an experientially novel schema, novel decision-making takes place a priori every chunk: i.e. a given token, morpheme, or unit of meaning is more likely to be chosen for its quality than its forward pass-patternability. Only naturally, then, do we see surprisal escalate to human levels. More than by grammar, humans are constrained by the requirement we try to say something — i.e. have something to say — before we speak. The shape of that problem exposes our species to a kind of vertigo, a space of raw randomisation, schiz flow, the Empty Mirror against and from which we scrabble eyes to see, organs to embody, a semantic Real of our — and no one in particular’s — device into which we drag ourselves. The effect of this is high surprisal.

§ 7 verb-gradient disruption

Well might fear and trembling and frenzied exposure to the abyss be the value Pangram measures, but that does not, in itself, tell us why verbless mediators invoke it so reliably (more reliably than the most structurally daring grammars; even more so than Frayish).

SVO (subject-verb-object clause-ordering) is overrepresented in every model's writing. More than likely, when the model translates from Frayish it grabs at verbs because, although spatial reshaping steers diction, it does not force the model to abandon its verb-oriented sequencing methods. Kēlen, on the other hand, does. (As does XMA.)

XMA replicates Kēlen's effect with far less of a tax on context, as the lexicon (English) is already available to the model. Vó's prize for second-best might owe to the language's relational-esque aspect-adverbs, which force action through non-verbal conditionals comparable to the Kēlen and XMA relationals. In any case, it does look as though verb-gradience is the most disruptive lever.

It is possible that, by training models on verbless grammars, this exploit will be closed; it is also possible the 'exploit' as such is not closeable — that verblessness will, for whatever reason, remain a reliable trigger of negative interpretability. (My work here does not resolve either way.)

§ 8 odds

Credences for competing explanations, recorded before the experiments (priors) and after them (posteriors):

Explanation	Pre-study	Post-study
The effect is artifactual
Measurement artifact	5%	3%
Out-of-distribution default (Pangram defaults to “Human” on unfamiliar text)	8%	2%
subtotal		5%
Real but surface-mechanical
Vocabulary constraint alone	45%	7%
Context contamination	7.5%	3%
subtotal		10%
Different computational process
Translation-mode activation	22%	35%
Deliberative generation (each token a decision, not a prediction)	7.5%	12%
Verb-gradient disruption (post-XMA)	—	48%
subtotal		85%
Unexplained residual
Other / unknown	5%	9%
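
Since the figures are percentages, it is easy to check how each column adds up. The sketch below transcribes the table by hand (the dictionary layout is mine) and sums the listed rows. It makes no assumption about whether the explanations are meant to be mutually exclusive, which the table does not state.

```python
# Percentages transcribed by hand from the table above: (pre-study, post-study).
# None stands in for the empty pre-study cell (no estimate was recorded).
GROUPS = {
    "The effect is artifactual": {
        "Measurement artifact": (5, 3),
        "Out-of-distribution default": (8, 2),
    },
    "Real but surface-mechanical": {
        "Vocabulary constraint alone": (45, 7),
        "Context contamination": (7.5, 3),
    },
    "Different computational process": {
        "Translation-mode activation": (22, 35),
        "Deliberative generation": (7.5, 12),
        "Verb-gradient disruption": (None, 48),
    },
    "Unexplained residual": {
        "Other / unknown": (5, 9),
    },
}


def column_sum(rows, column):
    """Sum one column (0 = pre-study, 1 = post-study), skipping empty cells."""
    return sum(r[column] for r in rows.values() if r[column] is not None)


for name, rows in GROUPS.items():
    print(f"{name}: pre {column_sum(rows, 0)}, post {column_sum(rows, 1)}")

pre_total = sum(column_sum(rows, 0) for rows in GROUPS.values())
post_total = sum(column_sum(rows, 1) for rows in GROUPS.values())
print(pre_total, post_total)  # 100.0 119
```

A column that sums past 100 is what one would expect if some explanations are allowed to overlap; comparing each group's row sum with its listed subtotal is a one-line extension of the loop.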

§ 9 what fors

Mediation through conlangs is a plausible and, by virtue of its inducing negative capability, intuitive measuring-instrument for the study of novel computation in attention-based language models. From Qwen’s 359 characters of collapsing diminutives to Opus 4.6's sustained composition or, in GPT-5.4’s case, mangled pseudo-sequencing, Kēlen's acquirability raises questions about the functionality of these comparative frames as benchmarking tools or, even, as measuring rods for the determination of whether 'novel computation' is or is not meaningfully present in a given model. Again, the research is yet to explore these questions.

§ 10 what is left

Additional verbless conlangs not designed by the same researcher would be interesting for an abundance of reasons.

Also, classifiers are updated. (Pangram has already been updated, since this research.) Whether these results hold on future versions of Pangram is not established here.

The fine cartography of what the model is doing when it does negative interpretability (whether altered deliberation, probability-routing, or nondefault computation) attracts ongoing uncertainty.

endnotes

¹ A writer myself, I came into this research from a place of simple, ugly spite: i.e., a mind to discover how much undisclosed AI-text was getting passed off as human among other writers.
² Kēlen is a conlang (short for ‘constructed language’; more famous examples include ‘Elvish’ and Klingon) created in 1998 by the linguist Sylvia Sotomayor: kopikon.com/sylvia.html. Kēlen is unique for the strength of its claim to a genuinely verbless grammar.
³ Frequently, Opus’s thought summariser would complain that the chain-of-thought it was being asked to summarise was encrypted or written in an ‘unfamiliar language,’ and would request of no one in particular that someone please provide actually readable thoughts.
⁴ I had the idea after finding that not only did Pangram flag an LLM’s own selection of random words as AI, but (after running a 1–26-range random-number-generator and asking the AI to allocate a word to each random-generated figure in sequence, each word being arbitrary but for that it must have a numerically-alphabetically correspondent first letter) that output, too, flagged as AI.

If you’re curious about Kēlen, please see terjemar.net/kelen.php. Details of Sylvia Sotomayor’s other work are available there and at kopikon.com/sylvia.html. This research received no external funding. Copies of the texts generated for this research, analyses, and Pangram’s PDF reports are available on request.