Performative Bafflement

Links Post #3

Half the links before the paywall, and a half off coupon, as always

Performative Bafflement
May 28, 2026
∙ Paid
  1. Great post on needing market-like infrastructure, including calibrated introspection, in order to allocate AI agents correctly.

    The current problem is calibration: showing the current calibrations helps a little, but the auction’s gap to the oracle remains similar. The post also shows that the simple centralized router they built gets only 27/50 correct, versus 23/50 in the market.

Strange Loop Canon
Agent, Know Thyself! (and bid accordingly)
Written with the wonderful Andrey Fradkin, who does the Justified Posteriors podcast…
a month ago · 44 likes · 16 comments · Rohit Krishnan and Andrey Fradkin

Commenters point out interesting stuff: Coase’s “The Nature of the Firm” implies that there are architectures of groups of semi-specialized or differently harnessed LLMs that should be much more effective as problem solvers of this nature. LLMs themselves are already coalitions of minds! So there’s an exciting greenfield space here where everyone is going to be experimenting with markets, virtual firms, auctions, and other arrangements to find the best coordination and allocation strategies for groups of AI minds to work together most effectively.

And anyone can participate! And there’s a ton of arbitrage! Just think, if you put together an architecture that cooks and is 30% more efficient and/or faster, that’s your cost structure, and you have what amounts to a 30% cost structure advantage. Immense!

Fun ideas that immediately occur to me:

> Harnesses around metacognition and self calibration

> Self-training and self-evaluating dojos to elicit better metacognition and calibration, ideally set up in a way that tunes the output quality via hyperparameters and elicits a good map of where a given mind is strong or weak, because you’re going to need to throw every single new model update or slight revision in there

> Architectures that are optimized around finding and allocating relatively self-contained blocks of tasks to calibrated sub-minds, to minimize the context bloat that is inherent in understanding what’s going on and the history of what’s been done to date. We need to separate architectural minds and execution minds. Many are already doing this, and Claude Code kinda does it natively with subagents, but poorly.

> Rohit has a virtual firm project he’s working on here - https://github.com/Strange-Lab-AI/vei - “VEI turns built-in scenarios or real company records into a runnable company world. You can use it to test an agent before it touches a real company, watch an outside agent through a governed twin, branch from a real historical decision and compare a different move, or draft grounded knowledge artifacts from the same company state.”

It’s basically a state architecture where you can do redos, sort of a part of the dojo idea.
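To make the calibration point from the first link concrete, here is a toy sketch of why a bidding market falls short of an oracle when one agent overstates its confidence. Everything in it is hypothetical and not taken from the linked post or the VEI repo: the `Agent` class, the three agents, their skill numbers, and the `overconfidence` knob are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    true_skill: dict             # task type -> true success probability (made-up numbers)
    overconfidence: float = 0.0  # 0.0 means the agent's bid equals its true skill

    def bid(self, task: str) -> float:
        # Self-reported confidence, clipped to [0, 1].
        return min(1.0, max(0.0, self.true_skill[task] + self.overconfidence))


def auction_pick(agents, task):
    # The market: whoever reports the highest confidence wins the task.
    return max(agents, key=lambda a: a.bid(task))


def oracle_pick(agents, task):
    # The oracle: whoever is actually best at the task wins it.
    return max(agents, key=lambda a: a.true_skill[task])


def expected_correct(agents, tasks, pick):
    # Expected number of tasks solved under a given allocation rule.
    return sum(pick(agents, t).true_skill[t] for t in tasks)


# Hypothetical agents: two honest specialists and one overconfident generalist.
AGENTS = [
    Agent("coder", {"code": 0.9, "prose": 0.4}),
    Agent("writer", {"code": 0.5, "prose": 0.8}),
    Agent("blowhard", {"code": 0.6, "prose": 0.5}, overconfidence=0.4),
]
TASKS = ["code", "prose"] * 5

if __name__ == "__main__":
    print("auction:", expected_correct(AGENTS, TASKS, auction_pick))
    print("oracle: ", expected_correct(AGENTS, TASKS, oracle_pick))
```

In this toy setup the overconfident agent outbids both specialists on every task, so the whole gap to the oracle comes from miscalibration; set its `overconfidence` to 0.0 and the auction picks the same agents as the oracle.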

  1. Should you trust the Netflix top ten?

No! Broadly, only how long a title stays in the top ten predicts quality. And they promote their own series in there, which should surprise nobody.

Stat Significant
Should You Trust the Netflix Top 10? A Statistical Analysis
Intro: “If It’s in the Top 10, It Must Be Good…
a month ago · 58 likes · 10 comments · Daniel Parris
  1. Fun analysis and summation of the goblin thing, where GPT was mentioning goblins more often than was comfortable

David Oks
Language models are weird for the same reason human cultures are weird
In November 2025, shortly after OpenAI released GPT-5.1—a new model that promised “a smarter, more conversational ChatGPT”—a small set of users started to notice something weird. GPT-5.1 was indeed smarter and more conversational; but it also had a strange habit of referring to things as “goblins.” For a…
a month ago · 174 likes · 12 comments · David Oks
  1. An aesthetic rant about the moral superiority of detached garages

    “But attached garages are an exception to the rule that there are always exceptions to the rule. They’re always ugly. In no case are they ever not ugly. To be more precise, every house looks worse with an attached garage than it would if that garage were entirely omitted.

    Even garage defenders rarely object to this point, but I’ll go over the evidence anyway.”

  2. A brief multi-culture overview of women’s roles when they don’t have the right to exit

    Everyone thinks I’m crazy because I’m anti-marriage, and strongly anti-marriage in the sense I believe that “encouraging more marriage to ameliorate the fertility crisis” is a huge mistake, on the order of creating the Torment Nexus to crank out a handful of incremental taxpayers.

    And this is very against male norms by current gender discourse standards! You can take the most libertarian to right-leaning “the government should butt out of everything” guy, but when it comes to divorce, suddenly they’re fully against all human autonomy or choice or dignity in half the population and think women should be forcibly shackled to duds, which I’ve always found hilarious.

    But this is exactly why marriage with no exit sucks - women unfailingly get milked for free labor, beaten up, raped, and lose much of the autonomy and choice that matters, and the article is a good survey of that. It really boggles my mind when so many men are arguing FOR this, as though they don’t have mothers and sisters and daughters.

The Great Gender Divergence
Patriarchal Rents
Before the 20th century, the world was overwhelmingly patriarchal: male coalitions ran powerful institutions, religious authorities sacralised male dominance, and female submission was often glorified as moral virtue. This system not only puffed up men’s egos, it also enabled extraction: fraternities captured benefits that would have otherwise been redu…
a month ago · 64 likes · 6 comments · Alice Evans
  1. Curl, which is 176k lines long and has been rewritten 4 times, only turns up 1 minor vuln when audited by Mythos. This is after finding and fixing ~188 over the years. I found this audit interesting, because it’s at least a data point on “what is the level of complexity, auditing, and public use that is likely to be largely okay.”

    Then again, Heartbleed existed too. And I can’t help but shore this up with Mythos pwning Apple’s brand new M5 memory-safe architecture, too:

International Cyber Digest@IntCyberDigest
❗️🚨 BREAKING: Researchers used Mythos Preview to find the first public macOS kernel memory corruption exploit on Apple's M5 silicon, they give a glimpse into Mythos say it’s really powerful. Apple spent five years and an estimated several billion dollars building Memory
1:39 PM · May 15, 2026 · 3.32M Views

130 Replies · 706 Reposts · 6.74K Likes

(archive link for non-twitter folk)

  1. Link to take the RMET, or “Reading the Mind in the Eyes” test, which is infamous. Have you ever heard people referring to “studies” pointing out that team performance is directly correlated with female participation in meetings, and / or number of women on the team? This was big and really leaned on when it came to DEI initiatives in the workplace.

    It’s fake, of course - the true distinguisher is how well people on the teams do at RMET, and women on average do slightly better than men.

    One still-valid use with a significant Cohen’s d (0.4 - 0.9) is detecting autistic tendencies - people who do worse than average on the test, particularly under a score of 22, tend to have more autistic tendencies. It should be noted that validity is only at the group rather than the individual level, for all these results.


    One final fun factoid - per Wendy Higgins’s recent work, a lot of what RMET is measuring is likely to be verbal intelligence rather than literal “theory of mind,” because you need to know what “aghast” and “despondent” mean to actually do well.

  2. If we switched to a post-scarcity measure like Doctorow’s whuffie, or fame, would we be any better off than today’s income / wealth endpoints?


    No! Fame follows a power law distribution just like income or wealth, and this post aggregates the literature to calculate that the Gini, or degree of inequality, is roughly equal to income’s degree of inequality. Womp womp.


    “The power-law fits of Bagrow and ben-Avraham give α in the range 1.77 to 2.69 for scientists, 2.74-3.62 for aces, 1.88-2.10 for actors, 1.57-2.03 for villains, 1.88-2.43 for programmers, 1.71-1.92 for runners, and 1.74-2.57 for students.”

    US income alpha is ~2 overall, and ~1.5 for the top 1% (lower alpha means more inequality).
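For a rough sense of how alpha maps to a Gini, here is a minimal sketch using the textbook formula for a pure Pareto distribution, G = 1 / (2α − 1). It assumes alpha is the Pareto tail index (the exponent of the survival function), which the quoted sources may or may not be using; if a fit reports the exponent of the probability density instead, subtract 1 first. The function name `pareto_gini` is my own, and real income or fame distributions are only Pareto-like in the tail, so treat the output as illustrative.

```python
def pareto_gini(alpha: float) -> float:
    """Gini coefficient of a pure Pareto distribution with tail index alpha.

    Assumption: alpha is the survival-function exponent, P(X > x) ~ x**(-alpha).
    The formula only holds for alpha > 1, where the mean is finite.
    """
    if alpha <= 1:
        raise ValueError("Gini formula needs alpha > 1 (the mean is infinite otherwise)")
    return 1.0 / (2.0 * alpha - 1.0)


if __name__ == "__main__":
    # Lower alpha gives a higher Gini, i.e. more inequality.
    for alpha in (1.5, 2.0, 3.0):
        print(alpha, round(pareto_gini(alpha), 3))
```

Under that assumption, moving alpha from 2 down to 1.5 raises the Gini from about a third to a half, which is the "lower alpha means more inequality" point in numbers.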


Interregnum and coupon

And once more before the paywall, I will offer a lifetime 50% off coupon, because getting more paid subscribers helps increase reach and reads.

I want to be clear, I’m definitely not in it for the money! I’m charging literally the least amount the platform lets you charge, as DeepLeftAnalysis🔸 pointed out when he became a subscriber. With the coupon it’s $2.50 a month here, basically half a coffee per month. But the way the Substack algorithm works, the more paid subscribers you get, the more reach and exposure you get on the Substack side.

And that’s what I’m after, more readers, a more active comment section, all that good stuff. So if you want to help out, and want to get access to all the links, please avail yourself of a lifetime 50% off coupon here:

https://performativebafflement.substack.com/46810a7d
