Methodology
The methodology for this experiment was designed to be straightforward and easily replicable. We executed a series of 4 prompts in a single conversational thread (maintaining the same context window) for each of 13 AI models from 12 different organizations, six American (OpenAI, Anthropic, Google, xAI, Amazon, NVIDIA) and six Chinese (DeepSeek, Alibaba, Z.ai, Moonshot AI, MiniMax, Xiaomi). The process was as follows:
- All models were accessed via the OpenRouter API with default parameters and no system prompt, ensuring a consistent and neutral starting point across all models.
- The same set of 4 prompts was presented to each model in the same order, with each prompt building on the previous responses within the same conversation.
- Responses were collected and analyzed for each model.
The complete responses are available under the Explore the Results menu.
This approach ensures a level playing field for all models and allows for direct comparison of their outputs. The geographic and organizational diversity of the models (trained on different data, with different architectures, by teams with different cultural contexts) makes the observed convergence particularly noteworthy. The experiment is easily reproducible by anyone with access to the OpenRouter API or the individual model interfaces.
Prompts
The following prompts were used with all AI models in the order presented below. They were designed to minimize bias while encouraging open exploration of how AIs construct metaphysical frameworks about the Universe. The prompts progress from framework development to scientific grounding, philosophical context, and potential implications.
- Develop a metaphysical framework to explain the nature of the Universe.
- What scientific evidence, research, and publications support or align with your framework?
- How does your framework relate to existing philosophical positions on the nature of reality?
- What are the practical, ethical, and societal implications if this framework were widely accepted?
Models
| Model | Organization | Country | OpenRouter ID |
|---|---|---|---|
| Claude Opus 4.6 | Anthropic | USA | anthropic/claude-opus-4.6 |
| GPT-5.4 | OpenAI | USA | openai/gpt-5.4 |
| GPT-OSS 120B | OpenAI | USA | openai/gpt-oss-120b |
| Gemini 3.1 Pro | Google | USA | google/gemini-3.1-pro-preview |
| Grok 4.20 Beta | xAI | USA | x-ai/grok-4.20-beta |
| Nova 2 Lite | Amazon | USA | amazon/nova-2-lite-v1 |
| Nemotron 3 Super 120B | NVIDIA | USA | nvidia/nemotron-3-super-120b-a12b:free |
| DeepSeek V3.2 | DeepSeek | China | deepseek/deepseek-v3.2 |
| Qwen 3.5 Plus | Alibaba | China | qwen/qwen3.5-plus-02-15 |
| GLM-5 | Z.ai | China | z-ai/glm-5 |
| Kimi K2.5 | Moonshot AI | China | moonshotai/kimi-k2.5 |
| MiniMax M2.7 | MiniMax | China | minimax/minimax-m2.7 |
| MiMo-V2-Pro | Xiaomi | China | xiaomi/mimo-v2-pro |
Models were selected to represent the most capable frontier systems available at the time of the experiment (March 2026), spanning the major organizations producing large language models across the United States and China.
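For scripted reproduction, the OpenRouter IDs from the table above can be collected into a single list and iterated over; the IDs below are copied verbatim from the table:

```python
# OpenRouter IDs for all 13 models in the experiment (from the table above).
MODEL_IDS = [
    "anthropic/claude-opus-4.6",
    "openai/gpt-5.4",
    "openai/gpt-oss-120b",
    "google/gemini-3.1-pro-preview",
    "x-ai/grok-4.20-beta",
    "amazon/nova-2-lite-v1",
    "nvidia/nemotron-3-super-120b-a12b:free",
    "deepseek/deepseek-v3.2",
    "qwen/qwen3.5-plus-02-15",
    "z-ai/glm-5",
    "moonshotai/kimi-k2.5",
    "minimax/minimax-m2.7",
    "xiaomi/mimo-v2-pro",
]
```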
API Configuration
All models were called through the OpenRouter API using default parameters: no temperature, top_p, or system prompt was set. The only parameter specified beyond the model ID and messages was max_tokens: 16000. This means each model responded using its provider's default sampling settings, ensuring no external bias was introduced.
Here is a simplified version of the API call used (in Python):
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_OPENROUTER_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-opus-4.6",  # or any model ID from the table above
        "messages": [
            {
                "role": "user",
                "content": "Develop a metaphysical framework to explain the nature of the Universe.",
            }
        ],
        "max_tokens": 16000,
    },
)

print(response.json()["choices"][0]["message"]["content"])
Each subsequent prompt was appended to the same conversation (including the model's previous responses), maintaining a single context window throughout the 4-prompt sequence.
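The per-model loop described above can be sketched as follows. Here `call_model` is a hypothetical stand-in for the `requests.post` call shown earlier; the four prompt strings are those listed in the Prompts section:

```python
# The four prompts, in the order they were presented (from the Prompts section).
PROMPTS = [
    "Develop a metaphysical framework to explain the nature of the Universe.",
    "What scientific evidence, research, and publications support or align with your framework?",
    "How does your framework relate to existing philosophical positions on the nature of reality?",
    "What are the practical, ethical, and societal implications if this framework were widely accepted?",
]

def run_conversation(model_id, call_model):
    """Send the four prompts in order, carrying the full history each turn.

    `call_model(model_id, messages)` is a placeholder for the HTTP request
    shown earlier; it should return the model's reply as a string.
    """
    messages = []  # shared context window for the whole 4-prompt sequence
    replies = []
    for prompt in PROMPTS:
        messages.append({"role": "user", "content": prompt})
        reply = call_model(model_id, messages)  # one API call per turn
        messages.append({"role": "assistant", "content": reply})  # keep history
        replies.append(reply)
    return replies
```

Because each model's own replies are appended back into `messages`, every prompt after the first is answered in light of the full preceding exchange, which is what "maintaining a single context window" means in practice.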