3 comments

  • JasonViviers 6 hours ago

    I've been developing a practice I call OMLCP (Output-Maximizing Long-Context Programming), which exploits modern models' large output windows instead of optimizing for frequent small interactions.

    The core argument: agentic workflows made sense when max output was 4k tokens. Frontier models now support 64k-128k output tokens, yet most tooling still optimizes for short responses. Because every small round trip re-sends the full accumulated context as input, this structural inefficiency compounds quadratically with project size.
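
    The quadratic claim can be sketched with toy numbers (these are illustrative figures and function names of mine, not the paper's formal model):

    ```python
    def agentic_input_tokens(chunks: int, chunk_out: int, base_context: int) -> int:
        """Total input tokens when output is produced in many small round trips.

        Each round trip re-sends the base prompt plus everything generated so
        far, so cumulative input grows quadratically in the number of chunks.
        """
        total = 0
        generated = 0
        for _ in range(chunks):
            total += base_context + generated  # full context re-sent every call
            generated += chunk_out
        return total


    def single_pass_input_tokens(base_context: int) -> int:
        """A single long-output generation sends the context exactly once."""
        return base_context


    # Toy scenario: 40k tokens of output, emitted in ten 4k-token chunks
    # versus one pass through a large output window.
    agentic = agentic_input_tokens(chunks=10, chunk_out=4_000, base_context=2_000)
    single = single_pass_input_tokens(base_context=2_000)
    # agentic = 200,000 input tokens; single = 2,000 — a 100x gap in this toy case.
    ```

    Doubling the project size doubles the chunk count, but the re-sent context sum grows with the square of that count, which is where the compounding comes from.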

    The paper includes:

    - A formal token-economics model with sensitivity analysis (13-17x advantage at 10k lines, 40-45x at 30k lines)
    - Three field deployments with real cost figures ($0.58 for 14,431 lines)
    - Honest failure cases, including an SSE parsing incident that required a full re-stream
    - A reproducibility protocol for independent validation
    - A capability tier framework to make claims model-version-agnostic

    I'm a business analyst in South Africa. No lab affiliation, no team. Happy to answer questions about methodology or the streaming infrastructure.

    • JasonViviers 3 hours ago

      Quick note: the Zenodo link appears to be temporarily blocked by their firewall due to a platform issue. I’ve contacted support and will share a working link shortly. Thanks to everyone who tried to access it.