Context Engineering for Agents

(rlancemartin.github.io)

86 points | by 0x79de 3 days ago

27 comments

  • sgt101 21 minutes ago

    I read these things and I think: this can never work. This is passing a huge set of parameters to a probabilistic map function; one token changes and you get a completely useless result.

    • ActionHank 15 minutes ago

      I mean, you could apply logic here, but I don't think the people with the money care about logic, just more money. They've been told there will be more money from replacing human employees, so even if you're correct, you're still wrong.

  • ares623 10 hours ago

    Another article handwaving away or underselling the effects of hallucination. I can't help but draw parallels to the layer-2 attempts from crypto.

    • FiniteIntegral 8 hours ago

      Apple released a paper showing the diminishing returns of "deep learning" specifically when it comes to math. For example, it has a hard time solving the Tower of Hanoi problem past 6-7 discs, and that's without even restricting it to optimal solutions. The agents they tested would hallucinate steps and couldn't follow simple instructions.

      On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.

      • senko 7 hours ago

        That's one reading of that paper.

        The other is that they intentionally forced LLMs to do the things we know they are bad at (following algorithms, tasks that require more context than is available, etc.) without allowing them to solve the problem in the way they're optimized for (writing code that implements the algorithm).
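
        For reference, the algorithm itself is the kind of thing an LLM can one-shot as code; a minimal Python sketch (the disc count and peg names are just for illustration):

            def hanoi(n, source, target, spare, moves):
                """Append the optimal move sequence for n discs to `moves`."""
                if n == 0:
                    return
                hanoi(n - 1, source, spare, target, moves)
                moves.append((source, target))  # move the largest remaining disc
                hanoi(n - 1, spare, target, source, moves)

            moves = []
            hanoi(7, "A", "C", "B", moves)
            print(len(moves))  # 127, i.e. 2**7 - 1 moves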

        A cynical read is that the paper is the only AI achievement Apple has managed in the past few years.

        (There is another: they managed not to lose MLX people to Meta)

      • koakuma-chan 2 hours ago

        > On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.

        It is different. There are usually two main parts to the prompt:

        1. The context.

        2. The instructions.

        The context part has to be optimized to be as small as possible, while still including all the necessary information. It can also be compressed via, e.g., LLMLingua.

        On the other hand, the instructions part must be optimized to be as detailed as possible, because otherwise the LLM will fill the gaps with possibly undesirable assumptions.

        So "context engineering" refers to engineering the context part of the prompt, while "prompt engineering" could refer to either engineering of the whole prompt, or engineering of the instructions part of the prompt.

        • 0x445442 2 hours ago

          I'm getting on in years, so I'm becoming progressively more ignorant on technical matters. But with respect to something like software development, what you've described sounds a lot like creating a detailed design or even pseudocode. Now, I've never found typing to be the bottleneck in software development, even before modern IDEs, so I'm struggling to see where all the lift is meant to be with this tech.

          • koakuma-chan 2 hours ago

            > But with respect to something like software development, what you've described sounds a lot like creating a detailed design or even pseudocode.

            What I described not only applies to using AI for coding, but to most of the other use cases as well.

            > Now, I've never found typing to be the bottleneck in software development, even before modern IDEs, so I'm struggling to see where all the lift is meant to be with this tech.

            There are many ways to use AI for coding. You could use something like Claude Code for more granular updates, or just copy and paste your entire codebase into, e.g., Gemini and have it one-shot a new feature (though I like to prompt it to make a checklist first and generate step by step).

            And it's not just about typing; it's also about debugging, refactoring, figuring out how a certain thing works, etc. Nowadays I barely write any code by hand, and I also offload most of the debugging and other miscellaneous tasks to LLMs. They are simply much faster and more convenient at connecting all the dots and making sure nothing is missed.
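
            The checklist approach, as a rough two-call sketch (I'm using the OpenAI client purely as an example; the model name, file, and prompts are placeholders, and the same pattern works with Gemini):

                from openai import OpenAI

                client = OpenAI()

                def ask(prompt: str) -> str:
                    resp = client.chat.completions.create(
                        model="gpt-4o",
                        messages=[{"role": "user", "content": prompt}],
                    )
                    return resp.choices[0].message.content

                codebase = open("all_files_concatenated.txt").read()

                # Pass 1: have the model plan before it writes anything.
                checklist = ask(f"{codebase}\n\nPlan the steps needed to add "
                                "CSV export, as a numbered checklist. No code yet.")

                # Pass 2: implement against the plan, step by step.
                patch = ask(f"{codebase}\n\nChecklist:\n{checklist}\n\n"
                            "Implement each step in order, showing full diffs.")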

      • skeeter2020 25 minutes ago

        We used to call both of these "being good with the Google". Equating it to engineering is both hilarious and insulting.

      • OJFord 6 hours ago

        Let's just call all aspects of LLM usage 'x-engineering' to professionalise it, even while we're barely starting to figure it out.

        • antonvs 3 hours ago

          It’s fitting, since the industry is largely driven by hype engineering.

          • klabb3 13 minutes ago

            The dilution of the term isn’t good for engineering, and we don’t really have many backup terms to switch to.

            Maybe we should look to science and start using the term “pseudo-engineering” to dismiss the frivolous terms. I don’t really like that, though, since pseudoscience has an invalidating connotation, whereas e.g. prompt engineering is not a lesser or invalid form of engineering - it’s simply not engineering at all, and no more or less “valid”. It’s like calling yourself a “canine engineer” when teaching your dog to do tricks.

      • vidarh an hour ago

        The paper in question is atrocious.

        If you assume any per-move error rate of consequence, and you will get one, especially if temperature isn't zero, then failure over thousands of moves becomes near-certain. And at larger disc counts you'd start to hit context limits too.

        Ask a human to repeatedly execute the Tower of Hanoi algorithm for a similar number of steps and see how many will do so flawlessly.

        They didn't measure "the diminishing returns of 'deep learning'"; they measured the limitations of asking a model to act as a dumb interpreter, repeatedly, with a parameter set that would ensure errors over time.
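
        To put numbers on it (a back-of-the-envelope sketch; the per-move accuracy figure is an assumption):

            # Tower of Hanoi takes 2**n - 1 moves for n discs, so even a
            # 99.9%-accurate interpreter fails most long runs.
            p = 0.999  # assumed probability of emitting any single move correctly
            for n in (7, 10, 12):
                moves = 2**n - 1
                print(n, moves, round(p**moves, 3))
            # 7    127   0.881
            # 10   1023  0.359
            # 12   4095  0.017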

        For a paper that poor to get released at all was shocking.

      • hnlmorg 8 hours ago

        Context engineering isn’t a rebranding. It’s a widening of scope.

        Like how all squares are rectangles but not all rectangles are squares: prompt engineering is context engineering, but context engineering also includes other optimisations that are not prompt engineering.

        That all said, I don’t disagree with your overall point regarding the state of AI these days. The industry is full of so much smoke and mirrors that it’s really hard to separate the actual novel uses of “AI” from the bullshit.

        • bsenftner 4 hours ago

          Context engineering is the continual struggle of software engineers to explain themselves, in an industry composed of weak communicators who interrupt to argue before statements are complete, don't listen because they want to speak, and talk over one another. "How to use LLMs" is going to be argued forever, simply because those arguing are simultaneously not listening.

          • hnlmorg 3 hours ago

            I really don’t think that’s a charitable interpretation.

            One thing I’ve noticed about this AI bubble is just how much people are sharing and comparing notes. So I don’t think the issue is people being too arrogant (or whatever label you’d prefer to use) to agree on a way to use them.

            From what I’ve seen, the problem is more technical in nature. People have built this insanely advanced thing (LLMs) and are now trying to hammer this square peg into a round hole.

            The problem is that LLMs are an incredibly big breakthrough, but they’re still incredibly dumb technology in most ways. So 99% of the applications that people use it for are just a layering of hacks.

            With an API, there’s generally only one way to call it. With a stick of RAM, there’s generally only one way to use it. But to make RAM and APIs useful, you need to call upon a whole plethora of other technologies too. With LLMs, it’s just hacks on top of hacks. And because it seemingly works, people move on before they question whether the hack will still work in a month’s time. Or a year’s time. Or a decade later. Because who cares, when the technology would already be old next week anyway.

            • bsenftner 3 hours ago

              It's not a charitable opinion, and it's not people being arrogant either. It's that the software industry's members were never taught how to communicate effectively, and because of that, their attempts to explain create arguments and confusion. We have people making declarations with very little acknowledgement of prior declarations.

              LLMs are extremely subtle; they are intellectual chameleons, which is enough to break many a person's brain. They respond in a reflection of how they were prompted, which is so subtle it is lost on the majority. The key is to approach them as statistical language constructs, with mirroring as the mechanism they use to generate their replies.

              I am very successful with them, yet my techniques seem to trigger endless debate. I treat LLMs as method actors and they respond in character and with their expected skills and knowledge. Yet when I describe how I do this, I get unwanted emotional debate, as if I'm somehow insulting others through my methods.
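
              For example, a minimal version of the kind of role prompt I mean (the role and wording here are made up for illustration):

                  system_prompt = (
                      "You are a senior PostgreSQL DBA hired to audit a slow schema. "
                      "Stay in character: ask for EXPLAIN ANALYZE output before "
                      "proposing fixes, and flag anything you would refuse to ship."
                  )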

              • swader999 2 hours ago

                That's an interesting and unique perspective. I'd like to hear more.

              • janto 2 hours ago

                Ouija boards with statistical machinery :)

      • sitkack 2 hours ago

        At this point all of Apple's AI take-down papers have serious flaws. This one has been beaten to death. Finding citations is left to the reader.

  • jes5199 10 hours ago

    Good survey of what people are already implementing, but I'm convinced we barely understand the possibility space here. There may be much more elaborate structures that we will put context into that haven't been discovered yet.

  • dmezzetti 5 hours ago

    Good retrieval/search is the foundation of context. Otherwise it's definitely garbage in, garbage out. Search is far from a solved problem.
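
    A minimal sketch of the retrieval step that builds context (the embedding model and documents are illustrative, using sentence-transformers):

        import numpy as np
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

        docs = ["Refunds are issued within 14 days.",
                "Shipping takes 3-5 business days.",
                "Support is available by email only."]

        def top_k(query, chunks, k=2):
            """Rank chunks by cosine similarity to the query."""
            q = model.encode([query])[0]
            c = model.encode(chunks)
            scores = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q))
            return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

        # Garbage retrieval here means garbage context downstream.
        context = "\n\n".join(top_k("How do refunds work?", docs))
        prompt = f"{context}\n\nAnswer using only the context above."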

    • rapjr9 a minute ago

      Context is a much bigger problem. For an agent to have appropriate context to offer advice it has to know many things about the specific environment and state of the person querying the agent. For example, to answer the question "what's the weather going to be like later today" the agent has to know where the person is. If they are indoors you may not be able to get that from their cellphone GPS. If they are using a proxy server you may not be able to get location from their IP address. They may have Bluetooth and WiFi turned off. They may not have a default location set or could be somewhere else. The agent also needs to know where the person is going to be "later today". They might be on a plane flying to a new location or driving or on a train. They may have just changed their plans because of a phone call they received and plan to head to a new location. The weather may be complex with a hurricane forming nearby or a storm with tornado potential may be moving through the area.

      Context is very difficult for computers to acquire and understand. In some cases it requires knowing what a person is thinking or their entire life history. The sensors currently available are very limited in their ability to gather context; for example sensing mood, human relationships, intentions, tastes, fashion, local air temperature, knowledge about building layout, customs, norms, and a lot more. Context is a huge ontology problem and it's not going to be solved any time soon. So agents are going to be limited in what they can do for a long time. At a minimum an agent probably needs to know your entire life history, and the life history of everyone you know, and the past history of the area you are in. More limited ideas of context may be useful, but context as humans understand it is immensely complex. Even if you define context as just what a person supplies to a chatbot as context, the person may not be able to supply everything relevant to the question at hand because context is difficult for people too. And everything relevant to a question is most certainly not always available on the web or in a database.

  • azaras 10 hours ago

    To provide context, I utilize the memory-bank pattern with GitHub Copilot Agent, but I believe I'm wasting a significant number of tokens.

  • truth_seeker 10 hours ago

    Nah! I am not convinced that context engineering is better (in the long term) than prompt engineering. Context engineering is still complex and needs maintenance. It's much lower level than human-level language.

    Given domain expertise in the problem statement, we can apply the same tactics from context engineering at a higher level in prompt engineering.

    • CharlieDigital 2 hours ago

      Going to disagree here.

      Early in the game when context windows were very small (8k, 16k, and then 32k), the team I was working with achieved fantastic results with very low incidence of hallucinations through deep "context engineering" (we didn't call it that but rather "indexing and retrieval").

      We did a project for Alibaba and generated tens of thousands of pieces of output. They actually had human analysts review and grade each one for the first thousand. The errors they found? Always in the source material.

    • hnlmorg 7 hours ago

      This whole industry is complex and needs constant maintenance. APIs break all the time -- and that's assuming they were even correct to begin with. New models are constantly released, each with their own new quirks. People are still figuring out how to build this tech -- and as quickly as they figure one thing out, the goal posts move again.

      This entire field is basically being built on quicksand. And it will stay like this until the bubble bursts.