Fara-7B: An efficient agentic model for computer use

(github.com)

53 points | by maxloh 6 hours ago

18 comments

  • A4ET8a8uTh0_v2 an hour ago

    Looking at the table, I will admit that I don't get most of the use cases (maybe with the exception of comparison shopping / gathering info), but are people really 'outsourcing' shopping? Am I really that far outside what 'normal' consumers do these days?

    | Task Segment | Tasks | SoM GPT-4o-0513 | SoM o3-mini | SoM GPT-4o | GLM-4.1V-9B | OAI Comp-Use | UI-TARS-1.5 | Fara-7B |
    |---|---|---|---|---|---|---|---|---|
    | **Single-Site Tasks** | | | | | | | | |
    | Shopping | 56 | 62.5 | 71.4 | 38.1 | 31.0 | 42.3 | 41.1 | 52.4 |
    | Flights | 51 | 60.1 | 39.2 | 11.1 | 10.5 | 17.6 | 10.5 | 37.9 |
    | Hotels | 52 | 68.6 | 56.4 | 31.4 | 19.9 | 26.9 | 35.3 | 53.8 |
    | Restaurants | 52 | 67.9 | 59.6 | 47.4 | 32.1 | 35.9 | 22.4 | 47.4 |
    | Activities | 80 | 70.4 | 62.9 | 41.7 | 26.3 | 30.4 | 9.6 | 36.3 |
    | Ticketing | 57 | 58.5 | 56.7 | 37.4 | 35.7 | 49.7 | 30.4 | 38.6 |
    | Real Estate | 48 | 34.0 | 17.4 | 20.1 | 16.0 | 9.0 | 9.7 | 23.6 |
    | Jobs/Careers | 50 | 49.3 | 44.0 | 32.7 | 22.7 | 20.7 | 20.7 | 28.0 |
    | **Multi-Step Tasks** | | | | | | | | |
    | Shopping List (2 items) | 51 | 66.0 | 62.7 | 17.0 | 7.8 | 34.0 | 20.9 | 49.0 |
    | Comparison Shopping | 57 | 67.3 | 59.1 | 27.5 | 22.8 | 1.2 | 8.8 | 32.7 |
    | Compositional Tasks | 55 | 51.5 | 39.4 | 26.7 | 17.0 | 10.3 | 9.1 | 23.0 |
    | Overall | | | | | | | | |

    • doug_durham 12 minutes ago

      I can't imagine having an AI agent book anything or purchase anything, in the same way that I wouldn't have someone I don't know personally do that for me. It should do the research and take me to the place where I need to take over.

  • sreejithr 18 minutes ago

    It's just Qwen2.5-VL with a sticker on it. The Chinese are leading now!

  • pogue 39 minutes ago

    Why does Microsoft keep releasing models trained on synthetic data? Is it possible their contract with OpenAI won't let them do anything else?

    I would think Microsoft, of all companies, would want to be working on their own LLM behind the scenes, even if they're relying on OpenAI for the bulk of their work.

    Meta seems to be the only US company releasing big 'open source' models, while Chinese companies continue to release many completely open source LLMs.

  • stan_kirdey 2 hours ago

    * fine-tuned Qwen-7B

    • PhilippGille 43 minutes ago

      Qwen2.5-VL-7B to be precise. It's a relevant difference.

    • donbox 2 hours ago

      So.. the tables are really turning?

  • codezero 2 hours ago

    Are there any agentic models like this that would work for controlling input in arbitrary video games? I've been wanting to have an AI play Kerbal Space Program because I think it would just be pretty hilarious.

  • maartenh 2 hours ago

    How much VRAM would this require, if I would want to run this locally?

    I bought a 12GB Nvidia card a year ago. In general I'm having a hard time finding the actual required hardware specs for any self-hosted AI model. Any tips/suggestions/recommended resources for that?

    • nsingh2 2 hours ago

      One quick way to estimate a lower bound is to take the number of parameters and multiply it with the bits per parameter. So a model with 7 billion parameters running with float8 types would be ~7 GB to load at a minimum. The attention mechanism would require more on top of that, and depends on the size of the context window.

      You'll also need to load inputs (images in this case) onto the GPU memory, and that depends on the image resolution and batch size.
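      The estimate above can be sketched as a tiny calculation (a back-of-the-envelope sketch only; the function name and example bit-widths are illustrative, not from any model card):

      ```python
      def vram_lower_bound_gb(params_billion: float, bits_per_param: float) -> float:
          """Rough lower bound on VRAM needed just to hold the weights.

          Ignores the KV cache, activations, and multimodal inputs (images),
          which all add more on top and depend on context window and batch size.
          1 billion params * 1 byte/param ~= 1 GB, so the arithmetic is simple.
          """
          bytes_per_param = bits_per_param / 8
          return params_billion * bytes_per_param

      # 7B parameters at float8 (8 bits/param) -> ~7 GB just for the weights
      print(vram_lower_bound_gb(7, 8))    # 7.0
      # 7B at a ~4.5-bit quantization -> ~3.9 GB
      print(vram_lower_bound_gb(7, 4.5))  # 3.9375
      ```

      This matches daemonologist's 1-byte-per-parameter rule of thumb below: quantized weights plus cache headroom lands around 7 GB for a 7B model.
      
      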

    • daemonologist an hour ago

      12GB will be sufficient to run a quantized version, provided you're not running anything else memory-hungry on the GPU.

      You're not finding hardware specs because there are a lot of variables at play - the degree to which the weights are quantized, how much space you want to set aside for the KV cache, extra memory needed for multimodal features, etc.

      My rule of thumb is 1 byte per parameter to be comfortable (running a quantization with somewhere between 4.5 and 6 bits per parameter and leaving some room for the cache and extras), so 7 GB for 7 billion parameters. If you need a really large context you'll need more; if you want to push it you can get away with a little less.

    • selcuka an hour ago

      I use LM Studio for running models locally (macOS) and it tries to estimate whether the model would fit in my GPU memory (which is the same as main memory on Macs).

      The Q4_K_S quantized version of Microsoft Fara 7B is a 5.8GB download. I'm pretty sure it would work on a 12GB Nvidia card. Even the Q8 one (9.5GB) could work.

  • ghrjfjfnnfn 35 minutes ago

    Forgive me if I can't keep up with the latest AI bubble mania buzzwords, but what is "agentic" even supposed to mean? As far as I can tell it doesn't have a precise definition, and doesn't even sound like proper English.

    • doug_durham 11 minutes ago

      Ask your favorite LLM. It will tell you.

    • hsaliak 26 minutes ago

      it means you can make it do stuff (run preconfigured programs) for you, and not just chat with you