Google Gemini has the worst LLM API

(venki.dev)

203 points | by indigodaddy 8 months ago ago

191 comments

simonw 8 months ago

I still don't really understand what Vertex AI is.

If you can ignore Vertex most of the complaints here are solved - the non-Vertex APIs have easy to use API keys, a great debugging tool (https://aistudio.google.com), a well documented HTTP API and good client libraries too.

I actually use their HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straight-forward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...

You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertext documentation though.

Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai

Anthropic have the same feature now as well: https://docs.anthropic.com/en/api/openai-sdk

[-]

anaisbetts 8 months ago

It's a way for you to have your AI billing under the same invoice as all of your other cloud purchases. If you're a startup this is a dumb feature, if you work at a $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers

[-]

blitzar 8 months ago

> $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers

What's the point of working at $ENTERPRISE_BIGCO if you don't fight with IT & Legal & various annoying middle managers.

Anyway let's table this for now and circle back later after we take care of some of the low hanging fruit. Keep me in the loop and I will do a deep dive into how we can think outside the box and turn this into a win-win. I will touch base with you when I have all my ducks in a row and we can hop on a call.

[-]

kridsdale1 8 months ago

I work AT Google and 99% of my conversations must have been the training set for your paragraph.

[-]

blitzar 8 months ago

If they replaced the leet code interviews with department wide meetings and email chain take home tasks I could make hay and really shine with a series of No nothing from this side, FYI's and back burners.

Google sounds like a fun place to work, run it up the flagpole and see if you can move the needle before the next hard stop for me.

progbits 8 months ago

It's also useful in a startup, I just start using it with zero effort.

For external service I have to get a unique card for billing and then upload monthly receipts, or ask our ops to get it setup and then wait for weeks as the sales/legal/compliance teams on each side talk to each other.

NoahZuniga 8 months ago

This is not true??? The AI studio surface is also billed on a per project basis?

bn-l 8 months ago

ah! thank you. I was also struggling with where vertex fitted.

tzury 8 months ago

Vertex by example:

    creds = service_account.Credentials.from_service_account_file(
        SA_FILE,
        scopes=[
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/generative-language",
        ]
    )


    google.genai.Client(
        vertexai=True,
        project=PROJECT_ID,
        location=LOCATION,
        http_options={"api_version": "v1beta1"},
        credentials=sa_creds,
    )

That `vertexai=True` does the trick - you can use same code without this option, and you will not be using "Vertex".

Also, note, with Vertex, I am providing service account rather than API key, which should improve security and performance.

For me, the main aspect of "using Vertex", as in this example is the fact Start AI Cloud Credit ($350K) are only useable under Vertex. That is, one must use this platform to benefit from this generous credit.

Feels like the "Anthos" days for me, when Google now pushing their Enterprise Grade ML Ops platform, but all in all I am grateful for their generosity and the great Gemini model.

[-]

sitefail1 8 months ago

I don't think a service account vs an API key would improve performance in any meaningful way. I doubt the AI endpoint is authenticating the API key against a central database every request, it will most certainly be cached against a service key in the same AZ or whatever GCP call it.

ivanvanderbyl 8 months ago

Service account file vs API Key have similar security risks if provided the way you are using them. Google recommends using ADC and it’s actually an org policy recommendation to disable SA files.

[-]

wanderer2323 8 months ago

ADC (Application Default Credentials) is a specification for finding credentials (1. look here 2. look there etc.) not an alternative for credentials. Using ADC one can e.g. find an SA file.

As a replacement for SA files one can have e.g. user accounts using SA impersonation, external identity providers, or run on GCP VM or GKE and use built-in identities.

(ref: https://cloud.google.com/iam/docs/migrate-from-service-accou...)

logankilpatrick 8 months ago

The startup credits are fully compatible with AI Studio, they are not specific to Vertex.

laborcontract 8 months ago

Google Cloud Console's billing console for Vertex is so poor. I'm trying to figure out how much i spent on which models and I still cannot for the life of me figure it out. I'm assuming the only way to do it is to use the gemini billing assistant chatbot, but that requires me to turn on another api permission.

I still don't understand the distinction between Gemini and Vertex AI apis. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem but it's only created more confusion, for me at least.

[-]

chrisheecho 8 months ago

I couldn’t have said it better. My billing friends are working to address some of these concerns along with the Vertex team. We are planning to address this issue. Please stay tuned, we will come back to this thread to announce when we can In fact, if you can DM me (@chrischo_pm on X) with, I would love to learn more if you are interested.

[-]

jeswin 8 months ago

Can you allow prepaid credits as well please?

[-]

byefruit 8 months ago

100% this. We actually use OpenRouter (and pay their surcharge) with Gemini 2.5 Pro just because we can actually control spend via spent limit on keys (A++ feature) and prepaid credit.

chrisheecho 8 months ago

one step ahead of you ;)

tyre 8 months ago

Gemini’s is no better. Their data can be up to 24h stale and you can’t set hard caps on API keys. The best you can do is email notification billing alerts, which they acknowledge can be hours late.

__jl__ 8 months ago

Only problem is that the genai API at https://ai.google.dev is far less reliable and can be problematic for production use cases. Right around the time Gemini 2.0 launched, it was done for days on end without any communication. They are putting a lot of effort into improving it but it's much less reliable than openai, which matters for production. They can also reject your request based on overall system load (not your individual limits), which is very unpredictable. They advertise 2000 requests per minute. When I tried several weeks ago, I couldn't even get 500 per minute.

[-]

logankilpatrick 8 months ago

Pls ping me if you run into any production issues, will raise right away to the team. We have massive at scale products operating on AI Studio, so we are set up to ensure stability.

mgraczyk 8 months ago

OpenAI compatible API is missing important parameters, for example I don't think there is a way to disable flash 2 thinking with it.

Vertex AI is for grpc, service auth, and region control (amongst other things). Ensuring data remains in a specific region, allowing you to auth with the instance service account, and slightly better latency and ttft

[-]

simonw 8 months ago

I find Google's service auth SO hard to figure out. I've been meaning to solve deploying to Cloud Run via service with for several years now but it just doesn't fit in my brain well enough for me to make the switch.

[-]

chrisheecho 8 months ago

simonw, 'Google's service auth SO hard to figure out' – absolutely hear you. We're taking this feedback on auth complexity seriously. We have a new Vertex express mode in Preview (https://cloud.google.com/vertex-ai/generative-ai/docs/start/... , not ready for primetime yet!) that you can sign up for a free tier and get API Key right away. We are improving the experience, again if you would like to give feedback, please DM me on @chrischo_pm on X.

mgraczyk 8 months ago

If you're on cloud run it should just work automatically.

For deploying, on GitHub I just use a special service account for CI/CD and put the json payload in an environment secret like an API key. The only extra thing is that you need to copy it to the filesystem for some things to work, usually a file named google_application_credentials.json

If you use cloud build you shouldn't need to do anything

[-]

candiddevmike 8 months ago

You should consider setting up Workload Identity Federation and authentication to Google Cloud using your GitHub runner OIDC token. Google Cloud will "trust" the token and allow you to impersonate service accounts. No static keys!

[-]

mgraczyk 8 months ago

Does not work for many Google services, including firebase

[-]

progbits 8 months ago

Yes it does. We deploy firebase and bunch of other GCP things from github actions and there are zero API keys or JSON credentials anywhere.

Everything is service accounts and workload identity federation, with restrictions such as only letting main branch in specific repo to use it (so no problem with unreviewed PRs getting production access).

Edit: if you have a specific error or issue where this doesn't work for you, and can share the code, I can have a look.

[-]

mgraczyk 8 months ago

No thank you, there is zero benefit to migrating and no risk in using credentials the way I do.

How do you sign a firebase custom auth token with workload identity federation? How about a pre signed storage URL? Off the top of my head I think those were two things that don't work

[-]

progbits 8 months ago

First, regarding "zero benefit" and "no risk". I disagree. The risk and benefit might be low, and not worth the change for you. But it is absolutely not zero.

You have a JSON key file which you can't know how many people have. The person who created the key, downloaded it and then stored it as github secret - did they download it to /dev/shm? Did some npm/brew install script steal it from their downloads folder? Any of the github repo owners can get hold of it. Depending on whether you use github environments/deployments and have set it up properly, so can anyone with write access to the repo. Do you pin all your dependencies, reusable workflows etc, or can a compromise of someone elses repo steal your secrets?

With the workload identity auth, there is no key. Each access obtains a short lived token. Only workflows on main branch can get it. Every run will have audit logs, and so will every action taken by that token. Risk of compromise is much lower, but even more importantly, if compromised I'll be able to know exactly when and how, and what malicious actions were taken.

Maybe this is paranoid to you and not worth it. That's fine. But it's not "no risk", and it is worth to me to protect personal data of our users.

---

As for your question, first step is just to run https://github.com/google-github-actions/auth with identity provider configured in your GCP project, restricted to your github repo or org.

This will create application default credentials that most GCP tools and libraries will just work with as if when you are running things locally after "gcloud auth login".

For firebase token you can just run a python script as subsequent step in the github job doing something like https://firebase.google.com/docs/auth/admin/create-custom-to.... For signed storage url this can be done with the gcloud tool: https://cloud.google.com/storage/docs/access-control/signing...

In both cases after running the "google-github-actions/auth" step it will just work with the short-lived credentials that step generated.

[-]

8 months ago

[deleted]

PantaloonFlames 8 months ago

You could post on Reddit asking for help and someone is likely to provide answers, an explanation, probably even some code or bash commands to illustrate.

And even if you don't ask, there are many examples. But I feel ya. The right example to fit your need is hard to find.

mountainriver 8 months ago

GCP auth is terrible in general. This is something aws did well

[-]

PantaloonFlames 8 months ago

I don't get that. How?

- There are principals. (users, service accounts)

- Each one needs to authenticate, in some way. There are options here. SAML or OIDC or Google Signin for users; other options for service accounts.

- Permissions guard the things you can do in Google cloud.

- There are builtin roles that wrap up sets of permissions.

- you can create your own custom roles.

- attach roles to principals to give them parcels of permissions.

[-]

mgraczyk 8 months ago

yeah bro just one more principal bro authenticate each one with SAML or OIDC or Google Signin bro set the permissions for each one make sure your service account has permissions aiplatform.models.get and aiplatform.models.list bro or make a custom role and attach the role to the principle to parcel the permission

It's not complicated in the context of huge enterprise applications, but for most people trying to use Google's LLMs, it's much more confusing than using an API key. The parent commenter is probably using an aws secret key.

And FWIW this is basically what google encourages you to do with firebase (with the admin service account credential as a secret key).

arccy 8 months ago

GCP auth is actually one of the things it does way better than AWS. it's just that the entire industry has been trained on AWS's bad practices...

minimaxir 8 months ago

From the linked docs:

> If you want to disable thinking, you can set the reasoning effort to "none".

For other APIs, you can set the thinking tokens to 0 and that also works.

[-]

mgraczyk 8 months ago

Wow thanks I did not know

[-]

logankilpatrick 8 months ago

We added it to the docs. The downside of the OAI compat endpoint is we have to design the API twice, once for our API, then once through the OAI compat layer which makes it slower sometimes to have certain features, especially if we diverge at all.

[-]

mgraczyk 8 months ago

Thanks, yes makes sense.

BTW, I have noticed that when tested outside GCP, the OpenAI compat endpoint has significantly lower latency for most requests (vs using the genai library). VertexAI is better than both.

Any idea why or if that will change?

chrisheecho 8 months ago

We built the OpenAI Compatible API (https://cloud.google.com/vertex-ai/generative-ai/docs/multim...) layer to help customers that are already using OAI library to test out Gemini easily with basic inference but not as a replacement library for the genai sdk (https://github.com/googleapis/python-genai). We recommend using th genai SDK for working with Gemini.

[-]

mike_hearn 8 months ago

So, to be clear, Google only supports Python as a language for accessing your models? Nothing else?

[-]

chrisheecho 8 months ago

We have Python/Go in GA.

Java/JS is in preview (not ready for production) and will be GA soon!

[-]

troupo 8 months ago

What about providing an actual API people can call without needing to rely on Google SDKs?

[-]

logankilpatrick 8 months ago

you can do so with the AI SDK from Vercel, open router, etc or just sending raw http requests

logankilpatrick 8 months ago

This is documented for AI Studio here: https://ai.google.dev/gemini-api/docs/openai#thinking

Aeolun 8 months ago

When I used the openai compatible stuff my API’s just didn’t work at all. I switched back to direct HTTP calls, which seems to be the only thing that works…

franze 8 months ago

yeah, 2 days to get Google OAuth flow integrated into an background app/script, 1 day coding for the actual app ...

[-]

jpc0 8 months ago

Is this vertexAI related or in general, I find googles oauth flow to be extremely well documented and easy to setup…

jacob019 8 months ago

I got claude to write me an auth layer using only python http.client and cryptography. One shot no problem, now I can get a token from the service key any time, just have to track expiration. Annoying that they don't follow industry standard though.

arccy 8 months ago

should have used ai to write the integrations...

[-]

franze 8 months ago

thats with AI

as there are so many variations out there the AI gets majorly confused, as a matter of fact, the google oauth part is the one thing that gemini 2.5 pro cant code

should be its own benchmark

[-]

enneff 8 months ago

Maybe you should just read the docs and use the examples there. I have used all kinds of GCP services for many years and auth is not remotely complicated imo.

shresbm123 8 months ago

We support reasoning_effort = none. That will let you disable flash 2 thinking. We will document it better.

omneity 8 months ago

JSONSchema support on Google's OpenAI-compatible API is very lackluster and limiting. My biggest gripe really.

[-]

shresbm123 8 months ago

yeah we are looking into it

[-]

omneity 8 months ago

Thank you! Adding support for `additionalProperties`[0] (and perhaps `patternProperties` too) would be particularly great!

Happy to provide test cases as well if helpful.

0: https://datatracker.ietf.org/doc/html/draft-fge-json-schema-...

chrisheecho 8 months ago

simonw, good points. The Vertex vs. non-Vertex Gemini API (via AI Studio at aistudio.google.com) could use more clarity.

For folks just wanting to get started quickly with Gemini models without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended as you noted.

However, if you anticipate your use case to grow and scale 10-1000x in production, Vertex would be a worthwhile investment.

[-]

troupo 8 months ago

Why create two different APIs that are the same, but only subtly different, and have several different SDKs?

[-]

chrisheecho 8 months ago

I think you are talking about generativeai vs. vertexai vs. genai sdk.

And you are watching us evolve overtime to do better.

Couple clarifications 1. Going forward we only recommend using genai SDK 2. Subtle API differences - this is a bit harder to articulate but we are working to improve this. Please dm at @chrischo_pm if you would like to discuss further :)

[-]

troupo 8 months ago

So. Three different SDKs.

No idea what any of those SDK names mean. But sure enoough searching will bring up all three of them for different combination of search terms, and none of them will point to the "recommend only using <a random name that is indistinguishable form other names>"

Oh, And some of these SDKs (and docs) do have a way to use this functionality without the SDKs, but not others. Because there are only 4 languages in the world, and everyone should be happy using them.

mark_l_watson 8 months ago

I think you can strongly influence which SDK your customers use by keeping the Python, Typescript, and Curl examples in the documentation up to date and uniformly use what you consider the ‘best’ SDK in the examples.

Overall, I think that Google has done a great job recently in productizing access to your models. For a few years I wrote my own utilities to get stuff done, now I do much less coding using Gemini (and less often ChatGPT) because the product offerings do mostly what I want.

One thing I would like to see Google offer is easier integrated search with LLM generation. The ‘grounding’ examples are OK, but for use in Python I buy a few Perplexity API credits and use that for now. That is the single thing I would most like to see you roll out.

EDIT: just looked at your latest doc pages, I like the express mode setup with a unified access to regular APIs vs. Vertex.

[-]

chrisheecho 8 months ago

Thanks! - I like it too :)

unknown_user_84 8 months ago

Indeed. Though the billing dashboard feels like an over engineered April fool's joke compared to Anthropic or OpenAI. And it takes too long to update with usage. I understand they tacked it into GCP, but if they're making those devs work 60 hours a week can we get a nicer, and real time, dashboard out of it at least?

[-]

logankilpatrick 8 months ago

we will have a dashboard in AI Studio very soon! Then will work to drive down delay.

coredog64 8 months ago

Wait until you see how to check Bedrock usage in AWS.

(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)

jacob019 8 months ago

Except that the OpenAI compatible endpoint isn't actually compatible. Doesn't support string enum values for function calls and throws a confusing error. Vertex at least has better error messages. My solution, just use text completions and emulate the tool call support client side, validate the responses against the schema, and retry on failure. It rarely has to retry and always works the 2nd time even without feedback.

[-]

ashu1461 8 months ago

There is also no way to over-write content moderation settings, and half of the responses you generate via open ai endpoint end up being moderated.

fzysingularity 8 months ago

Vertex AI is essentially equivalent to Azure OpenAI - enterprise-ready, with HIPAA/SOC2 compliance and data-privacy guarantees.

FWIW OpenAI compatibility only gets you so far with Gemini. Gemini’s video/audio capabilities and context caching are unparalleled and you’ll likely need to use their SDKs instead to fully take advantage of them.

minimaxir 8 months ago

Vertex AI is essentially a rebranding of their more enterprise platform on GCP, nothing explicitly "new."

ashu1461 8 months ago

Have to work hard to figure out the difference between

- Vertex AI

- AI Studio

- Gemini

- Firebase Gen AI

hustwindmaple1 8 months ago

If you are not a paying GCP user, there is really no point to even look at Vertex AI.

Just stick with AI Studio and the free developer AI along with it; you will be much much happier.

egamirorrim 8 months ago

I use Vertex because that's the one that makes enterprise security people happy about how our datas handled.

Do Google use all the AI studio traffic to train etc?

[-]

sunaookami 8 months ago

Not if you have billing enabled: https://ai.google.dev/gemini-api/docs/pricing

[-]

logankilpatrick 8 months ago

This is correct, "When you activate a Cloud Billing account, all use of Gemini API and Google AI Studio is a "Paid Service" with respect to how Google Uses Your Data, even when using Services that are offered free of charge, such as Google AI Studio and unpaid quota of Gemini API."

kmod 8 months ago

There are a few conditions that take precedence over having-billing-enabled and will cause AI Studio to train on your data. This is why I personally use Vertex

KTibow 8 months ago

Vertex is the enterprise platform. It also happens to have much higher rate limits, even for free models.

chrisheecho 8 months ago

Hey there, I’m Chris Cho (x: chrischo_pm, Vertex PM focusing on DevEx) and Ivan Nardini (x: ivnardini, DevRel). We heard you and let us answer your questions directly as possible.

First of all, thank you for your sentiment for our latest 2.5 Gemini model. We are so glad that you find the models useful! We really appreciate this thread and everyone for the feedback on Gemini/Vertex

We read through all your comments. And YES, – clearly, we've got some friction in the DevEx. This stuff is super valuable, helps me to prioritize. Our goal is to listen, gather your insights, offer clarity, and point to potential solutions or workarounds.

I’m going to respond to some of the comments given here directly on the thread

[-]

ctxc 8 months ago

Had to move away from Gemini because the SDK just didn't work.

Regardless of if I passed a role or not, the function would say something to the effect of "invalid role, accepted are user and model".

Tried switching to openAI compatible SDK, it threw errors for tool call calls and I just gave up.

Could you confirm if it was a known bug that was fixed?

[-]

ctxc 8 months ago

The error fyr https://x.com/dvsj_in/status/1895522286297567369?t=qYLx3kchj...

[-]

chrisheecho 8 months ago

You don't have to specify role when you call through Python (https://cloud.google.com/vertex-ai/generative-ai/docs/start/...)

(which I think is what you are using but maybe i'm wrong).

Feel free to DM me on @chrischo_pm on X. Stuff that you are describing shouldn't happen

Deathmax 8 months ago

Can we avoid weekend changes to the API? I know it's all non-GA, but having `includeThoughts` suddenly work at ~10AM UTC on a Sunday and the raw thoughts being returned after they were removed is nice, but disruptive.

[-]

chrisheecho 8 months ago

Can you tell me the exact instance when this happened please? I will take this feedback back to my colleagues. But in order to change how we behave I need a baseline and data

[-]

Deathmax 8 months ago

Thoughts used to be available in the Gemini/Vertex APIs when Gemini 2.0 Flash Thinking Experimental was initially introduced [1][2], and subsequently disabled to the public (I assume hidden behind a visibility flag) shortly after DeepSeek R1's release [3] regardless of the `include_thoughts` setting.

At ~10:15AM UTC 04 May, a change was rolled out to the Vertex API (but not the Gemini API) that caused the API to respect the `include_thoughts` setting and return the thoughts. For consumers that don't handle the thoughts correctly and had specified `include_thoughts = true`, the thinking traces then leaked into responses.

[1]: https://googleapis.github.io/python-genai/genai.html#genai.t...

[2]: https://ai.google.dev/api/generate-content#ThinkingConfig

[3]: https://github.com/googleapis/python-genai/blob/157b16b8df40...

jbellis 8 months ago

Can you ask whoever owns dashboards to make it so I can troubleshoot quota exceeded errors like this? https://x.com/spyced/status/1917635135840858157

[-]

logankilpatrick 8 months ago

We are working on fixing this and showing the critical ones in AIS. I agree it is crazy there is 700+ items here. Real pain in the neck to deal with.

egamirorrim 8 months ago

I love that you're responding on HN, thanks for that! While you're here I don't suppose you can tell me when Gemini 2.5 Pro is hitting European regions on Vertex? My org forbids me from using it until then.

[-]

m3adow 8 months ago

Yeah, not having clear time lines for new releases on the one hand, but being quick with deprecation of older models isn't a very good experience.

froggertoaster 8 months ago

Thanks for replying, and I can safely say that most of us just want first-class conformity with OpenAI's API without JSON schema weirdness (not using refs, for instance) baked in.

[-]

troupo 8 months ago

Or returning null for null values, not some "undefined" string.

Or not failing when passing `additionalProperties: false`

Or..

irthomasthomas 8 months ago

Hi, one thing I am really struggling with in AI studio API is stop_sequences. I know how to request them, but cannot see how to determine which stop_sequence was triggered. They don't show up in the stop_reason like most other APIs. Is that something which vertex API can do? I've built some automation tools around stop_sequences, using them for control logic, but I can't use Gemini as the controller without a lot of brittle parsing logic.

[-]

shresbm123 8 months ago

Thank you feedback noted

troupo 8 months ago

Is there an undocumented hardcoded timeout for Gemini responses even in streaming mode? JSON output according to a schema can get quite lengthy, and I can't seem to get all of it for some inputs because Gemini seemingly terminates requests

[-]

NoahZuniga 8 months ago

This is probably just you hitting the model's internal output length maximum. Its 65,536 tokens for 2.5 pro and flash.

For other models, see this link and open up the collapsed section for your specific model: https://ai.google.dev/gemini-api/docs/models

[-]

troupo 8 months ago

Thanks! It might just be that!

8 months ago

[deleted]

moralestapia 8 months ago

This is so cringe.

I hope it doesn't become a trend on this site.

[-]

thebytefairy 8 months ago

A team taking the opportunity to engage directly with their users to understand their feedback so they can improve the product? So cringe.

lern_too_spel 8 months ago

Google usually doesn't care what users say at all. This is why they so often have product-crippling bugs and missing features. At least this guy is making a show of trying before he transfers to another project.

tgv 8 months ago

It’s the US style, which has made its way across the pond too: you have to make upbeat noises to remove any suspicion you’re criticizing.

[-]

moralestapia 8 months ago

Unlike others ... you got it.

It is incredibily lame for a gargantuan company like Google and their thousands of developers and PMs and this and that ... to come to a remote corner of the web to pretend they are doing what they should have done 10 years ago.

[-]

creatonez 8 months ago

Google should have cleaned up its Gemini API 10 years ago?

[-]

moralestapia 8 months ago

>Chat, briefly, what does a PM at a company like Google do?

"A Product Manager (PM) at Google is responsible for guiding the development of products from conception to launch. They identify user needs, define product vision and strategy, prioritize features, work with cross-functional teams (engineering, design, marketing), and ensure the product aligns with business goals. They act as the bridge between technical teams and stakeholders to deliver successful, user-focused solutions."

Some might have ignored your question, but in the spirit of good conversation, I figured I’d share a quick explanation of what a PM does, just in case it helps!

[-]

chrisheecho 8 months ago

This sounds accurate. I see myself as a Pain Manager more than a Product manager. Product just solves the pain that users have ;)

Sometimes we get it right the first time we launch it, I think most of the time we get it right over a period of time.

Trying to do a little bit better everyday and ship as fast as possible!

asadm 8 months ago

I don't get the outrage. Just use their OpenAI endpoints: https://ai.google.dev/gemini-api/docs/openai

It's the best model out there.

[-]

ramoz 8 months ago

I have no issues with their native structured outputs either. Other than confusing and partially incomplete documentation.

[-]

chrisheecho 8 months ago

Ramoz, good to hear that native Structured Outputs are working! But if the docs are 'confusing and partially incomplete,' that’s not a good DevEx. Good docs are non-negotiable. We are in the process of revamping the whole documentation site. Stay tuned, you will see something better than what we have today.

[-]

ramoz 8 months ago

Product idea for structured outputs: Dynamic Json field... like imagine if I want a custom schema generated (e.g. for new on-the-fly structured outputs).

[-]

chrisheecho 8 months ago

ooh i like!

malshe 8 months ago

Thanks for sharing this. I did not know this existed

rafram 8 months ago

Site seems to be down - I can’t get the article to load - but by far the most maddening part of Vertex AI is the way it deals with multimodal inputs. You can’t just attach an image to your request. You have to use their file manager to upload the file, then make sure it gets deleted once you’re done.

That would all still be OK-ish except that their JS library only accepts a local path, which it then attempts to read using the Node `fs` API. Serverless? Better figure out how to shim `fs`!

It would be trivial to accept standard JS buffers. But it’s not clear that anyone at Google cares enough about this crappy API to fix it.

[-]

chrisheecho 8 months ago

That’s correct! You can send images through uploading either the Files API from Gemini API or Google Cloud Storage (GCS) bucket reference. What we DON’T have a sample on is sending images through bytes. Here is a screenshot of the code sample from the “Get Code” function in the Vertex AI studio. https://drive.google.com/file/d/1rQRyS4ztJmVgL2ZW35NXY0TW-S0... Let me create a feature request to get these samples in our docs because I could not find a sample too. Fixing it

Deathmax 8 months ago

> You can’t just attach an image to your request.

You can? Google limits HTTP requests to 20MB, but both the Gemini API and Vertex AI API support embedded base64-encoded files and public URLs. The Gemini API supports attaching files that are uploaded to their Files API, and the Vertex AI API supports files uploaded to Google Cloud Storage.

[-]

rafram 8 months ago

Their JavaScript library didn’t support that as of whenever I tried.

[-]

simonw 8 months ago

I got their most recent JavaScript API library to work for images here: https://tools.simonwillison.net/gemini-mask

Here's the code: https://github.com/simonw/tools/blob/main/gemini-mask.html

mofunnyman 8 months ago

Semi hugged.

ryao 8 months ago

I have not pushed my local commits to GitHub lately (and probably should), but my experience with the Gemini API so far has been relatively positive:

https://github.com/ryao/gemini-chat

The main thing I do not like is that token counting is rated limited. My local offline copies have stripped out the token counting since I found that the service becomes unusable if you get anywhere near the token limits, so there is no point in trimming the history to make it fit. Another thing I found is that I prefer to use the REST API directly rather than their Python wrapper.

Also, that comment about 500 errors is obsolete. I will fix it when I do new pushes.

[-]

yorick 8 months ago

It looks like you can use the gemma tokenizer to count tokens up to at least the 1.5 models. The docs claim that there's a local compute_tokens function in google-genai, but it looks like it just does an API call.

Example for 1.5:

https://github.com/googleapis/python-aiplatform/blob/main/ve...

lemming 8 months ago

Additionally, there's no OpenAPI spec, so you have to generate one from their protobuf specs if you want to use that to generate a client model. Their protobuf specs live in a repo at https://github.com/googleapis/googleapis/tree/master/google/.... Now you might think that v1 would be the latest there, but you would be wrong - everyone uses v1beta (not v1, not v1alpha, not v1beta3) for reasons that are completely unclear. Additionally, this repo is frequently not up to date with the actual API (it took them ages to get the new thinking config added, for example, and their usage fields were out of date for the longest time). It's really frustrating.

[-]

chrisheecho 8 months ago

lemming, this is super helpful, thank you. We provide the genai SDK (https://github.com/googleapis/python-genai) to reduce the learning curve in 4 languages (GA: Python, Go Preview: Node.JS, Java). The SDK works for all Gemini APIs provided by Google AI Studio (https://ai.google.dev/) and Vertex AI.

[-]

egamirorrim 8 months ago

The way dependency resolution works in Java with the special, Google only, giant dynamic BOM resolver is hell on earth.

We have to write code that round robins every region on retries to get past how overloaded/poorly managed vertex is (we're not hitting our quotas) and yes that's even with retry settings on the SDK.

Read timeouts aren't configurable on the Vertex SDK.

ezekiel68 8 months ago

Eh, you know. "Move fast and break things."

[-]

caturopath 8 months ago

I'm not sure "move fast" describes the situation.

[-]

ezekiel68 8 months ago

Hmm, the proliferation of branches, including some which seem perhaps more recent than "v1beta" made me imagine this could apply.

fumeux_fume 8 months ago

I’m sorry have you used Azure? I’ve worked with all the major cloud providers and Google has its warts, but pales in comparison to the hoops Azure make you jump through to make a simple API call.

[-]

ic_fly2 8 months ago

Azure API for LLM changes depending on what datacenter you are calling. It is bonkers. In fact it is so bad that at work we are hosting our own LLMs on azure GPU machines rather than use their API. (Which means we only have small models at much higher cost…)

jauntywundrkind 8 months ago

In general, it's just wild to see Google squander such an intense lead.

In 2012, Google was far ahead of the world in making the vast majority of their offerings intensely API-first, intensely API accessible.

It all changed in such a tectonic shift. The Google Plus/Google+ era was this weird new reality where everything Google did had to feed into this social network. But there was nearly no API available to anyone else (short of some very simple posting APIs), where Google flipped a bit, where the whole company stopped caring about the rest of the world and APIs and grew intensely focused on internal use, on themselves, looked only within.

I don't know enough about the LLM situation to comment, but Google squandering such a huge lead, so clearly stopping caring about the world & intertwingularity, becoming so intensely internally focused was such a clear clear clear fall. There's the Google Graveyard of products, but the loss in my mind is more clearly that Google gave up on APIs long ago, and has never performed any clear acts of repentance for such a grevious mis-step against the open world, open possibilities, against closed & internal focus.

[-]

simonw 8 months ago

With Gemini 2.5 (both Pro and Flash) Google have regained so much of that lost ground. Those are by far the best long-context models right now, extremely competitively priced and they have features like image mask segmentation that aren't available from other models yet: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...

[-]

jasonfarnon 8 months ago

I think the commenter was saying google squandered its lead ("goodwill" is how I would refer to it) in providing open and interoperable services, not the more recent lead it squandered in AI. I agree with your point that they've made up a lot of that ground with gemini 2.5.

[-]

simonw 8 months ago

Yeah you're right, I should have read their comment more closely.

Google's API's have a way steeper learning curve than is necessary. So many of their APIs depend on complex client libraries or technologies like GRPC that aren't used much outside of Google.

Their permission model is diabolically complex to figure out too - same vibes as AWS, Google even used the same IAM acronym.

[-]

PantaloonFlames 8 months ago

> So many of their APIs depend on complex client libraries or technologies like GRPC that aren't used much outside of Google.

I don't see that dependency. With ANY of the APIs. They're all documented. I invoke them directly from within emacs . OR you can curl them. I almost never use the wrapper libraries.

I agree with your point that the client libraries are large and complicated, for my tastes. But there's no inherent dependency of the API on the library. The dependency arrow points the other direction. The libraries are optional; and in my experience, you can find 3p libraries that are thinner and more targeted if you like.

paul-tharun 8 months ago

I sometimes feel the complexity is present by design to increase the switching cost. Once you understand it and set it up on a project, you are locked in, as the perceived cost of moving is too high.

xyzzy_plugh 8 months ago

This is bizarre to read. gRPC is used _widely_ outside Google. I'm not aware of any API that requires you to use gRPC. I've never found their permission model to be complicated at all, at least compared to AWS.

Aeolun 8 months ago

I feel like the AWS model isn’t all that hard for most of their API’s. It’s just something you don’t really want to think about.

tyre 8 months ago

Gemini 2.5 Pro is so good. I’ve found that using it as the architect and orchestrator, then farming subtasks and computer use to sonnet, is the best ROI

[-]

PantaloonFlames 8 months ago

You can also farm out subtasks to the Gemini Flash models. For example using Aider, use Pro for the "strong" model and Flash for the weak model.

[-]

tyre 8 months ago

I’ve tried flash. Sonnet with computer use has been a blast. Limited anecdata led me to 2.5 Pro + 3.7 Sonnet, but this all moves so fast that it good to reevaluate regularly.

egamirorrim 8 months ago

OOI what's your preferred framework for that managing agent/child agents setup?

[-]

tyre 8 months ago

I use Roo Code. It’s very good.

candiddevmike 8 months ago

The models are great but the quotas are a real pain in the ass. You will be fighting other customers for capacity if you end up needing to scale. If you have serious Gemini usage in mind, you almost have to have a Google Cloud TAM to advocate for your usage and quotas.

[-]

chrisheecho 8 months ago

We have moved our quota system to Dynamic Shared Quota (https://cloud.google.com/vertex-ai/generative-ai/docs/quotas) for 2.0+ models. There are no quotas in DSQ. If you need a guaranteed throughput there is an option to purchase Provisioned Throughput (https://cloud.google.com/vertex-ai/generative-ai/docs/provis...).

[-]

dist-epoch 8 months ago

While we are talking about quotas, can you maybe add an easy way of checking how much you've used/got left?

Apparently now you need to use google-cloud-quotas to get the limit and google-cloud-monitoring to get the usage.

VS Code copilot managed to implement the first part, getting the limit using gemini-2.5-pro, but when I asked gemini to implement the second part it said that integrating cloud-monitoring is too complex and it can't do it !!!!

egamirorrim 8 months ago

The thing is that the entry level of provisioned throughput is so high! I just want a reliable model experience for my small Dev team using models through Vertex but I don't think there's anything I can buy there to ensure it.

harlysparks 8 months ago

Google's headcount (and internal red tap) grew significantly from 2012 to 2025. You're highlighting the fact that at some point in its massive growth, Google had to stop relentlessly pushing R&D and allocate leadership focus on addressing technical debt (or at least operational efficiency) that was a consequence of that growth.

caturopath 8 months ago

I don't understand why Sundar Pichai hasn't been replaced. Google seems like it's been floundering with respect to its ability to innovate and execute in the past decade. To the extent that this Google has been a good maintenance org for their cash cows, even that might not be a good plan if they dropped the ball with AI.

[-]

harlysparks 8 months ago

Perhaps you need to first define "innovation" and maybe also rationalize why that view of innovation is the end-all of determining CEO performance. Otherwise you're begging the question here.

Google's stock performance, revenue growth, and political influence in Washington under his leadership has grown substantially. I don't disagree that there are even better CEO's out there, but as an investor, the framing of your question is way off. Given the financial performance, why would you want to replace him?

[-]

caturopath 8 months ago

I didn't say that innovation was the end-all of determining CEO performance, though producing new products and creating new markets is the angle that tech tends to go for. I mentioned Google's struggles to execute: they have an astoundingly hard time getting shit done compared to the other largest tech companies.

The counterfactual isn't Google having average performance. You're crediting the stock performance, revenue growth, and political influence (don't really agree this last one was a place Google shined over this period) to Sundar's leadership; I think it has a lot more to do with the company he was handed.

rs186 8 months ago

Answer is simple: he keeps cash coming in and stock price rising. You can compare his performance to his predecessors and CEOs at other companies. That does not necessarily make him a "good" leader in your eyes, but good enough to the board.

huntertwo 8 months ago

Everybody’s thinking the same thing. He sucks.

shawabawa3 8 months ago

Google is the leader in LLMs and self-driving cars, two of the biggest innovation areas in the last decade, so how exactly has it been floundering in its ability to innovate and execute?

[-]

caturopath 8 months ago

Google isn't "the leader" in LLMs. Despite a huge funnel to get users in, for intentional use they are a distant second place for consumers, fourth place for LLM APIs, and reputationally treated as an underdog to two tiny companies.

HDThoreaun 8 months ago

googles worth 2 trillion dollars off the back of a website. I think investors are so out of their depth with tech that theyre cool with his mediocre performance

[-]

caturopath 8 months ago

Two websites and an ad business.

aaronbrethorst 8 months ago

Hubris. It seems similar, at least externally, to what happened at Microsoft in the late 90s/early 00s. I am convinced that a split-up of Microsoft would have been invigorating for the spin-offs, and the tech industry in general would have been better for it.

Maybe we’ll get a do-over with Google.

sawyna 8 months ago

My personal daily experience with this! I first used vertexai APIs because that's what they suggested, that Gemini APIs are not for production use.

Then there comes the Google.generativeai. I don't remember the reason but they were pushing me to start using this library.

Now it's all flashy google.genai libraries that they are pushing!

I have figured that this is what I should use and this is the documentation that I should look for, because doing a Google search or using an LLM gives me so many confusing results. The only thing that works for sure is reading the library code. That's what I'm doing these days.

For example, the documentation in one of those above libraries say that Gemini can read a document from cloud storage if you give it the uri. That doesn't work in google.genai library. I couldn't figure out why. I imagined maybe Gemini might need access to the cloud storage bucket, but I couldn't find any documentation as to how I can do that. I finally understood that I need to use the new file API and that uri works.

Yes, I like Gemini model they are really good. But the library documentation can be significantly simpler.

msp26 8 months ago

The linked blog is down. But agreed, I would especially like to see this particular thing fixed.

> Property ordering

> When you're working with JSON schemas in the Gemini API, the order of properties is important. By default, the API orders properties alphabetically and does not preserve the order in which the properties are defined (although the Google Gen Al SDKs may preserve this order). If you're providing examples to the model with a schema configured, and the property ordering of the examples is not consistent with the property ordering of the schema, the output could be rambling or unexpected.

SmellTheGlove 8 months ago

Google’s APIs are all kind of challenging to ramp up on. I’m not sure if it’s the API itself or the docs just feeling really fragmented. It’s hard to find what you’re looking for even if you use their own search engine.

[-]

PantaloonFlames 8 months ago

The problem I've had is not that the APIs are complicated but that there are so darn many of them.

I agree the API docs are not high on the usability scale. No examples, just reference information with pointers to types, which embed other types, which use abstract descriptions. Figuring out what sort of json payload you need to send, can take...a bunch of effort.

candiddevmike 8 months ago

The Google Cloud API library is meant to be pretty dead simple. While there are bugs, there's a good chance if something's not working it's because of overthinking or providing too many args. Alternatively, doing more advanced stuff and straying from the happy path may lead to dragons.

arccy 8 months ago

they're usually pretty well structured and actually follow design principles like https://cloud.google.com/apis/design and https://google.aip.dev/1

once it clicks, it's infinitely better than the AWS style GetAnythingGoes apis....

miki123211 8 months ago

TBH, my biggest gripe with Google is that they seem to support a slightly different JSON schema format for structured outputs than everybody else. Where Open AI encourages (or even forces) you to use refs for embedding one object in another, Google wants you to embed directly, which is not only wasteful but incompatible with how libraries that abstract over model providers do it.

My structured output code (which uses litellm under the hood, which converts from Pydantic models to JSON schemas), does not work with Google's models for that reason.

[-]

intalentive 8 months ago

I used Gemini to write a function that recursively resolves all the refs. Not a big deal to convert your pydantic schemas.

kmod 8 months ago

The worst part to me is the privacy nightmare with AI Studio. It's essentially impossible to tell whether any particular API call will end up being included in their training data since this depends on properties that are stored elsewhere and are not available to the developer -- even a simple property such as "does this account have billing enabled" is oddly difficult to evaluate, and I was told by their support that because I at one point had any free credits on my account that it was a trial account and not a billed account even though I had a credit card attached and was being charged. I don't know if this is true and there is no way for me to find out.

At some point they updated their privacy policy in regards to this, but instead of saying that this will cause them to train on your data, now the privacy policy says both that they will train on this data and that they will not train on this data, with no indication of which statement takes precedence over the other.

[-]

shresbm123 8 months ago

Unless you are in the free tier of the API we do not train on your data. But let us make it clearer in the policy. If you would like to get more clarity on terms please DM me at @shresbm on X

tom_m 8 months ago

Doesn't matter much, Google already won the AI race. They had all the eyeballs already. There's a huge reason why they are getting slapped with anti-trust right now. The other companies aren't happy.

I agree though, their marketing and product positioning is super confusing and weird. They are running their AI business in a very very very strange way. This has created a delay, I don't think opportunity for others, in their dominance in this space.

Using Gemini inside BigQuery (this is via Vertex) is such a stupid good solution. Along with all of the other products that support BigQuery (datastream from cloudsql MySQL/postgres, dataform for query aggregation and transformation jobs, BigQuery functions, etc.), there's an absolutely insane amount of power to bring data over to Gemini and back out.

It's literally impossible for OpenAI to compete because Google has all of the other ingredients here already and again, the user base.

I'm surprised AWS didn't come out stronger here, weird.

[-]

tom_m 8 months ago

Oh and it's not just Gemini, I'm sorry. It's Vertex. So it's other models as well. Those you train too.

ashu1461 8 months ago

I don't think so, it might be true for their long tail customers, but most of the tech folks have not used Google Search / Gemini APIs in ages.

harlysparks 8 months ago

They are getting slapped with anti-trust right now because they gave $10M to Kamala Harris and only $1M to Trump.

That's it.

Also, the AI race is a red queen race. There is no line on the sand that says "you are the ultimate winner", that's not how time works. And given that the vast majority of the internet is on AWS, NOT GCP, and that Gemini isn't even the most popular LLM among AI developers, I'm not sure you can even say that Google is the leader at this exact point in time.

[-]

tom_m 8 months ago

Yea, but Microsoft was slapped with anti-trust before too. These companies shrug it off.

Yea, I guess it's not really much of a race. Because they are competing for something Google already largely has.

In a few years, literally everyone is going to have the same AI and it'll become cheaper and cheaper to operate. I mean already, if you look at those leaderboards, many models are more or less the same.

Training data (legal data) is running out. The algorithms are pushing their limits. It's not unlimited. It's not the second coming of Jesus as some people believe. It's a commodity and everyone is going to have it.

So fast forward a little and think about that. The companies in the better position, the "winners," are going to be the same they always are. Those with the bigger user base and those with more data. That's Google. Yes, AWS too. Yes, Microsoft.

The reason I'm bullish on Google is I suppose not because of Gemini itself. It's because of a super power like Google having their own LLM that performs as well as any other.

I also believe OpenAI will go out of business or be acquired. They way way overspent, but they were also pioneers here a bit so they took on that burden. They over raised. That's painfully obvious after DeepSeek. They lost most of their top talent. They are being propped up because there's a LOT of money at stake. Not just for OpenAI but for other companies too. They are the face of AI so if they went belly up tomorrow then it'd start a wave of panic selling. People would think AI failed. So until some other successor steps in (Meta, Microsoft, Google, etc.), everyone including competitors will keep OpenAI on life support if need be. I don't think it helps that they foolishly bought Windsurf for $3B when you go and look at Roo Code hanging out there for free.

djohnston 8 months ago

Google APIs have always been difficult for me to use. I find their documentation super fragmented, and they always build the same thing twice with awkward overlaps. In this case, it's Vertex and GenAI - two different SDKs with isomorphic data structures that are seemingly incompatible, but only sometimes. I don't understand how these things happen - but as usual I blame PMs trying to mark their territory and pissing all over everything.

The new gemini models themselves though, are killer. The confusion is a small price to pay.

[-]

harlysparks 8 months ago

The duplicative functions and overlapping features is a product of how their perf annual review cycle works and the way you have to play the game to get promotions as a SWE there. I think they've changed the setup in the last few years, but historically Google rewards building lots of beta versions of stuff with some commercial potential, throwing it over a wall and seeing what sticks.

shresbm123 8 months ago

Haha as one of those PMs - thank you for appreciating the models. We do have a unified SDK and are working on making the docs more comprehensive and clearer. But do let keep giving us specific and candid feedback

smel 8 months ago

This happen when you have 200k employees competing internally more than with outside competition.

Havoc 8 months ago

Definitely designed by multiple teams with no coordination.

The very generous free tier is pretty much the only reason I'm using it at all

behnamoh 8 months ago

Even their OAI-compatible API isn't fully compatible. Tools like Instructor have special-casing for Gemini...

mattw1810 8 months ago

Their patchy JSON schema support for tool calls & structured generation is also very annoying… things like unions that you’d think are table stakes (and in fact work fine with both OpenAI and Anthropic) get rejected & you have to go reengineer your entire setup to accommodate it.

kaycey2022 8 months ago

The page 404s now. I wonder what was said. :(

[-]

zodiakzz 8 months ago

https://archive.ph/20250504014835/https://venki.dev/notes/go...

stoicfungi 8 months ago

Multiple SDKs, and the documentation and API responses are not consistent. Today I've spent hours just to make MCP & function calling work. It is really painful to work with.

humerahoneya123 8 months ago

It’s wild how Gemini is leading the pack on model capabilities—context length, fine-tuning, multimodality—and yet using it feels like you’re solving a UX puzzle just to ship a basic feature. It’s almost ironic: Google builds some of the smartest models out there, but makes devs feel dumb trying to integrate them. Imagine the adoption surge if the experience was even half as smooth as OpenAI’s. Power without usability is just potential, not progress.

bundie 8 months ago

Why is the page 404-ing?

8 months ago

[deleted]

Squarex 8 months ago

The link is not working anymore. Maybe the article has been deleted?

Max-Limelihood 8 months ago

I'mma be honest, I think the API is great—EXCEPT THEY STILL DON'T HAVE LOGPROBS. ARGH.

simianwords 8 months ago

Am I the only one who prefers a more serious approach to prefix caching? It is a powerful tool and having an endpoint dedicated to it and being able to control TTL's using parameters seems like the best approach.

On the other hand the first two approaches from OpenAI and Anthropic are frankly bad. Automatically detecting what should be prefix cached? Yuck! And I can't even set my own TTL's in Anthropic API (feel free to correct me - a quick search revealed this).

Serious features require serious approaches.

[-]

simonw 8 months ago

> Automatically detecting what should be prefix cached? Yuck!

Why don't you like that? I absolutely love it.

[-]

simianwords 8 months ago

I mean't that this is the only way to control prefix caching. I consider this a serious feature - if I were to make an application using prefix caching I would not consider OpenAI at all. I can't control what gets cached and for how long.

Wouldn't you want to give more power to the developer? Prefix caching seems like an important enough concept to leak to the end user.

[-]

simonw 8 months ago

Gemini's approach to prefix caching requires me to pay per hour for keeping the cache populated. I have to do pretty sophisticated price modeling and load prediction to use that effectively.

Anthropic require me to add explicit cache breakpoints to my prompts, which charge for writes to the cache. If I get that wrong it can be more expensive than if I left caching turned off entirely.

With OpenAI I don't have to do any planning or optimistic guessing at all: if my app gets a spike in traffic the caching kicks in automatically and saves me money.

[-]

simianwords 8 months ago

that's fair - i have some app ideas for which i would like control over prefix caching. for example you may want to prompt cache entire chunks of enterprise data that don't change too often. the whole RAG application would be built over this concept - paying per hour for caching is sensible here.

>With OpenAI I don't have to do any planning or optimistic guessing at all: if my app gets a spike in traffic the caching kicks in automatically and saves me money.

i think these are completely different use cases. is this not different just from having a redis sitting in front of the LLM provider?

fundamentally i feel like prompt caching is something i want to control and not have happen automatically; i want to use information i have over my (future) access patterns to save costs. for instance i might prompt cache a whole PDF and ask multiple questions. if i choose to prompt cache the PDF, i can save a non trivial amount of tokens processed. how can OpenAI's automatic approach help me here?

bionhoward 8 months ago

Also has the same customer noncompete copy pasted from ClosedAI. Not that anyone seemingly cares about the risk of lawsuits from Google for using Gemini in a way that happens to compete with random-Gemini-tentacle-123

franze 8 months ago

yeah, also grounding with Google in Google 2.5 Pro does not

... deliver any URLs back, just the domains from where it grounded it response

it should return vertexai urls that redirect to the sources, but doesn't do it in all cases (in non of mine) according to the docs

plus you mandatory need to display an HTML fragment with search links that you are not allowed to edit

basically a corporate infight as an API