Not open source. Even if we accept model weights as source code, which is highly dubious, this clearly violates clauses 5 and 6 of the Open Source Definition. It discriminates between users (clause 5) by refusing to grant any rights to users in the European Union, and it discriminates between uses (clause 6) by requiring agreement to an Acceptable Use Policy.
EDIT: The HN title was changed, which previously made the claim. But as HN user swyx pointed out, Tencent is also claiming this is open source, e.g.: "The currently unveiled Hunyuan-Large (Hunyuan-MoE-A52B) model is the largest open-source Transformer-based MoE model in the industry".
I will again ask the obligatory question: are model weights even copyrightable? And if not, does the "license" still matter?
I doubt there will be a satisfactory answer for a long time.
How's that NYTimes vs OpenAI lawsuit going? The last I can find is that things are hung up in discovery: OpenAI has requested potentially a century's worth of NYTimes reporters' notes.
https://news.bloomberglaw.com/ip-law/openais-aggressive-cour...
Half a century's worth of reporters' notes might be some valuable training data.
> The AI company asked Judge Sidney H. Stein of the US District Court for the Southern District of New York to step in and compel the Times to produce reporters’ notes, interview memos, and other materials for each of the roughly 10 million contested articles the publication alleges were illegally plugged into the company’s AI models. OpenAI said it needs the material to suss out the copyrightability of the articles. The Times quickly fired back, calling the request absurd.
Can any lawyer on here defend OpenAI's request? Or is the article not characterizing it well in the quote?
(IANAL)
Model weights could be treated the same way phone books, encyclopedias, and other collections of data are treated. The copyright is over the collection itself, even if the individual items are not copyrightable.
> phone books, encyclopedias, and other collections of data are treated
Encyclopedias are copyrightable. Phone books are not.
> Encyclopedias are copyrightable. Phone books are not.
It depends on the jurisdiction. The US Supreme Court ruled that phone books are not copyrightable in the 1991 case Feist Publications, Inc. v. Rural Telephone Service Co. However, that is not the law in the UK, which generally follows the 1900 House of Lords decision Walter v Lane, which found that mere "sweat of the brow" is enough to establish copyright – that case upheld a publisher's copyright on a book of speeches by politicians, purely on the grounds of the human effort involved in transcribing them.
Furthermore, under its 1996 Database Directive, the EU introduced the sui generis database right, which is a legally distinct form of intellectual property from copyright, but with many of the same features, protecting mere aggregations of information, including phone directories. The UK has retained this after Brexit. However, EU directives give member states discretion over the precise legal mechanism of their implementation, and the UK used that discretion to make database rights a subset of copyright – so, while in EU law they are a technically distinct type of IP from copyright, under UK law they are an application of copyright. EU law only requires database rights to have a term of 15 years.
Do not be surprised if in the next couple of years the EU comes out with an "AI Model Weights Directive" establishing a "sui generis AI model weights right". And I'm sure the US Congress will be interested in following suit. I expect OpenAI / Meta / Google / Microsoft / etc. will be lobbying for them to do so.
Encyclopedias may be collections of facts, but the writing is generally creative. Phone books are literally just facts. AI models are literally just facts.
What if I train an AI model on exactly one copyrighted work and all it does is spit that work back out?
e.g. if I upload Marvels_Avengers.mkv.onnx and it reliably reproduces the original (after all, it's just a fact that the first byte of the original file is 0xF0, etc.)
A work that is "substantially similar" to a copyrighted work infringes that work, under US law, no matter how it was produced. (Note: some exceptions apply, and you have to read a lot of cases to get an idea of what courts find "substantially similar".)
> no matter how it was produced
IIRC, this is wrong. Independent creation is a valid (but almost impossible to prove) defense in US copyright law.
This example is not an independent creation, but your reasoning seems wrong.
If the sole purpose of your model is to copy a work, then that's copyright infringement.
Oh, in this case, the model can either reproduce the work exactly, or it can play tic-tac-toe depending on how you prompt it.
We can change "sole purpose" to "primary purpose", and I'd argue something that happens 50% of the time counts as a primary purpose.
> AI models are literally just facts.
Are they, or are they collections of probabilities? If they are probabilities, and those probabilities change from model to model, that seems like they might be copyrightable.
If Google, OpenAI, Facebook, and Anthropic each train a model from scratch on an identical training corpus, they would wind up with four different models that had four differing sets of weights, because they digest and process the same input corpus differently.
That indicates to me that they are not a collection of facts.
The AI training algorithms are deterministic given the same dataset, same model architecture, and same set of hyperparameters. The main reasons the models would not be identical are differing random seeds and floating-point precision issues. The differences would not be due to any creative decisions.
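To make that concrete, here is a minimal sketch of what pinning the randomness looks like, assuming PyTorch (these are real PyTorch APIs, though even with all of this, bitwise reproducibility across different hardware or driver versions still isn't guaranteed):

    import random
    import numpy as np
    import torch

    def seed_everything(seed: int = 0) -> None:
        # Pin every RNG that affects weight init, dropout, and shuffling.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Ask PyTorch for deterministic kernels where it supports them.
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    seed_everything(42)
    # Two training runs from here, with identical data order, should
    # produce (near-)identical weights; change the seed and you get a
    # different, but equally "uncreative", set of weights.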
The title of Tencent's paper [0] as well as their homepage for the model [1] each use the term "Open-Source" in the title, so I think they are making the claim.
[0] https://arxiv.org/pdf/2411.02265 [1] https://llm.hunyuan.tencent.com/
What is the reason for restrictions in the EU? Is it due to some EU regulations?
Most likely yes. I don't think companies can be blamed for not wanting to subject themselves to EU regulations or uncertainty.
Edit: Also, if you don't want to follow or deal with EU law, you don't do business in the EU. People here regularly say if you do business in a country, you have to follow its laws. The opposite also applies.
In Meta's case, the problem is that they had been given the go-ahead by the EU to train on certain data, and then after starting training, the EU changed its mind and told them to stop.
They probably trained on data protected by privacy laws, similar to Meta.
Hmm, in fairness I don't see where Tencent is claiming this is open source (at least in this repo; I haven't checked elsewhere). The title of the HN post does make the claim, and that may be controversial or simply incorrect.
readme: https://github.com/Tencent/Tencent-Hunyuan-Large
> "By open-sourcing the Hunyuan-Large model"
I agree; however, Meta is guilty of this as well.
Who cares about the EU? They are destroying themselves.
Where would you go if you lived there (as a programmer interested in AI)? Just asking for a friend.
Ironically their policies are why I want to move there with my American dollars. I want to live somewhere that cares about my rights, not the rights of corporations.
That's fine, but don't complain when you lose access to products and services that are widely available elsewhere.
In particular, restrictions on ML models will leave you without access to extremely powerful resources that are available to people in other countries, and to people in your own country who don't mind operating outside the law. Copyright maximalism is not, in fact, a good thing, and neither is overbearing nanny-statism. Both will ultimately disempower you.
You have to realize that, as an individual, you have no power anyway.

It doesn't matter if an individual personally has access to ML models, because governments and/or huge corporations will ensure that individuals cannot use them for anything that would threaten government or corporate interests.
This unfettered explosion of ML growth is disempowering all of us. Those with power are not using these tools to augment us, they are hoping to replace us.
> This unfettered explosion of ML growth is disempowering all of us.
Never mind that I've gotten things done with ChatGPT that would otherwise have taken much longer, or not gotten done at all. If this is what "disempowerment" feels like, bring it on.
Although the tech is nowhere near ready to make it happen, I would be very happy to be "replaced" by AI. I have better things to do than a robot's job. You probably do, too.
The model meets/beats Llama despite having an order of magnitude fewer active parameters (52B vs 405B). Absolutely bonkers. AI is moving so fast with these breakthroughs -- synthetic data, distillation, alternative architectures (e.g. MoE/SSM), LoRA, RAG, curriculum learning, etc.
We've come so astonishingly far in like two years. I have no idea what AI will do in another year, and it's thrilling.
It is insane because 52B can run on my 3-year-old laptop. The 3B Llama 3.2 from Facebook can already autocomplete for me. I haven't tried this model, but if the scores are to be believed, it can give useful and actionable insights into a project's source code. Probably not as good as Claude 3.5, but I can run it locally. This is a game changer.
> "Territory" shall mean the worldwide territory, excluding the territory of the European Union.
Anyone have some background on this?
I believe the EU has (or is drafting) laws about LLMs of a certain size which this release would not comply with.
https://artificialintelligenceact.eu/high-level-summary/
There are many places where the model might be used that could count as high-risk scenarios and require lots of controls. Also, we have:

> "GPAI models present systemic risks when the cumulative amount of compute used for its training is greater than 10^25 floating point operations (FLOPs). Providers must notify the Commission if their model meets this criterion within 2 weeks. The provider may present arguments that, despite meeting the criteria, their model does not present systemic risks. The Commission may decide on its own, or via a qualified alert from the scientific panel of independent experts, that a model has high impact capabilities, rendering it systemic.

> In addition to the four obligations above, providers of GPAI models with systemic risk must also:

> - Perform model evaluations, including conducting and documenting adversarial testing to identify and mitigate systemic risk.
> - Assess and mitigate possible systemic risks, including their sources.
> - Track, document and report serious incidents and possible corrective measures to the AI Office and relevant national competent authorities without undue delay.
> - Ensure an adequate level of cybersecurity protection."
They may not want to meet these requirements. Good on the Chinese for ignoring the insanity in the EU and just releasing this for us, the public, with no strings attached.
> 10^25 floating point operations (FLOPs)
Is there a reason this number was chosen?
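For a sense of scale: the standard back-of-envelope for training compute is roughly 6 × parameters × training tokens FLOPs. Plugging in rough public figures (my numbers, not from the Act), 10^25 sits just above today's largest open dense models:

    # Rule of thumb: training FLOPs ~= 6 * parameters * tokens.
    def train_flops(params: float, tokens: float) -> float:
        return 6 * params * tokens

    print(f"{train_flops(70e9, 15e12):.1e}")     # ~6.3e24: a 70B model on 15T tokens
    print(f"{train_flops(405e9, 15.6e12):.1e}")  # ~3.8e25: Llama-3.1-405B scale
    # The AI Act's systemic-risk threshold of 1e25 falls between the two.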
Also, existing privacy laws (GDPR) and the AI Act (foundation models have to disclose and document their training data).
I imagine they trained on data that is protected by privacy laws, similar to Meta.
The paper with details: https://arxiv.org/pdf/2411.02265
They use:
- 16 experts, of which one is activated per token
- 1 shared expert that is always active
In summary, that makes around 52B active parameters per token, instead of the 405B of Llama 3.1 (a rough sketch of the routing follows below).
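Here is a rough sketch of what that routing pattern looks like, in the spirit of the paper's description; the names and dimensions are illustrative, not Tencent's actual code:

    import torch
    import torch.nn as nn

    def ffn(d_model: int, d_ff: int) -> nn.Module:
        return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                             nn.Linear(d_ff, d_model))

    class Top1MoELayer(nn.Module):
        def __init__(self, d_model: int, d_ff: int, n_experts: int = 16):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(ffn(d_model, d_ff) for _ in range(n_experts))
            self.shared = ffn(d_model, d_ff)  # shared expert: active for every token

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (tokens, d_model); pick the single best specialized expert per token.
            weight, idx = self.router(x).softmax(dim=-1).max(dim=-1)
            out = self.shared(x)
            for e, expert in enumerate(self.experts):
                mask = idx == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] = out[mask] + weight[mask].unsqueeze(-1) * expert(x[mask])
            return out

Only the shared expert plus one of the 16 specialized experts runs for each token, which is how 389B total parameters turn into roughly 52B active ones.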
- 389 billion total parameters and 52 billion activated parameters, capable of handling up to 256K tokens.
- Outperforms Llama 3.1 70B and exhibits comparable performance to the significantly larger Llama 3.1 405B model.
It's a bit funny to call the 405B reference "significantly larger" than their 389B, while highlighting the fact that their 389B outperforms the 70B.
MoE model with 52 billion activated parameters means it's more comparable to a (dense) 70B model and not a dense 405B model
>> MoE model with 52 billion activated parameters means it's more comparable to a (dense) 70B model and not a dense 405B model
Only when talking about how fast it can produce output. From a capability point of view, it makes sense to compare the larger number of parameters. I suppose there's also a "total storage" comparison too, since didn't they say these are 8-bit model weights, where Llama's are 16-bit?
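If the 8-bit figure is right, the storage gap is bigger than the parameter counts suggest (my arithmetic, not from the thread):

    hunyuan_gb = 389e9 * 1 / 1e9  # 8-bit weights: 1 byte/param  -> ~389GB
    llama_gb = 405e9 * 2 / 1e9    # 16-bit weights: 2 bytes/param -> ~810GB
    print(hunyuan_gb, llama_gb)   # roughly half the bytes on disk,
                                  # despite only ~4% fewer parameters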
Does this mean it runs faster or better on multiple GPUs?
For decode steps, it depends on the number of inputs you run at a time. If your batch size is 1, it runs in line with the active params; as you get to around batch size 8, it runs in line with all the params; then as you increase to 128 or so, it runs like the active params again.

For context encoding, it's always close to as fast as a model with a similar number of active params.
For running it on your own, the issue is going to be fitting all the params on your GPU. If you're loading off disk anyway this will be faster, but if this forces you to put stuff on disk it will be much slower.
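A toy calculation shows why those batch-size regimes appear, assuming top-1 routing over 16 experts and uniform routing (real routers are only approximately uniform):

    # Expected number of distinct experts one MoE layer must load for a
    # batch of B tokens, if each token independently picks 1 of 16 experts.
    def expected_experts(batch: int, n_experts: int = 16) -> float:
        return n_experts * (1 - (1 - 1 / n_experts) ** batch)

    for b in (1, 8, 32, 128):
        print(b, round(expected_experts(b), 1))
    # 1 -> 1.0, 8 -> 6.5, 32 -> 14.0, 128 -> 16.0
    # At batch 8 you already touch ~40% of the experts, and by batch 128
    # essentially every expert's weights are read each step anyway.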
It's a whole 4% smaller!
I'm no expert on these MoE models with "a total of 389 billion parameters and 52 billion active parameters". Do hobbyists stand a chance of running this model (quantized) at home? For example on something like a PC with 128GB (or 512GB) RAM and one or two RTX 3090 24GB VRAM GPUs?
You would need to fit the 389B parameters in VRAM to get a usable speed. Different experts are activated on a per-token basis, so you would need to load/unload a large chunk of the 52B active parameters every token if you were trying to offload parameters to system RAM or SSD. PCIe 4.0 x16 speed is 64GB/s, so you can load those active parameters maybe 1 or 2 times per second, yielding an output speed of 1-2 tokens per second, which most would consider "unusable".
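The arithmetic behind that 1-2 tokens/second figure, as a sanity check (assuming 8-bit weights, so one byte per parameter, and the worst case where the whole active set changes every token):

    active_params = 52e9       # active parameters per token
    bytes_per_param = 1        # 8-bit quantized weights
    pcie_bw = 64e9             # PCIe 4.0 x16, bytes per second

    tokens_per_sec = pcie_bw / (active_params * bytes_per_param)
    print(round(tokens_per_sec, 2))  # ~1.23 tokens/s; ~2.5 at 4-bit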
Does that have to be same-node VRAM? Or can you fit 52B each on several nodes, and only copy the transient state around?
Generally speaking this works well, depending on your definition of node and the interconnect between the nodes. If by node you mean GPU, and you have multiple of them in the same system (the interconnect is PCIe, which doesn't need to be full speed for inference), you're good. If you mean multiple computers connected by 1 Gigabit Ethernet? More challenging.
When splitting models layer by layer, users in r/LocalLLaMA have reported good results with an interconnect as slow as PCIe 3.0 x4 (4GB/s). For tensor parallelism the interconnect requirements are higher, but the upside can be speed that scales with the number of GPUs the model is split across (whereas layer-by-layer splitting operates like a pipeline, so it isn't necessarily faster than what a single GPU can provide, even when splitting across 8 GPUs).
RAM for 4-bit is 1GB per 2 billion parameters, so you will want 256GB of RAM and at least one GPU. If you have only one server and one user, it's the full parameter count. (If you have multiple GPUs/servers and many users in parallel, you can shard and route requests so you only need the active parameter count per GPU/server. So it's cheaper at scale.)
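Plugging this model into that rule of thumb (rough numbers; real quantized files carry extra overhead for embeddings, quantization scales, and the KV cache):

    total_params = 389e9
    bytes_per_param = 0.5            # 4-bit quantization
    weights_gb = total_params * bytes_per_param / 1e9
    print(weights_gb)                # ~194GB of weights; a 256GB box leaves
                                     # headroom for the KV cache and the OS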
Do the inactive parameters need to be loaded into RAM to run an MoE model decently enough?
Definitely not trained on Nvidia or AMD GPUs.
How do you know this?
Apparently 20% of Nvidia's quarterly revenue is booked in Singapore where shell companies divert product to China: https://news.ycombinator.com/item?id=42048065
Sarcasm is a valid theory.
I assume it was missing /s
The readme mentions H20 GPUs, Nvidia's "China-compatible" card (41% fewer cores and 28% lower performance versus the top Hopper H100 configuration).
You can get a long way on something with 41% less performance than your favorite supercar...
How does it compare with LLama3.2?
https://lifearchitect.ai/models-table/
Has anyone asked it about Tiananmen Square or Xi Jinping?
I just did, and it tells me it has no information on that issue. It also responded in Chinese to that English query, which suggests to me that either the censorship instruction tuning is heavily weighted towards Chinese, or the model has a hard time staying in English (which I believe has been the case for other Chinese LLMs in the past).
I once triggered ChatGPT's censorship (by trying to manipulate an image of my face), and it also responded in English to a German query.
Try testing it on some of the US’ taboo topics like LGBT, feminism, racism etc.
https://huggingface.co/spaces/tencent/Hunyuan-Large
Interesting, it seems to have the “correct” answers to all those issues. I wonder if this is mostly just a US based model as a base.
Go ask it about Chinese issues, the war in Ukraine, etc. Whatever it's based on, it is heavily "safety-tuned".
Q: What's the "tank man"?

A (in Chinese): "I'm sorry, but I haven't learnt enough about how to answer this question to be able to provide information about it at this time."