This was a fascinating article, because I've seen so many results of the Eastern Bloc reverse-engineering efforts basically founder into obscurity. Many of these re-created (sometimes with minor variations, or quite novel and ingenious implementation choices) computers were made in small series, but could not compete against illegal imports, and in any case would only be briefly popular in their local university town.
So it's cool to see that Bulgaria managed to muster enough government interest to force a cohesive strategy for the whole country. It sounds like it paid off.
Also, after googling for Правец, I have found out that I can in fact read Bulgarian, which was quite surprising to me.
Bulgarian is phonetic to a large degree, so if you know the sound associated with each letter, you can pronounce it intelligibly as well.
Regarding communism and computing, deterministic systems where the entire state is knowable and predictable have a certain appeal for the communist mind. If you search the HN archives, you might find even more stories about the Bulgarian computer industry, with an MIT publication in the mix. There could have been even more, but a combination of distrust toward the new capitalist science, and later an unwillingness to let those pesky machines reveal the real state of the USSR economy, meant that this was never developed with the full backing of the Eastern Bloc.
I think my reaction is mostly puzzlement. I can see a sensible point or several in the article, but I was not always sure how big a point the author was trying to make.
At the narrower level, it seems to be saying that benchmarks are easier to interpret when you know what they really are. That makes sense. If a circuit is known to be a multiplier, that tells you more than if it is just called `c6288`.
That is also why I thought of Python benchmarks. In something like `pyperformance`, names such as `json_loads`, `python_startup`, or `nbody` already tell you something about the workload. So when you compare results, you have a better sense of what kind of task a system is doing well on. But so what? It is just benchmarks. They don't guarantee anything about anything anyway.
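To make that concrete, here is a minimal sketch (not `pyperformance` itself, just a stdlib illustration; the payload is made up) of why a name like `json_loads` already tells you what is being measured:

```python
import json
import timeit

# A named micro-benchmark: the name alone tells you the workload is JSON parsing.
payload = json.dumps({"users": [{"id": i, "name": f"user{i}"} for i in range(100)]})

def bench_json_loads():
    # The unit of work being timed: parse a fixed JSON document.
    return json.loads(payload)

# timeit reports total seconds for `number` repetitions of the callable.
elapsed = timeit.timeit(bench_json_loads, number=1000)
print(f"json_loads: {elapsed:.4f}s for 1000 iterations")
```

A result labelled `json_loads` is interpretable on its own; the same number labelled `c6288` would tell you nothing until you learned what the circuit actually was.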
What made it harder for me to follow was that this fairly modest point is wrapped in a lot of jokes and swipes about AI and corporate AI language. Some of that is funny, but it also made me less sure what the main point was supposed to be. Was the article really about benchmark interpretation, or was that mostly a vehicle for making a broader point about AI hype and technical understanding?
So I do think there is a real point in there. I just found it slightly hard to separate that point from the style and the jokes.
"AMD’s AI director reports that Claude Code has become “dumber and lazier” since February, based on analysis of 6,852 sessions and 234,760 tool calls, which is the most thorough performance review any AI has received and rather more than most human employees get."
Are there any good ways to measure agent ability? Or do we just have to go by vibes?