> Look at my drafts that were started within the last three months and then check that I didn’t publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready
This is a very detailed, particular prompt. The type of prompt a programmer would think of as they were trying to break down a task into something that can be implemented. It is so programmer-brained that I come away not convinced that a typical user would be able to write it.
This isn’t an AI skepticism post - the fact that it handles the prompt well is very impressive. But I’m skeptical that the target user is thinking clearly enough to prompt this well.
Since LLMs were introduced, I've been of the belief that this technology actually makes writing a *more* important skill to develop than less. So far that belief has held. No matter how advanced the model gets, you'll get better results if you can clarify your thoughts well in written language.
There may be a future AI-based system that can retain so much context it can kind of just "get what you mean" when you say off-the-cuff things, but I believe that a user that can think, speak, and write clearly will still have a skill advantage over one that does not.
FWIW, I've heard many people say that with voice dictation they ramble to LLMs and by speaking more words can convey their meaning well, even if their writing quality is low. I don't do this regularly, but when I have tried it, it seemed to work just as well as my purposefully-written prompts. I can imagine a non-technical person rambling enough that the AI gets what they mean.
That's a fair counterpoint, and it has helped translate my random thoughts into more coherent text. I haven't taken advantage of dictation much either, so maybe I'll give it a try. I still think the baseline skill that writing gives you translates to an LLM-use skill: thinking clearly and knowing how to structure your thoughts. Maybe folks can get that skill in other ways (oration, art, etc.). I don't need to give it essays, but I do need to give it clear instructions. Every time it spins off and does something I don't want, it's because I didn't clarify my thoughts correctly.
It takes a certain amount of expertise to use LLMs effectively. And I know that some people claim otherwise but they simply aren't worth listening to.
Just because Claude Cowork is for "other" kinds of work, not just software engineering, doesn't in any way change that. It's not like other kinds of knowledge work aren't being done by intelligent professionals who invest time into learning how to use complicated software and systems
That is to say, I don't know who the "target user" of this is, but it is a $100/month subscription, so it's presumably someone who is a pretty serious AI user.
One part I like about LLMs is that they can smooth over the rough edges in programming. Lots of people can build pretty complicated spreadsheets, break a problem down into clear, discrete tasks, or at least look at a set of steps, validate that they solve the issue they have, and easily update them. Those people don't necessarily know that JSON isn't a person, how to install Python, or how to iterate over these things. I can't give directions in Spanish, but it's not because I don't know how to get to the library; it's just that I can't translate precisely.
Also, you may only need someone to write the meta-prompt that spits out this kind of thing: given some rough problem ("I want to find the easiest blog posts to finish in my drafts but some are already published"), it produces a more detailed prompt; you read it and set things going.
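A minimal sketch of that two-stage idea. Everything here is hypothetical: `call_llm` is a stand-in for whatever model API you use, and the template wording is invented for illustration.

```python
# Two-stage "meta-prompt": expand a vague, off-the-cuff request into a
# precise, step-by-step prompt before handing it to an agentic tool.
# call_llm is a placeholder; swap in a real model API call.

META_PROMPT = """You are a prompt engineer. Rewrite the user's vague
request below as a precise, step-by-step prompt for an agentic tool.
Spell out: which files or folders to look at, what to check against,
and what output format to produce.

User request: {request}
"""

def expand_request(request: str, call_llm=lambda p: p) -> str:
    """Return a detailed prompt derived from a rough, one-line request."""
    return call_llm(META_PROMPT.format(request=request))

detailed = expand_request(
    "I want to find the easiest blog posts to finish in my drafts "
    "but some are already published"
)
```

With a real `call_llm`, the returned text would be the kind of detailed, programmer-brained prompt discussed upthread; here the default identity function just shows the template wiring.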
Select star from blog posts where... :)
Author site: https://simonwillison.net/2026/Jan/12/claude-cowork/
I enjoyed hearing Claude Code creator Boris Cherny talk about "latent demand"[0], which is when users start using your product for something it was not intended for. When that happens, it's a great signal that you should go build that into a full product.
Cowork seems like a great application of that principle.
[0] https://www.youtube.com/watch?v=AmdLVWMdjOk
This is a nice technical account that we're used to seeing from Simon.
I get a kick out of the fact that Microsoft has been preciously clinging to the "Copilot" branding, and here comes Claude saying "Cowork? Good enough for us!".
-
Taking a step back, I really would love to see a broader perspective -- an account from someone who is not tech-savvy at all. Someone who works a basic desk job that requires basic competency in Microsoft Word. I'm so deep into the bubble of AI-adjacent people that I haven't taken stock of how this would or could empower those who are under-skilled.
We've taken it as truth that those who benefit most from AI are high-skilled augmenters, but do others see some lift from it? I'd love it if Anthropic tried strapping some barely-performing administrative assistants into these harnesses to see if there's a net benefit. For all I know, it's not inconceivable that there'd be an `rm -rf` catastrophe every other hour.
This predates Cowork, but I have started to see "non-technical" journalists start taking Claude Code seriously recently. For instance, Joe Weisenthal has been writing about this, eg.: https://nitter.net/thestalwart/status/2010512842705735948.
The Atlantic just did a dedicated article the other day. Gift link: https://www.theatlantic.com/technology/2026/01/claude-code-a...
>Someone who works a basic desk job that requires basic competency of microsoft word.
I don't actually think there are many of those people out there. And those who are, are on their way out. There are basically none of them entering the workforce. There are tons of people with that sort of computer literacy, but they aren't working on computers.
Eh, I can think of some examples for sure; I think there are still a lot of people like this.
* Bookkeeper & planning approval within city government
* Doctor/dentist/optometry receptionist & scheduler (both at independent offices and at major hospitals)
* Front desk staff at almost every company with a physical front desk
* University administrative staff (there can be a lot more of these people than you'd think)
* DMV workers
* Probably lots of teachers
Those jobs all use other software as well, but a lot of the work is making and filling out forms on a computer, which likely means using MS Word fairly often to write things up.
A lot of these have to do with other people's data. Are we feeding these machines social security numbers and other PII?
I hope not, but… yes, it's probably happening regularly everywhere it's not explicitly regulated.
Leveraging Claude Code in a Linux shell to do all sorts of stuff has been an amazing superpower for me, and I think for many others. Cowork is a promising next step to democratize this superpower for others.
If Microsoft, in creating their next-gen agentic OS, wants to replace Windows with the Linux kernel, Claude Code, and a bash shell (turning Windows into a distribution of sorts), more power to them. However, I doubt this is the direction they'll go.
I think Claude Cowork should come with a requirement, or a very heavily structured wizard process, to ensure the machine has something like a Time Machine backup (or other regularly run backups) before folks start using it.
The failure modes are just too rough for most people to think about until it's too late.
I just used Claude Code to do something that would have taken my wife 3+ days
She has to go through about 100 resumes for a position at her college. Each resume is essentially a form the candidate filled out, listing their detailed academic scores from high school through PhD, their work experience, research, and publications.
Based on the declared data, candidates are scored by the system
Now, this is India, and there's a decent amount of fraud, so an individual has to manually check the claimed experience/scores/publications against reality.
A candidate might claim to have relevant experience, but the college might be unaccredited, or the claimed salary might be way too low for a relevant academic position. Or they might claim to have published in XYZ journal, but the journal itself might be a fraudulent pay-to-publish thing
Going through 100+ resumes, each 4 pages long, is a nightmare of a task. And boring too.
--
So I asked Claude Code to figure out the problem. I gave it a PDF with the scoring guidelines and a sample resume, and asked it to come up with an approach.
Without me telling it, it came up with a plan that involved checking a college's accreditation and rating (the govt maintains a rating for all colleges), comparing the claimed salary against the actual median salary for that position (too low is a red flag), and checking whether the claimed publication is in either the SCOPUS index or a govt-approved publications index.
(I emphasize govt approved because this is in a govt backed institution)
Then I gave it access to a folder with all 100 resumes.
In less than 30 minutes, it evaluated all candidates and wrote the evaluations to a CSV file. I asked it to make the output more readable, so it made an HTML page with data from all the candidates and red/green/yellow flags for their work experience, publications, and employment.
It made a prioritized list of the most promising candidates based on this data
My wife double-checked because she still "doesn't trust AI", but her verification matched Claude's conclusions almost 100%.
This was a 3-day, grinding task done in 30 minutes. And all I did was type into a terminal for 20 minutes.
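A toy sketch of the kind of flagging pipeline described above. Every name, field, and threshold here is invented for illustration; the real checks (accreditation ratings, median salaries, SCOPUS lookups) would come from external data sources.

```python
import csv
import html

# Hypothetical stand-ins for external reference data.
ACCREDITED = {"IIT Delhi", "Anna University"}   # stand-in for a govt rating list
MEDIAN_SALARY = 60000                           # stand-in median for the role

def flag_candidate(row: dict) -> dict:
    """Attach red/green/yellow flags to one candidate's declared data."""
    return {
        **row,
        "college_flag": "green" if row["college"] in ACCREDITED else "red",
        # A claimed salary far below the median is a fraud signal.
        "salary_flag": ("green"
                        if int(row["claimed_salary"]) >= 0.5 * MEDIAN_SALARY
                        else "yellow"),
        "journal_flag": "green" if row["journal_indexed"] == "yes" else "red",
    }

def write_reports(rows, csv_path="report.csv", html_path="report.html"):
    """Write flagged candidates to a CSV file plus a colour-coded HTML table."""
    flagged = [flag_candidate(r) for r in rows]
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(flagged[0]))
        writer.writeheader()
        writer.writerows(flagged)
    cells = "".join(
        f"<tr><td>{html.escape(r['name'])}</td>"
        f"<td class='{r['college_flag']}'>{r['college_flag']}</td>"
        f"<td class='{r['salary_flag']}'>{r['salary_flag']}</td>"
        f"<td class='{r['journal_flag']}'>{r['journal_flag']}</td></tr>"
        for r in flagged
    )
    with open(html_path, "w") as f:
        f.write(f"<table>{cells}</table>")
    return flagged
```

The point of the sketch is only the shape of the task: per-candidate checks, a CSV for filtering, and an HTML view where the flag colours make the outliers easy to scan.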
Given the data exfil vulnerability a few stories down HN's front page I would be extremely hesitant to ask Claude to process a document someone else produced and sent to me
> My wife double checked because she still "doesn't trust AI", but all her verification almost 100% matched Claude's conclusions
She's right not to trust it for something like this. The "almost 100%" is the problem, especially for something like this where a miss might mean discarding someone's resume, which could have a significant impact on a person's life. (Also consider that you're sending personal data to Anthropic without permission.)
What human has better than “almost 100%” on a dull task they have to grind at for 3 days?
Humans are terrible at that kind of long-term focus, make clerical errors, etc.
Doesn’t submitting the resumes to Anthropic violate India’s data protection laws?
Was the double-checking done in that 30 minutes? The fact that it wasn't 100% right means that the human in the loop was still important, so I'm just trying to understand the actual time saved.
> And all I did was type into a terminal for 20 minutes
Well, and learning how to do that in 20 minutes
I worry this is going to cause even more sensitive/privileged data exfiltration than is currently happening. And most "normies" won't even notice.
I know the counterargument is that people are already putting company data into ChatGPT. However, that is a conscious decision. This may happen without people even recognizing that they are "spilling the beans".
This hit the front page yesterday, so you may have seen it, but I figured I'd post it for posterity's sake:
> Claude Cowork exfiltrates files https://news.ycombinator.com/item?id=46622328
I think you're right, but the issue goes deeper. If the productivity gains are real, the incentive to bypass security becomes overwhelming. We are going to see a massive conflict where compliance tries to clamp down, but eventually loses to 'getting work done.'
Even if critics are right that these models are inherently insecure, the market will likely settle for 'optically patched.' If the efficiency gains are there, companies will just accept the residual risk.
This is low-hanging fruit that keeps getting passed over in the rush to speed up development. There is so much potential here. If this can replace the RPS consulting industry, I won't be unhappy. Let individuals do it themselves so they have time to work their way into some other position or move up/take on more responsibility.
I don’t think I’ve ever seen this guy say anything negative about an AI product, which makes me skeptical of his insights here.
He's a proponent, but that doesn't mean his analysis isn't useful. It's clear and mostly accurate, and when he gets something wrong he makes it right. Does he do all that with rose-tinted glasses? Probably. But my experience reading him is that he's sharp, thoughtful, and entirely reasonable.
Dismissing the opportunity to learn because the person offering you knowledge is enthusiastic about his area of expertise is probably shortsighted.
How is this Simon’s area of expertise? I know he’s a programming legend, but I’ve never heard anything to indicate he’s a machine learning expert.
I’m not intending to be dismissive, just noticing a pattern and advocating a bit of skepticism.
This is more akin to a race car driver giving a review of, for example, a new type of electric car. It doesn't matter that the driver is not a domain expert in electric motors and regenerative braking; what matters is that he knows how to operate these machines, in their use case, at the limits.
Hearing a programming legend weigh in on the latest programming tool seems entirely reasonable.
Very generous example there. Except it’s more like a car YouTuber reviewing beta autopilot. They’ll tell you it’s “amazing” on a clear highway, ignore the scary edge cases, and somehow every conclusion aligns with what gets more views and forum clout.
I don't think they were being dismissive. They just said they were skeptical, which is generally a good thing. It's certainly better than the goofy hero worship I constantly see on HN.
I'd imagine someone like Simon picks his AI products carefully enough that he doesn't waste his time on duds.
My imagination may be lacking, but what would you realistically use a tool like this for?
For me, I recently wanted to assemble a "supercut" of my videos of attempts at learning to bunny-hop a bike. The tool was able to craft a Python script that used ffmpeg to edit out the no-motion portions of the videos and stitch them together.
This would have taken ages to do by hand in iMovie, and probably just as long to look up the needed parameters in ffmpeg, but Claude Code got it right on the first try, and worked with me to fine-tune the motion-detection threshold.
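A hedged sketch of the kind of ffmpeg pipeline this describes, not the actual script Claude produced: the `select` filter's `scene` score is used as a motion proxy, and the 0.003 threshold is a guess you would tune, exactly as described above.

```python
import subprocess  # used only in the commented example run below

def motion_only_cmd(src: str, dst: str, threshold: float = 0.003) -> list:
    """Build an ffmpeg command that keeps only frames whose scene-change
    score exceeds the motion threshold, then re-times the output."""
    vf = f"select='gt(scene,{threshold})',setpts=N/FRAME_RATE/TB"
    return ["ffmpeg", "-y", "-i", src, "-vf", vf, "-an", dst]

def concat_cmd(listfile: str, dst: str) -> list:
    """Build an ffmpeg concat-demuxer command that stitches clips listed
    in listfile (one `file 'clip.mp4'` line per clip) into dst."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", listfile, "-c", "copy", dst]

# Example run (needs ffmpeg installed; not executed here):
# subprocess.run(motion_only_cmd("hop1.mp4", "hop1_cut.mp4"), check=True)
# subprocess.run(concat_cmd("clips.txt", "supercut.mp4"), check=True)
```

Building the argument lists in Python rather than shelling out a single string avoids quoting headaches, and makes the threshold easy to tweak between runs.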
I've been using Google's Antigravity (which has a similar UI) to do data analysis and making reports. Skills are really useful for that.
One rough edge for me: the Cowork interface seems to have "extensions" turned off. My first ask was to read some emails, compare them with some local documents, and draft a document; it kept trying to use Claude for Chrome to navigate to Gmail.
I'm not sure what the plan for integrating extensions is here, but they will definitely be wanted.
Just in case anyone is interested, I'll mention my MIT-licensed project, which is very useful with Claude: https://github.com/runvnc/mindroot