This is essentially a solved problem. Whenever someone sends me a screenshot that contains any text information (tables, etc), I pass it to an LLM and it correctly interprets the content of it. On modern versions of macOS you can just select text in images relatively painlessly, too.
Or just ask people not to send you data in useless formats. That way you don't have to burn an acre of trees to power it and you help someone be less difficult.
As described in the article, the problem isn't just that the text is an image but that, usually, the image shows only a subset of the entire text. Yes, OCR can help find the file containing a code segment in your local codebase, but issues mentioned in the article, such as sending a random error line rather than the entire log, remain.
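For what it's worth, the "OCR a line, then find it in your codebase" step can be sketched in a few lines of Python. This is only a rough illustration: `normalize`, `find_snippet`, the `.py` filter and the demo file are all made-up names and assumptions, and it ignores OCR character errors (it only tolerates spacing differences).

```python
import os
import re
import tempfile


def normalize(text):
    """Drop all whitespace so OCR spacing differences do not matter."""
    return re.sub(r"\s+", "", text)


def find_snippet(root, ocr_line, suffixes=(".py",)):
    """Return (path, line_number) pairs whose line contains the OCR'd text."""
    needle = normalize(ocr_line)
    hits = []
    if not needle:
        return hits
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in normalize(line):
                        hits.append((path, number))
    return hits


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_root:
    demo_file = os.path.join(demo_root, "mod.py")
    with open(demo_file, "w", encoding="utf-8") as handle:
        handle.write("import os\n\ndef load_config(path,  strict=True):\n    return path\n")
    # The query mimics OCR output with slightly different spacing.
    demo_hits = find_snippet(demo_root, "def load_config( path, strict = True ):")
    demo_lines = [number for _path, number in demo_hits]

print(demo_lines)
```

It finds the line, but as the comment says, it cannot bring back the part of the log that was never in the screenshot.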
Claude on Linux does it fine, so does cursor, codex, claude code, ollama etc. Not that I would use any of these for this; if someone sends me a screenshot, it is relevant for me so I know where to find what is in it quite readily if needed at all.
When I get these, there's usually enough context that I can find the actual text.
That being said, I've had to twist some arms in a previous job for new employees attaching screenshots of a log viewer instead of the whole logs. The big problem was training: Once I made it very clear to the entire team that unedited logs were critical to solving problems, management made sure that all newcomers knew how to attach unedited logs.
Agreed. Sometimes context comes from more screen, not less. I receive a lot of cropped screenshots that show the “problem” but hide the surrounding context, and they often exclude things that would make a solution immediately clear from a shot of the full application window.
Preview on Mac does automatic OCR. I'm sure other tools exist that are similarly friction-free on other platforms, but it took me under 5 seconds (drag the image from the webpage into my downloads folder, click on it, and then select the relevant snippet and CMD+C to copy it).
I imagine I'd have similar frustrations if I couldn't copy-paste the text easily though!
> or (these days) get my coding agent to find the relevant module for me
????? Just OCR a line and paste it into the IDE’s search field???? Or, if for some baffling reason you don’t have the ability to OCR, just pick out a function declaration in the screenshot and search for that? We’re so doomed as a profession.
I think that Slack may be partially responsible for that.
If I copy code from PyCharm or VS Code and paste it into fucking Microsoft Word, even spawn-of-Satan-MS-Word-for-Mac respects most of my formatting. Plenty of web text editors are also able to do that.
But Slack, "The King of Useless Features Nobody Asked For", can't bother themselves to implement such a useful feature for their primary market.
gtk-vector-screenshot (<https://github.com/nomeata/gtk-vector-screenshot>) will do this, but for GTK apps only. It relies on a custom protocol layered on top of X Window, and I think traverses the tree of GTK widgets to create a vector representation. For a general screenshot program to work, I imagine it would need some sort of hook into every GUI framework used on your system.
Why though? This is a common problem which only requires one thing: empathy/politeness from the screenshot sender. They ask for your help and attention, but they can't be bothered not to waste your time. I think it's fair to point out that this is bad workplace behavior.
This is astonishing. A screenshot is not only the least useful representation of a subject under discussion, it also requires more bandwidth than text.
I see two possible reasons for this -- the sender has no technical experience, or they're focused on making things more difficult for the recipient.
But when trying to decide between these two, I'm reminded of the saying, "Never attribute to malice that which can be adequately explained by stupidity."
This actually happened. A client wrote me, saying, "First, don't treat me like an idiot -- I have years of computer experience."
"Okay, I promise," I replied. "What's the problem?"
"Your program doesn't work."
"Can you be more specific?"
"I followed your instructions to the letter, but I see an error message."
"Okay, what is the error message?"
"It says, 'User [Enter your name here] is not found'."
I personally hate screenshots of kernel panics. Or anything else where you might be dealing with 64-bit hex addresses like "0xffffffff81b7ed80"
Typing that from a picture is infinitely more error prone than just cut/paste.
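To make the point concrete, here is a minimal Python sketch, assuming the panic output arrives as pasted text. The pattern, the function name and the sample strings are made up for illustration (only the address itself is the one quoted above). It shows that a copied address parses mechanically, while a retyped one with a single dropped digit no longer looks like a 64-bit value at all.

```python
import re

# "0x" followed by exactly 16 hex digits, i.e. a full 64-bit value.
ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{16}\b")


def extract_addresses(log_text):
    """Return every 64-bit hex address in the text as an integer."""
    return [int(match, 16) for match in ADDRESS.findall(log_text)]


pasted = "fault at 0xffffffff81b7ed80 in some_module"
retyped = "fault at 0xfffffff81b7ed80 in some_module"  # one 'f' lost while typing

print([hex(value) for value in extract_addresses(pasted)])
print(extract_addresses(retyped))
```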
> I have to either very carefully type some of the code into a search box or (these days) get my coding agent to find the relevant module for me.
Your coding agent is not very smart if it can't deal with something as simple as OCR'ing an image and processing all the references in it, or letting you just select text from an image and searching or copying to the clipboard.
Like other commenters point out, automatic OCR on Apple platforms is a godsend, and it's such a great use of our modern AI capabilities that it should be a standard feature in every document viewer on every platform.
Another thing I wish was more common is metadata in screenshots, especially on phones. Eg if I take a screenshot of a picture in Instagram, I wish a URL of the picture was embedded (eg instagram.com/p/ABCD1234/). If I take a screenshot in the browser, include the URL that's being viewed (+ path to the DOM element in the viewport). If I take a screenshot in a maps app, include the bounding coordinates. If I take a screenshot in a PDF viewer, include a SHA1 hash of the document being viewed + offset in the document so that if I send the screenshot to someone else with the same document, it can seamlessly link to it. Etc etc.
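As a rough sketch of what that could look like at the file level: PNG already allows free-form text chunks, so the context could ride along as a small JSON blob. Everything specific here is my own assumption, not something any platform does: the `ScreenshotContext` keyword, the field names, and the 1x1 stand-in image. It uses only the Python standard library.

```python
import hashlib
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
KEYWORD = "ScreenshotContext"  # hypothetical keyword, not a standard one


def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def tiny_png():
    """Build a 1x1 grayscale PNG to stand in for a real screenshot."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0, then one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


def embed_context(png, context):
    """Insert the context as a JSON tEXt chunk just before IEND."""
    payload = KEYWORD.encode("latin-1") + b"\x00" + json.dumps(context).encode("latin-1")
    out = PNG_SIGNATURE
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"IEND":
            out += make_chunk(b"tEXt", payload)
        out += make_chunk(chunk_type, data)
    return out


def read_context(png):
    """Return the embedded context dict, or None if the image has none."""
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            key, _sep, text = data.partition(b"\x00")
            if key == KEYWORD.encode("latin-1"):
                return json.loads(text.decode("latin-1"))
    return None


# Hypothetical PDF-viewer case: hash of the document plus an offset into it.
document_bytes = b"stand-in for the PDF being viewed"
context = {
    "document_sha1": hashlib.sha1(document_bytes).hexdigest(),
    "offset": 12,
}
tagged = embed_context(tiny_png(), context)
print(read_context(tagged))
```

The privacy question is untouched by this: the sketch only shows that the container format is not the obstacle.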
There are probably privacy concerns to solve here, but no idea is new in computer science and I'm pretty sure some grad student somewhere has already explored the topic in depth (it just never made it to mainstream computing platforms).
It feels like screenshots have become the de facto common denominator in our mobile computing era, since platforms have abstracted files away from us. Lots of people who have only ever used phones as their main computing devices are confused when it comes to files, but everyone seems to understand screenshots.
Also, necessary shout out to Screenshot Conf! https://screenshot.arquipelago.org
OCR is a godsend, 100% agree. Not a fan of the metadata idea personally: 'screenshotting' is done by the operating system, and exposing ways for apps to know that they were 'in' the screenshot and to attach some metadata of their choosing (like your examples of GPS coordinates for a maps app, URL for a browser) sounds like a privacy nightmare, and like something that will make a very reliable core feature much harder to use.
There are companies like Evernote/Zight/CloudApp that at one point tried some things like this, but they never really caught on - I think because it's pretty easy to add annotations yourself or some note of your own - and a screenshot not "trying to do everything" is part of what makes them useful & ubiquitous.
> It feels like screenshots have become the de facto common denominator in our mobile computing era,
Google/Apple have taken notice. Both have recently redone their full-screen post-screenshot UI to include AI insights / automatic product searches / direct chat with Gemini/LLM / etc.
It's true everyone uses screenshots to save things they are interested in or want to look up / search more of / save for whatever reason, and this UI is the perfect place for them to insert themselves.
OP here. You raised a point that I should have mentioned in the article: screenshots of web pages that don't include the URL. I'm perfectly fine with screenshots of browser windows, since the context is almost always relevant. The system I work on right now puts a lot of useful context into the URL, but it's almost never included in the initial screenshot, so I have to ask for that. Of course, I generally ask for it as text so that I don't have to try to type the whole thing without making a mistake.
The ability to highlight/copy/etc text on Macs/iOS these days is such a killer feature. I use it almost every day, both for copying/translating text in screenshots and for taking photos of text to copy into my notes later (eg school notice boards or event posters etc).
Part of what makes it so good is that it's everywhere. Preview, QuickLook, QuickTime Player (yes, videos get OCR'd too!), any app that uses the system frameworks for displaying media.
This includes Safari, where not only do images (inline or otherwise) have selectable text, but the built in translator leverages that text and uses it to translate the image, too! This is super useful for translating Japanese webpages in particular, which tend to have tons of text baked into images.
I have to say, the ability to quickly copy and paste between MacBook and iPhone is such a great flow.
Totally agree. It’s one of those features that feels like magic. So handy for those digital purchase codes you get with blu-rays.
I use Shottr, I take a screenshot of a screenshot and hit “O” immediately after. Saves me from first saving the file to open it in the native viewer
I have a Shottr keyboard shortcut (cmd+opt+control+o) set up to OCR whatever is on the screen and copy the text to the clipboard. So whether someone shares code or an error log as a screenshot on Slack, it’s 3 steps: 1. cmd+opt+control+o 2. select the area to OCR 3. cmd+v in vscode or google
Windows built-in snipping tool (shortcut Win + Shift + S) also has a text actions button to extract text.
This. Makes me wish more image viewers would OCR the image, store the result in a special PNG field, and then offer location-attached selectable text like a PDF.
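A minimal sketch of the "location-attached selectable text" part, assuming an OCR step has already produced words with pixel bounding boxes. The `Word` type, the coordinates and the sample words are invented for illustration, and ordering by `(top, left)` only works when words on one line share the same top edge, which real OCR output would not guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word and its bounding box in image pixels."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def overlaps(word, left, top, right, bottom):
    """True if the word's box intersects the selection rectangle."""
    return (word.left < right and left < word.right
            and word.top < bottom and top < word.bottom)


def select_text(words, left, top, right, bottom):
    """Return the text under a drag rectangle, in reading order."""
    picked = [w for w in words if overlaps(w, left, top, right, bottom)]
    picked.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in picked)


# Invented OCR result for a two-line screenshot.
words = [
    Word("Traceback", 10, 10, 90, 24),
    Word("(most", 96, 10, 140, 24),
    Word("KeyError:", 10, 30, 85, 44),
    Word("'user'", 91, 30, 140, 44),
]

# Dragging across the second line only.
print(select_text(words, 0, 28, 200, 46))
```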
OneNote had this for a long time.
Aside from copying text from images, OneNote can also make text in images searchable.
I disagree. I use screenshots all the time, because it:
- Preserves the full 80 character width without line-wrapping, which destroys readability
- Guarantees monospace, so tabular data doesn't get all misaligned
- Preserves a good coding font, so it doesn't come out as some hairline-width Courier on the other end
- Preserves syntax highlighting, very helpful
Obviously if somebody needs a whole file or whole log, then send the whole thing as an attachment. But very often I'll still include a screenshot of the relevant part. With line numbers, it's not difficult to jump to the right part of the attached file.
Screenshots are incredibly useful for keeping code and terminal output looking like code and terminal output, and not getting completely mangled in an e-mail or chat message being read on a mobile device or in a narrow column.
Key things required for posting to the chat: people reading can read it, people reading can copy and paste it, and people searching can actually find it. It doesn't need to exactly match what you might see in a text editor. Anybody wanting to look at the actual text in context won't be doing it in the chat, but will rather be opening the file of interest in the appropriate tool, and examining it that way; anybody stuck reading the text only in the chat is probably on their phone or something and will be best served by being able to easily see all of it.
For reading purposes, the question of screen width is best left to the reader. They will have the window set to their preferred width, possibly limited by screen size. If the text has to wrap, so be it. It's better that than having to try to squint at your 3713x211 screen grab on an iPhone (portrait orientation). Also bear in mind that even the most basic of font and colour choices (large/small font, dark/light mode) can cause accessibility issues for some readers.
For copying and pasting purposes, images suck. Yes, macOS can do it, sort of, and I expect Windows 11 can do it too, probably to about the same extent. But it's not as easy as having the text right there in copyable form.
For searching purposes, ditto - only worse, because at least when you copy and paste and it comes out wrong, you'll notice. When you search: you just won't find the thing. You'll never know.
> - Preserves the full 80-character width without line-wrapping, which destroys readability
Readability is in the eyes of the final user; they are free to use whatever narrow column width they prefer.
> - Guarantees monospace, so tabular data doesn't get all misaligned
When was the last time a computer shipped without a monospace font? This points at the rare occasion where there's a problem with the setup, but you could also argue that maybe there's a system with a broken image decompressor.
> Screenshots are incredibly useful for keeping code and terminal output looking like code and terminal output, and not getting completely mangled in an e-mail or chat message being read on a mobile device or in a narrow column.
Are you complaining about GMail's rendering maybe? It's awful[^0], but that's more of a GMail problem that could be solved if they wanted.
[^0]: Column width unbounded even on 4k monitors. Weird and inconsistent font sizes across different fonts (monospace is smaller). Reads poorly on phones too.
100% this. I fully disagree with the post - screenshots show context/colour/formatting etc that often doesn't even translate properly if you DO try to paste it into some IM or other "text swapping" application.
Sure, if you want someone to reproduce the text of course you'd send them actual text. But to show a problem, a picture is, as they say, worth 1000 words.
I feel like I've seen good solutions to both problems before. Aren't there vscode extensions that let you just select the code and create a shareable link that keeps all the view options, so it appears the same for everyone?
e.g. https://snippetshare.dev/
I think Slack and other mail/chat clients rescale the image and apply aggressive compression to it. Sometimes they even crop the image or make it so that you need to scroll left and right. Also your syntax highlighting might be annoying to others and might make legibility worse for the receiver, and as other people pointed out most chat/mail clients support monospace code blocks. Plus I agree with all the things that the blog post author pointed out.
Yeah. OP has an egocentric bias - it’s not the norm in the world of work sharing that you can faithfully reproduce the live/contextual environment of the sender given the raw string.
(OP’s blog purports to be pertinent to freelance software development).
What about accessibility?
What about it? There simply isn't any information format that's both perfectly accessible and reproduces what you're seeing with perfect fidelity. In the happy path you can make the important parts match, but almost by definition, when someone's reporting an issue it's because what they "should" see and what they are seeing don't align.
My only use of code screenshots is to emulate the "take a look at my screen" workflow. It's only meant for the other person to take a quick glance at. Anything further than that is transmitted as a code block or text file.
Triple backticks (```) are widely supported for adding code, e.g. in Slack, Confluence...
Both examples you gave have pretty rough or nonexistent syntax highlighting support.
Even Google chat can do it.
Slack seems to always wrap code blocks. It makes python particularly shit to read.
I genuinely thought this was a satire until I read `Preserves syntax highlighting, very helpful`.
Except it doesn't use my preferred font, nor my font size, nor my colors, and I can't copy parts of it as easily, and then the stupid chat app scales the image for some reason or another.
> Preserves a good coding font, so it doesn't come out as some hairline-width Courier on the other end
Let me introduce you to Putty users who never change the default font...
See, imo this is why having a good embedding for code is so important. The best of both worlds is available.
Screenshots of text! Luxury! In my day, the screenshots were embedded in a Word document too.
But I can't be the only one appalled at the suggestion to use an LLM to parse the text. The sheer, prodigious waste of computing power, just to round-trip text to an image and back to text, when what's really missing is a computer user interface that makes it as simple to send text or other snippets as it is to send screenshots.
Pretty much every new programmer I’ve ever hired has done this in their first few weeks. Every time I have to tell them why it’s so unhelpful to share screenshots of text instead of just pasting the text. Usually they learn. When they don’t I usually end up firing them, not for that reason but for others.
The reason I personally hate it is I am often working from my phone. And it’s much easier to read text rendered properly than to pinch-zoom text in an image. What’s worse is Slack will downgrade images for mobile and you can’t even pinch zoom in fully.
Saving this for later: https://imgur.com/a/4sLznnY
We use Slack and GitHub, so it's trivial to send formatted text or a file/line link and I basically never have to deal with screenshots of text. I guess this is just a nice reminder for me to be grateful.
I mostly see this in Teams, and I can't really blame the sender because Teams' support for code blocks is so horrible.
OP here. My current team uses MS Teams. I've been teaching my colleagues how to create code blocks in Teams (basically, teaching them Markdown). It's there, but it's not readily discoverable.
When I see a text screenshot in Teams, it's typically a snippet of a conversation in a different Teams chat.
Don’t even get me started. A colleague of mine made me screenshot a .env on a video call “for security” and I spent 30 min correcting OCR on it until it worked
My preference -- Link or attachment to the full document or code in context (if needed) ... along with screenshot of a relevant portion. (Many times the former is optional because there is enough context already.)
It is extra work to do both but I like to be thorough even when asking for help. Even if the other side doesn't need it -- because I myself might not remember all the nuances when I refer to that conversation later.
Also screenshot preserves (before any fixes) the exact way things looked when I confronted a certain situation. The visual of the screenshot serves as a much stronger reminder of that situation and my thinking ...way better than mere copy pasted text.
This article is a specific case of a more general piece of advice: ask questions well (provide context like clickable links, trim down your query to the minimal reproducible case, pose high-precision questions, etc.).
I agree that screenshots of text that are cut off from essential context are enough to make me pull my hair out; they create so much extra work. But the modern feature of automatic text recognition in screenshots and images that allows for copy and paste has been incredible. Along with indexing that allows it to be searched, regular screenshots have become one of the most robust and future-proof ways for me to preserve context from my workspace. When I look back into archived screenshots it helps me to recapture all kinds of things that I wouldn’t have thought to explicitly record.
I’m one of the people sending these screenshots, and glad to receive them. It helps jog my memory to see a screenshot from the original context, with syntax highlighting or especially log output from our bespoke logging system for firmware. It’s very hard to read without the colors!
Note that Mathpix Snip can quickly convert such screen shots to markdown code via keyboard shortcut. Disclaimer: I’m the founder.
This is a lost cause, one of many that modern GUIs are responsible for. Just suck it up and deal with it because users are not going to stop sending you screenshots.
It seems there's great OCR available on Apple platforms, but to me it seems that we are giving ourselves a problem by not properly attaching metadata where necessary.
I honestly thought this was going to be solved in the 2010s with the rise of comic-like memes, but we just kept sharing images with ever increasing compression artifacts as things were shared around and used to create new memes.
Agreed! The one that I really don't like is that social platforms promote / prefer screenshots of text. Search engines promote sites that link to themselves. All the good parts of URLs are missing. How often I see something interesting, just to realize it's a screenshot and I have to go dig around myself figuring out where it came from.
On Microsoft Teams, I will use alt-prtscn to share an image of a terminal.
I am hyper-sensitive to emailing terminal screenshots in MS Outlook, as they cannot be searched.
This is essentially a solved problem. Whenever someone sends me a screenshot that contains any text information (tables, etc), I pass it to an LLM and it correctly interprets the content of it. On modern versions of macOS you can just select text in images relatively painlessly, too.
Linux desktop users will get there one day.
Or just ask people not to send you data in useless formats. That way you don't have to burn an acre of trees to power it and you help someone be less difficult.
I'm sure they will send you well written, accurate documentation if you ask, too...
I'm absolutely sure they won't if you don't.
As described in the article, the problem isn't just that the text is an image but that, usually, the image shows only a subset of the entire text. Yes, OCR can help find the file containing a code segment in your local codebase, but issues mentioned in the article, such as sending a random error line rather than the entire log, remain.
Claude on Linux does it fine, so do Cursor, Codex, Claude Code, Ollama etc. Not that I would use any of these for this; if someone sends me a screenshot, it is relevant for me so I know where to find what is in it quite readily if needed at all.
Another way it's solved is that clipboards work on text too.
When I get these, there's usually enough context that I can find the actual text.
That being said, I've had to twist some arms in a previous job for new employees attaching screenshots of a log viewer instead of the whole logs. The big problem was training: Once I made it very clear to the entire team that unedited logs were critical to solving problems, management made sure that all newcomers knew how to attach unedited logs.
My response is usually “need context”. No shame in making them fill in the gaps they created in the first place
Nothing against screenshots unless they are lacking context
Agreed. Sometimes context comes from more screen, not less. I receive a lot of cropped screenshots that show the “problem” but hide the surrounding context, and they often exclude things that would make a solution immediately clear from a shot of the full application window.
> I have to either very carefully type some of the code into a search box or (these days) get my coding agent to find the relevant module for me.
What about just asking them what file that is?
Some people are too proud for their own good about their aesthetic choices in editor colors, typography and fonts.
I do enjoy seeing what themes others are using.
Preview on Mac does automatic OCR. I'm sure other tools exist that are similarly friction-free on other platforms, but it took me under 5 seconds (drag the image from the webpage into my downloads folder, click on it, and then select the relevant snippet and CMD+C to copy it).
I imagine I'd have similar frustrations if I couldn't copy-paste the text easily though!
I secretly enjoyed the lectures people would get on StackOverflow when they did this.
This is why I send both a screenshot for easy conveyance of syntax highlighting and such, and a link to the code.
> or (these days) get my coding agent to find the relevant module for me
????? Just OCR a line and paste it into the IDE’s search field???? Or, if for some baffling reason you don’t have the ability to OCR, just pick out a function declaration in the screenshot and search for that? We’re so doomed as a profession.
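For what it's worth, the "OCR a line and search for it" step can be sketched in a few lines of Python. This is only a rough sketch under assumptions: the function names, the extension list and the skipped directories are made up for illustration, and it only forgives whitespace differences, not misread characters.

```python
import os
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs so OCR spacing differences don't block a match."""
    return re.sub(r"\s+", " ", text).strip()


def find_snippet(root, snippet, exts=(".py", ".js", ".ts", ".go", ".c", ".h")):
    """Yield (path, line_number) for each source line under root that contains snippet.

    Hypothetical helper: the extensions and skipped directories are assumptions.
    """
    needle = normalize(snippet)
    if not needle:
        return  # nothing to search for
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control and dependency directories in place.
        dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules"}]
        for name in filenames:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if needle in normalize(line):
                            yield path, lineno
            except OSError:
                continue  # unreadable file, skip it
```

Something like `list(find_snippet(".", "def parse_config(path, strict=False):"))` would then list the candidate files, with the function declaration standing in for whatever line you picked out of the screenshot.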
This poor man is in a losing battle with modern computing.
I think that Slack may be partially responsible for that.
If I copy code from PyCharm or VS Code and paste it into fucking Microsoft Word, even spawn-of-Satan-MS-Word-for-Mac respects most of my formatting. Plenty of web text editors are also able to do that.
But Slack, "The King of Useless Features Nobody Asked For", can't bother themselves to implement such a useful feature for their primary market.
Code block is easy enough to create with ```. I've never had any issue with that.
Not the same, it is not just paste and forget; if you want syntax highlighting you need to add a snippet.
This made me chuckle multiple times. Strong agree, Paul.
We need a way to take a screenshot while keeping the text copy-pastable.
SVG maybe?
gtk-vector-screenshot (<https://github.com/nomeata/gtk-vector-screenshot>) will do this, but for GTK apps only. It relies on a custom protocol layered on top of X Window, and I think traverses the tree of GTK widgets to create a vector representation. For a general screenshot program to work, I imagine it would need some sort of hook into every GUI framework used on your system.
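To make the SVG idea concrete, here is a minimal sketch, assuming you already have the text as a list of lines (which is exactly the hard part a real screenshot tool would need framework hooks for). The function name, colors and sizing constants are illustrative assumptions, not anything gtk-vector-screenshot does.

```python
from xml.sax.saxutils import escape


def text_to_svg(lines, font_size=14, line_height=18, padding=10):
    """Render lines of text as an SVG whose <text> elements stay selectable."""
    # Rough monospace width estimate (about 0.6 em per character); an assumption.
    longest = max((len(line) for line in lines), default=0)
    width = int(longest * font_size * 0.6) + 2 * padding
    height = len(lines) * line_height + 2 * padding
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        '<rect width="100%" height="100%" fill="#1e1e1e"/>',
    ]
    for i, line in enumerate(lines):
        y = padding + (i + 1) * line_height - 4  # approximate baseline of row i
        parts.append(
            f'<text x="{padding}" y="{y}" font-family="monospace" '
            f'font-size="{font_size}" fill="#d4d4d4" '
            f'xml:space="preserve">{escape(line)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the result to a `.svg` file and opening it in a browser gives something that looks like a terminal screenshot but still lets you select and copy the text.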
OP needs to relax
Why though? This is a common problem which only requires one thing: empathy/politeness from the screenshot sender. They ask for your help and attention, but they can't be bothered not to waste your time. I think it's fair to point out that this is bad workplace behavior.
This is astonishing. A screenshot is not only the least useful representation of a subject under discussion, it also requires more bandwidth than text.
I see two possible reasons for this -- the sender has no technical experience, or they're focused on making things more difficult for the recipient.
But when trying to decide between these two, I'm reminded of the saying, "Never attribute to malice that which can be adequately explained by stupidity."
This actually happened. A client wrote me, saying, "First, don't treat me like an idiot -- I have years of computer experience."
"Okay, I promise," I replied. "What's the problem?"
"Your program doesn't work."
"Can you be more specific?"
"I followed your instructions to the letter, but I see an error message."
"Okay, what is the error message?"
"It says, 'User [Enter your name here] is not found'."
I personally hate screenshots of kernel panics. Or anything else where you might be dealing with 64-bit hex addresses like "0xffffffff81b7ed80". Typing that from a picture is infinitely more error prone than just cut/paste.
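A small sketch of why retyping is so fragile, assuming the address format is `0x` followed by 16 hex digits (the helper name is made up): a format check catches a zero typed as the letter O, but a misread that is still valid hex sails straight through.

```python
import re

# Assumed format: "0x" followed by exactly 16 hex digits (a 64-bit address).
HEX64 = re.compile(r"0x[0-9a-f]{16}")


def check_address(transcribed: str):
    """Return (ok, suspicious) for a hand-typed address.

    suspicious lists (index, char) pairs, after lowercasing, that are not hex digits.
    """
    s = transcribed.strip().lower()
    ok = HEX64.fullmatch(s) is not None
    body = s[2:] if s.startswith("0x") else s
    offset = len(s) - len(body)
    suspicious = [
        (i + offset, c) for i, c in enumerate(body) if c not in "0123456789abcdef"
    ]
    return ok, suspicious
```

With this, `check_address("0xffffffff81b7edBO")` flags the trailing letter O, while `check_address("0xffffffff81b7edb0")` (an 8 misread as a b) passes as well-formed even though it is the wrong address. Only copy/paste avoids that second kind of mistake.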
just a dumb sales guy - but I assumed when people were asking about code they would always copy paste it.
... Is this really common?
> I have to either very carefully type some of the code into a search box or (these days) get my coding agent to find the relevant module for me.
Your coding agent is not very smart if it can't deal with something as simple as OCR'ing an image and processing all the references in it, or letting you just select text from an image and searching or copying to the clipboard.