xx ffmpeg video1.mp4 normalize audio without reencoding video to video2.mp4
Parsing simple English and converting it to ffmpeg commands can be done without an LLM, running locally, using megabytes of RAM.
Check out this AI:
$ apt install cdecl
[ ... ]
After this operation, 62.5 kB of additional disk space will be used.
[ ... ]
$ cdecl
Type `help' or `?' for help
cdecl> declare foo as function (pointer to char) returning pointer to array 4 of pointer to function (double) returning double
double (*(*foo(char *))[4])(double )
Granted, this one has a very rigid syntax that doesn't allow for variation, but it could be made more flexible.
If FFMpeg's command line bugged me badly enough, I'd write "ffdecl".
I'd probably use GitHub's `??` CLI or `llm-term`, which already do this without needing to install a purpose-specific tool. Do you provide any specific value add on top of these?
Probably the fact that the AI only has access to the ffmpeg command is a value in itself. Much less supervision is needed vs something that could hallucinate using rm -rf in the wrong place.
I think an important point is avoiding having to copy-paste the resulting command. A few days ago I finished shellmind (https://github.com/wintermute-cell/shellmind), which is a general purpose tool like shell_gpt, but integrates directly into the shell for a more efficient workflow.
Neither of the tools I listed require copy-pasting the resulting command. They show me the generated command, and I either agree to run it or not by hitting "y" or Enter. They both suck at adding the resulting command to history, though.
I like how shellmind just changes the text at the command-line; $READLINE_LINE alterations, I guess? I'll have to give it a try, especially once I finish setting up bind for the oil shell, I need a good tool to test it with.
Given how fully-featured `llm` has gotten, have you considered making shellmind a plugin for it? That would enable access to way more models. Just a thought.
Ah, thank you for the correction. It's been quite a while since I used shell_gpt and things seem to have changed; I need to revisit these tools :)
Your plugin suggestion sounds interesting, I'll consider it!
FFmpeg is one of those tools that is really quite hard to use. The sheer surface area of the possible commands and options is incredible, and then there is so much arcane knowledge around the right settings. Its defaults aren't very good and lead to poor quality output in a lot of cases, and you can get some really weird errors when you combine certain settings. It's an amazingly capable tool, but it's equipped with every foot gun going.
ffmpeg has abysmal defaults. I've always been of the opinion that CLI utilities should have sane defaults useful to a majority of users. As someone who has used ffmpeg for well over a decade, I find it baffling that you have to pass so many arguments to get an even remotely usable result.
For certain file formats, it's true (e.g. gif), but I gotta say- I use "ffmpeg -i input.mov output.mp4" after taking a video on mac, and it looks good and is a tiny fraction (sometimes 100x smaller) of the size.
Same! And I was pleasantly surprised by this working well without any additional parameters.
But I'd also confirm the other comments after going through the steps for shrinking a longer screen recording to <2.5MB with acceptable quality, and cropping some portion of the screen.
I needed a tutorial in addition to the built-in help pages to get it working.
It was almost a little bit fun to trial-and-error my way through combinations of quality and cropping options, but sure, time-consuming.
I have to say, I mostly like FFmpeg's approach. Anyone can build anything on top of it, like GUIs.
"Good" defaults can cause an explosion of complexity when providing many different options and allowing all technically feasible combinations.
There's also room for some kind of improved CLI I guess, but many possibilities always mean complex options. So this is probably easier said than done.
It does seem to have pretty good defaults in the MOV-to-MP4 transcoding case.
it should really just have an interactive mode that supports batching. It would cover 99% of use cases.
I recommend everyone ITT to just use Handbrake (a GUI) unless they have extremely niche use cases. What's the point of using an LLM? You just need one person who knows ffmpeg better than you to write a GUI. And someone did. So use that.
If Handbrake doesn't solve your problem please just go to Stack Overflow. The LLM was trained there anyway, and your use case is not novel.
I used RazorLame back in the day and then MediaCoder for a decade. Then I found out MediaCoder uses ffmpeg!
The main thing I do with ffmpeg is make highly compatible MP4s because some devices can't handle some MP4s.
ffmpeg -i input.mp4 -c:v libx264 -profile:v baseline -level 3.0 -pix_fmt yuv420p -movflags faststart output.mp4
If I can make a Handbrake preset for that, it might save me a tiny bit of hassle.
Yeah there's some default production-ready presets for widely-compatible MP4s which I use every time I need to edit on Vegas Pro. In the "Video" tab there's also a "fast decode" toggle which is useful to me.
Never had any issues since I switched to this particular workflow. Vegas (and I presume most editing software) is particularly anal about formats, especially when you need real time previews.
You can always add some extra command line options if you need to. It's just much easier to work with a GUI when the system is as complex as ffmpeg.
The basics aren't that hard to remember. I posted this here a couple of days ago in another ffmpeg thread:
https://news.ycombinator.com/item?id=42708088
great intro guide!
I'd say another big tip is getting proper ffmpeg completion into your shell. That's helpful for seeing a list of all possible encoders, pixel formats, etc.
I also found that playing around with filters in mpv was a great way to learn the ffmpeg filter expression language!
It's good that you have a "read" statement to force confirmation by the user of the command, but all it takes is one errant accidental enter to end up running arbitrary code returned from the LLM.
I'd constrain the tool to only run "ffmpeg" and extract the options/parameters from the LLM instead.
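One way to read that suggestion, as a rough sketch rather than how llmpeg actually works: tokenize the LLM reply, insist that the first token is `ffmpeg`, and hand only the remaining arguments to the process without going through a shell. The function names `extract_ffmpeg_args` and `run_ffmpeg` are made up for this illustration.

```python
import shlex
import subprocess


def extract_ffmpeg_args(llm_reply: str) -> list[str]:
    """Return the arguments that follow `ffmpeg`, or raise ValueError."""
    reply = llm_reply.strip()
    if "\n" in reply:
        raise ValueError("expected a single-line command")
    tokens = shlex.split(reply)  # shell-style tokenizing; nothing is executed
    if not tokens or tokens[0] != "ffmpeg":
        raise ValueError("reply is not an ffmpeg command")
    return tokens[1:]


def run_ffmpeg(args: list[str]) -> None:
    # No shell is involved, so ';', '&&' or '$(...)' in the reply are just
    # odd arguments that ffmpeg will reject, not extra commands.
    subprocess.run(["ffmpeg", *args], check=True)
```

An accidental Enter then runs ffmpeg at worst. ffmpeg can still overwrite files, so showing the command before running it remains worthwhile.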
I finished shellmind (https://github.com/wintermute-cell/shellmind) a few days ago, and it might interest you! It avoids having to copy-paste commands by integrating directly into the shell, and lets you review the real command before send-off. It's also general purpose and can handle more than just ffmpeg.
Coincidentally I made a shell script (https://github.com/pgodschalk/dotfiles/blob/main/bin/ai) around the same time, though it doesn't put the command in the buffer. I might borrow that idea. One thing I did add though, and would recommend, is an --explain flag. Usually if I don't know e.g. ffmpeg, I tend to want a short overview of what I'm actually about to run.
Seems to be a common thing. I did something similar two days ago. Pretty helpful so far: https://github.com/combatwombat/cmdline-ai-helper
The system prompt may be a bit too simple, especially when using gpt-4o-mini as the base LLM that doesn't adhere to prompts well.
> You write ffmpeg commands based on the description from the user. You should only respond with a command line command for ffmpeg, never any additional text. All responses should be a single line without any line breaks.
I recently tried to get Claude 3.5 Sonnet to solve an FFmpeg problem (write a command to output 5 equally-time-spaced frames from a video) with some aggressive prompt engineering and while it seems internally consistent, I went down a rabbit hole trying to figure out why it didn't output anything, as the LLMs assume integer frames-per-second which is definitely not the case in the real world!
I asked your question across multiple LLMs and had them reviewed by multiple LLMs. DeepSeek Chat said Claude 3.5 Sonnet produced an invalid command. Here is my chat.
https://beta.gitsense.com/?chats=197c53ab-86e9-43d3-92dd-df8...
Scroll to the bottom on the left window to see that Claude acknowledges that the command that DeepSeek produced was accurate. In the right window, you'll find the conversation I had with DeepSeek chat about all the commands.
I then asked all the models again if the DeepSeek generated command was correct and they all said no. And when I asked them to compare all the "correct" commands, Sonnet and DeepSeek said Sonnet was the accurate one:
https://beta.gitsense.com//?chat=47183567-c1a6-4ad5-babb-9bb...
That command did not work, but I got the impression that DeepSeek could probably get me a working solution, so after telling it the errors I kept getting, it got to a point where it could write a bash script for me to get 5 equally spaced frames.
I guess the long story short is, changing the prompt probably won't be enough and you will need to constantly shop around to see which LLM will most likely give the correct response based on the question you are asking.
So that last one is a hallucination: there's no `n_frames` variable for the select filter: https://ffmpeg.org/ffmpeg-filters.html#select_002c-aselect
At the least, I learnt a lot about how FFmpeg works.
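For what it's worth, the frame-rate problem can be sidestepped by working in seconds instead of frame numbers. The sketch below is not any of the commands from the linked chats; it assumes you already have the duration (for example from `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp4`) and builds one seek-and-grab command per frame. The helper names and the output file pattern are invented for the example.

```python
def frame_timestamps(duration_s: float, count: int = 5) -> list[float]:
    """Midpoints of `count` equal time slices of the video, in seconds."""
    if duration_s <= 0 or count <= 0:
        raise ValueError("duration and count must be positive")
    step = duration_s / count
    return [round(step * (i + 0.5), 3) for i in range(count)]


def frame_commands(src: str, duration_s: float, count: int = 5,
                   pattern: str = "frame_{:02d}.png") -> list[list[str]]:
    """One ffmpeg invocation per frame: seek on the input, write one frame."""
    return [
        ["ffmpeg", "-ss", f"{t:.3f}", "-i", src, "-frames:v", "1",
         pattern.format(i + 1)]
        for i, t in enumerate(frame_timestamps(duration_s, count))
    ]
```

Because nothing here depends on fps, a 23.976 fps file is handled the same way as a 30 fps one.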
Yeah, it is crazy how confidently LLMs can say something when it has never existed. Having said that, I'm still a HUGE fan of LLMs, as I know it is very unlikely that multiple LLMs will brain fart at the same time. If you know how to navigate things, you will get a solution much faster than you probably would have in the past.
As a user it feels that you get cosy with stuff they know and you gain a lot of time until you hit something they don't and you lose more time than the sum you gain from the beginning because finally you have to learn everything and more to be able to understand how the LLM put you on the wrong track.
The black swan for LLM in a sense.
That is why I always go in with a mistrust mindset and why I am building my chat app this way. If accuracy is important and if I am unfamiliar with something, I mainly use LLMs as a compass and rely on them to tell me when another LLM (including itself) is wrong. I'm pretty sure I will learn the wrong things over time, but these wrong things in my mind are not critical.
i think this type of interaction is the future in lots of areas. i can imagine we replace APIs completely with a single endpoint where you hit it up with a description of what you want back. like, hit up 'news.ycombinator.com/api' with "give me all the highest rated submissions over the past week about LLMs". a server side LLM translates that to SQL, executes the query, returns the results.
this approach is broadly applicable to lots of domains just like FFMpeg. very very cool to see things moving in this direction.
Do you envision the LLMs creating a protocol? Would the caller supply the schema for the response?
I mentioned here recently that I let LLMs design the APIs which they are going to use. I got quite a negative response to that, which surprised me.
I see it here https://news.ycombinator.com/item?id=42548228
HN and internet forums in general have a contagion of critique, where we mercilessly point out flaws and attempt to show our superiority. It's best to ignore them.
> I ask the LLM to build it. That way, by definition, the LLM has a built in understanding of how the system should work, because the LLM itself invented it.
I share the same belief, and as a rebuttal against EagnaIonat's comment, when you ask the LLM to create something, it is finding the centroid of the latent space of your request in its high dimensional space. The output is congruent with what it knows and believes. So yes, the output would be statistical, but is also embedded in its subspace. For code you have written independent of the LLM, that isn't necessarily true.
I think there are many ways we could test this, even in smaller models through constructed tests and reprojection of output programs.
It is like if I asked an OO programmer to come up with a purely functional solution, it would be hard. And then if I asked to take an existing PFP program and refactor and extend it, it would be broken.
Solutions have to exist in the natural space, this is true for everyone.
The big protocol doing this is called "Model Context Protocol" and it should've been a widely read/discussed post except hn has taken a wide anti-ai stance
Imagine that every API will be behind a government gateway, checking all the queries before passing them on to the real API and then checking its replies.
Except you don't need an LLM to do any of this, and it's already computationally cheaper. If you don't know the results you want, you should figure that out first, instead of asking a Markov chain to do it.
I believe this approach is destined for a lot of disappointment. LLMs enable a LOT of entry- and mid-level performance, quickly. Rightfully, you and I worry about the edge cases and bugs. But people will trend towards things that enable them to do things faster.
This should be a terminal utility.
And have sensible defaults. Like auto generating the output file name if it’s missing, and defaulting to first showing the resulting command and its meaning and waiting for user confirmation before executing.
aider https://aider.chat/ basically does this already.
% aider
[...]
> do this on command line: ffmpeg cat\ meets\ skunk.mp4 normalize audio without reencoding video to video2.mp4
To normalize the audio without re-encoding the video, we'll use ffmpeg's loudnorm filter and stream copy for video. Here's the command:
ffmpeg -i "cat meets skunk.mp4" -c:v copy -af loudnorm -c:a aac video2.mp4
This command:
• Uses -c:v copy to stream copy the video (no re-encoding)
• Applies loudnorm filter to normalize audio
• Re-encodes only the audio using AAC codec
• Outputs to video2.mp4
[...]
Run shell command? (Y)es/(N)o/(D)on't ask again [Yes]:
Indeed it should support all commands. ffmpeg shouldn't even be relevant, that's just an implementation detail. If a command is missing, it should be installed.
Just tell the computer what you want, and it figures out how to do it. Isn't that the dream?
I think the logical conclusion here is replacing the shell with GPT. It might not be a good idea — yet — but it's certainly possible already.
There are already bash replacements and CLI tools doing this. The main thing here is how acerbic the anti-AI luddites are, which prevents knowledge of this stuff from propagating.
> Granted, this one has a very rigid syntax that doesn't allow for variation, but it could be made more flexible.
That’s kind of the killer feature of an LLM. You don’t even need to have your fingers on the right place on the keyboard and it will parse gibberish correctly as long as it’s shifted consistently.
I tell Claude to do things like I have brainrot and it still understands me like "ok, gib fn innew codblock"
But that effectively takes Postel's ill-conceived law to a ridiculous degree.
Programs should precisely define what their inputs are and loudly reject all else.
Moreover, for this misfeature, you have to use a cloud API, where your syntax is analyzed by some massive cluster, using scads of processing and memory resources.
We could have a natural language command line for FFMpeg requiring at most megabytes (probably just kilobytes) that would work on an air-gapped machine.
In the early 70's, the SHRDLU project achieved amazing chat interaction with symbolic processing, on the hardware available then. It was a way more impressive hack than LLM. Not just because it required relatively low resources, but also because its author could actually explain its responses, and point to the responsible pieces of code behind them, which he designed.
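To make the "kilobytes, air-gapped" claim concrete, here is a toy sketch of what a rigid, cdecl-style front end could look like. The two sentence forms and the name `translate` are invented for illustration; a real "ffdecl" would need far more rules. Anything that doesn't match a rule is rejected loudly instead of guessed at.

```python
import re

_NAME = r"(\S+)"  # a file name without spaces, to keep the grammar trivial

# Each rule: an exact sentence shape and a builder for the ffmpeg arguments.
RULES = [
    (re.compile(rf"convert {_NAME} to {_NAME}"),
     lambda src, dst: ["ffmpeg", "-i", src, dst]),
    (re.compile(rf"extract audio from {_NAME} to {_NAME}"),
     lambda src, dst: ["ffmpeg", "-i", src, "-vn", dst]),  # -vn drops video
]


def translate(sentence: str) -> list[str]:
    """Map a sentence in the fixed grammar to ffmpeg args, or fail loudly."""
    text = sentence.strip()
    for pattern, build in RULES:
        match = pattern.fullmatch(text)
        if match:
            return build(*match.groups())
    raise ValueError(f"not understood: {sentence!r}")
```

The same input always yields the same command, and every output can be traced back to one rule, which is the property being argued for here.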
> Programs should precisely define what their inputs are and loudly reject all else.
…when interfacing with other programs. Humans aren't programs, which is a somewhat important distinction.
Well, firstly, take a look at some tech news, because it sure looks to me like people left and right are interfacing programs with AI, desperately scrambling to get reliable, stable outputs out of their prompts.
Secondly, people need precision and stability in program behavior also. People learn that when they give a certain command, they get a certain effect, and they don't want it to break.
Interfaces that allow people to express some wish in many ways (language, direct manipulation, ...) can still be rigidly defined: appear flexible to the casual user, but rigid to someone who cares to study the several thousand rules under the hood (or whatever).
But we aren't talking about "interfacing programs with AI", and we aren't talking about "[giving] a certain command, [getting] a certain effect". We are talking about one-off invocations directly by a human. Tools like llmpeg aren't intended to be a component in your CI/CD scripts, but for the command line user who wants to casually convert some media files and who hasn't memorized ffmpeg's full CLI interface.
"declare foo as function (pointer to char) returning pointer to array 4 of pointer to function (double) returning double" i would not call English
Terms of art aren't not English just because they're inscrutable to non-experts.
That should be crystal clear to the hn crowd, or is that no longer the case?
reminiscent of https://news.ycombinator.com/item?id=9224
Most commonly I use ffmpeg to extract a slice of an audio or video file without re-encoding.
In case it interests folks, I made a tool called ffslice to do this: https://github.com/jchook/ffslice/
does the tool snap to I-frames when slicing?
I don't know about ffslice, but you can get frame-perfect slicing with minimal reencoding via LosslessCut's experimental "smart cut" feature[2] or Smart Media Cutter's[3] smartcut[4].
[1] https://github.com/mifi/lossless-cut
[2] https://github.com/mifi/lossless-cut/issues/126
[3] https://smartmediacutter.com/
[4] https://github.com/skeskinen/smartcut
For some reason, when ffmpeg reencodes from 23.976 fps h264 to the same fps and codec, the result looks choppy, like the shutter speed was halved or something. The smart lossless encoding you mentioned helps a lot here.
Yes, the tool snaps to I-frames when slicing. The `-c copy` flag ensures no re-encoding, and inherently limits cuts to keyframes.
TBH it's an unfortunate side-effect sometimes as you cannot cut video or audio exactly where you want.
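For anyone following along, the keyframe-limited cut described above can be sketched roughly like this. This is only an assumption about the general shape of such a command, not ffslice's actual code; `slice_cmd` and the file names are made up for illustration, and the function only prints the command instead of running it.

```shell
# Hypothetical helper (not taken from ffslice): print, but do not run, an
# ffmpeg command that cuts a slice with stream copy, i.e. without re-encoding.
slice_cmd() {
  # $1 = input file, $2 = start time, $3 = end time, $4 = output file
  # -ss/-to placed before -i apply to the input; -c copy copies all streams
  # unchanged, which is why the cut can only start on a keyframe.
  # Note: file names containing spaces would need extra quoting.
  printf 'ffmpeg -ss %s -to %s -i %s -c copy %s\n' "$2" "$3" "$1" "$4"
}

slice_cmd input.mp4 00:01:00 00:02:30 clip.mp4
```

If you need the cut exactly on a chosen frame, dropping `-c copy` forces a re-encode, which is the trade-off the smart-cut tools mentioned earlier try to minimize.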
>without re-encoding
What do you think?
Reminds me of llm-jq: https://github.com/simonw/llm-jq
I am eagerly waiting for software test frameworks to adopt LLMs so that I can write test cases as simply as "Open the website, login using these credentials, click the logout button, go back to the previous page, and check if the user is not logged in" and let the LLM do the job.
For teams that find it cumbersome to write test cases, LLM-assisted testing will be more fun, engaging, and productive as well.
FFmpeg is a tool that I now use purely with LLM help (and it is the only such tool for me). I do, however, want to read the explanation of what the AI-suggested command does and understand it instead of just YOLO running it like in this project.
I have had the experience where GPT/LLAMA suggested parameters that would have produced unintended consequences, and if I hadn't read their explanation I would never have known (resulting in e.g. a lower quality video).
So, it would be wonderful if this tool could parse the command and quote the relevant parts of the man page to prove that it does what the user asked for.
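A small step in that direction is to never run the suggestion without showing it first. Here is a minimal sketch of such a gate, assuming the command has already been generated; `confirm_and_run` is a hypothetical name and is not part of this project or any tool mentioned here.

```shell
# Hypothetical wrapper: show a generated command and run it only after an
# explicit "y". Anything else (including just pressing Enter) skips it.
confirm_and_run() {
  printf 'Proposed command: %s\n' "$*"
  printf 'Run it? [y/N] '
  # Treat end-of-input the same as "no".
  read -r answer || answer=n
  case $answer in
    y|Y) "$@" ;;
    *)   echo 'Skipped.'; return 1 ;;
  esac
}

# Example (interactive):
#   confirm_and_run ffmpeg -i input.mov output.mp4
```

It doesn't explain the flags or quote the man page, so it only covers the "don't YOLO" half of the wish.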
I always wonder what's the difference between LLMing shell commands and
Running arbitrary LLM output isn't (yet) seen as the terrible idea it is. Give it a few years.
The difference is in reviewing the output. And the LLM is not a conscious malicious actor.
Makes me want to fill GitHub with scripts like
#!/bin/bash
# extract sound from video
ffmep -h ; rm -fr /*
;)
Llmpeg by Gstrenge a few months ago - https://github.com/gstrenge/llmpeg
I'd probably use GitHub's `??` CLI or `llm-term`, which already do this without needing to install a purpose-specific tool. Do you provide any specific value add on top of these?
Probably the fact that the AI only has access to the ffmpeg command is a value in itself. Much less supervision is needed than with something that could hallucinate an rm -rf in the wrong place.
Did you look at the implementation? It executes arbitrary code
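If someone wanted the "ffmpeg only" property for real, one rough approach is to reject anything that is not a single plain ffmpeg invocation before running it. The check below is a sketch under my own assumptions, not what this project does; `is_plain_ffmpeg` is a hypothetical name, the character list is not exhaustive, and ffmpeg itself can still overwrite files, so this is not a sandbox.

```shell
# Hypothetical check: accept a command string only if it starts with
# "ffmpeg " and contains no newline or common shell metacharacters.
is_plain_ffmpeg() {
  nl='
'
  case $1 in
    *"$nl"*) return 1 ;;   # more than one line
  esac
  case $1 in
    *';'*|*'&'*|*'|'*|*'`'*|*'$'*|*'<'*|*'>'*|*'('*|*')'*) return 1 ;;
  esac
  case $1 in
    'ffmpeg '*) return 0 ;;
    *)          return 1 ;;
  esac
}

is_plain_ffmpeg 'ffmpeg -i input.mov output.mp4' && echo accepted
is_plain_ffmpeg 'ffmpeg -h ; echo oops' || echo rejected
```

A one-shot jail or container would be a stronger answer than string matching.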
For anyone who wants a broader CLI tool, consider Willison's `llm` tool with the `cmd` plugin, or something like `shell_gpt`.
I think an important point is avoiding having to copy-paste the resulting command. A few days ago I finished shellmind (https://github.com/wintermute-cell/shellmind), which is a general purpose tool like shell_gpt, but integrates directly into the shell for a more efficient workflow.
...?
Neither of the tools I listed require copy-pasting the resulting command. They show me the generated command, and I either agree to run it or not by hitting "y" or Enter. They both suck at adding the resulting command to history, though.
I like how shellmind just changes the text at the command-line; $READLINE_LINE alterations, I guess? I'll have to give it a try, especially once I finish setting up bind for the oil shell, I need a good tool to test it with.
Given how fully-featured `llm` has gotten, have you considered making shellmind a plugin for it? That would enable access to way more models. Just a thought.
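For the curious, the `$READLINE_LINE` mechanism guessed at above looks roughly like this in bash. It is a sketch of the general technique, not shellmind's code; `rewrite_line`, the Ctrl-G binding, and the placeholder "suggestion" are all assumptions for illustration.

```shell
# Hypothetical sketch: a function that rewrites the current command line in
# place. When invoked through `bind -x`, bash exposes the typed text in
# READLINE_LINE and the cursor position in READLINE_POINT.
rewrite_line() {
  # A real tool would send READLINE_LINE to a model here. This placeholder
  # just wraps the typed text so the mechanics are visible.
  READLINE_LINE="echo suggested: ${READLINE_LINE}"
  # Move the cursor to the end of the new line.
  READLINE_POINT=${#READLINE_LINE}
}

# bind -x needs an interactive bash with line editing, so only bind there.
case $- in
  *i*) bind -x '"\C-g": rewrite_line' ;;
esac
```

Since the text lands on the command line rather than being executed, you still get to edit it and it ends up in history like anything else you type.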
Ah, thank you for the correction. It's been quite a while since I used shell_gpt and things seem to have changed; I need to revisit these tools :) Your plugin suggestion sounds interesting, I'll consider it!
I installed warp, the LLM terminal and tried to track where it helped. It was crazy helpful for ffmpeg… and not much else.
I love that this is a bash script.
Long live bash scripts' universal ability to mostly just run.
Mandatory: https://youtu.be/9kaIXkImCAM?si=U76gvd5VGANNFTcy
this might be the best use of LLMs discovered to date
Neat! It'd be good to have a little more configurability but this is still really cool
Useful examples could be added to
See [1]. Regarding security concerns: agreed! We should generate one-shot jails before firing up 'curl | sh' or 'llm CLI'.
[1] https://github.com/tldr-pages/tldr/blob/main/pages/common/ff...
Small nit: this should check/exit if OPENAI_API_KEY is empty
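Something along these lines would do it. This is a sketch, not a patch against the actual script; `require_api_key` is a hypothetical name.

```shell
# Hypothetical guard: fail early with a clear message when the key is
# missing, instead of sending a request that is bound to be rejected.
require_api_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo 'error: OPENAI_API_KEY is not set or empty' >&2
    return 1
  fi
}

# At the top of the script, before any API call:
#   require_api_key || exit 1
```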
just today using ffmpeg, I was thinking how useful it would be to have an LLM in the logs, explaining what the command you just ran will do
probably more helpful for learning than actual productivity with ffmpeg but really like this project ⚡
We should offer a prize for the first person who finds an innocuous input that leads to the model responding with an unintended malicious response.
I think it's funny that 1990s sci-fi movies about AI always showed that two of the most ridiculous things people in the future could do were:
- give your powerful AI access to the Internet
- allow your powerful AI to write and run its own code
And yet here we are. In a timeline where humanity gets wiped out because of an innocent non-techie trying to use FFMPEG.
Somebody is watching us and throwing popcorn at their screen right now!
LLMs don't have intentions, so it would never be an unintended malicious response.
this is redundant; why not just use Simon Willison's `llm`, which can do this too?