It unsettled me to see just how much work went into making the JavaScript version of this work instead of a purely Python version, all because of how OpenCV behaves. I wonder how universal the laggy-OpenCV problem is, because a friend of mine ran into it too while building an OpenCV application. Is it really so unavoidable that the only option is to not use Python? I really hope there is another way of going about this.
Anyway, I'm very glad you put in all that effort to make the JavaScript version work well. Working under limitations is sometimes cool. I remember having to figure out how PyTorch evaluates neural networks, then converting a PyTorch model into Java code that could evaluate it without any external libraries (it was very inefficient) for a Java coding competition. There may have been a better way, but what I did was good enough.
Creating a faster Python implementation can definitely be done. OpenCV is a thin wrapper over the C++ API, so the lag isn't due to some intrinsic Python slowness. It isn't easy to diagnose, though, and I suspect the way Python code is typically written lends itself to an accidental blocking operation more often than JS code does. It's hard to know without seeing the code.
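One common culprit (an assumption here, since the original code isn't shown) is letting the blocking frame grab — `cv2.VideoCapture.read()` in a real app — sit in the same loop as the processing, so stale frames queue up. A minimal sketch of the usual fix, with a fake frame source standing in for the camera (`LatestFrame` and `fake_read` are illustrative names, not from the project):

```python
import threading
import time

class LatestFrame:
    """Holds only the newest frame; stale frames are dropped, never queued."""
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def put(self, frame):
        with self._lock:
            self._frame = frame

    def get(self):
        """Return the newest unseen frame, or None if nothing new arrived."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame

def capture_loop(store, read_frame, stop):
    # The blocking grab runs on its own thread, off the processing loop.
    while not stop.is_set():
        store.put(read_frame())

# --- demo with a fake ~30 fps camera in place of cv2.VideoCapture ---
counter = iter(range(10**9))
def fake_read():
    time.sleep(1 / 30)            # simulate camera latency
    return next(counter)          # frame stand-in: an increasing counter

store, stop = LatestFrame(), threading.Event()
t = threading.Thread(target=capture_loop, args=(store, fake_read, stop), daemon=True)
t.start()

seen = []
for _ in range(5):
    time.sleep(0.1)               # pretend per-frame processing is slow
    frame = store.get()
    if frame is not None:
        seen.append(frame)
stop.set()
t.join()
print("processed frames:", seen)  # gaps between values show stale frames were skipped
```

The key design point is that the consumer always sees the *latest* frame instead of draining a backlog, which is where the perceived lag usually comes from.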
author here, sorry you have to see my janky JavaScript solution XD but one good thing of going with Tauri is that developing the UI is pretty easy, since it's basically just some web pages, but with access to the system, through the JS <-> Rust communication.
Also, rewriting a neural network from PyTorch to Java sounds like a big task. I wonder if people are doing ML in Java.
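For what it's worth, the "evaluate a trained network without external libraries" trick described above mostly boils down to exporting the weights and replaying the forward pass by hand. A toy sketch in plain Python (the original story used Java, but the idea is identical; these weights are made-up numbers, not from a real model):

```python
# Hypothetical weights for a tiny 2-3-1 MLP, as they might be dumped
# from a trained PyTorch model. Values are invented for illustration.
W1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]   # hidden layer: 3 neurons, 2 inputs
b1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.5, 0.2]]                        # output layer: 1 neuron, 3 inputs
b2 = [0.05]

def linear(W, b, x):
    """y = Wx + b: the core of what torch.nn.Linear computes."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def forward(x):
    # Replay the layers in the same order as the original model.
    return linear(W2, b2, relu(linear(W1, b1, x)))

out = forward([1.0, 2.0])
print(out)
```

It is very inefficient compared to a real tensor library, as the commenter says, but for a competition that forbids external dependencies it gets the job done.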
Such a cool and inspirational project! Regarding the drift on pinch: have you tried storing the pointer position from a second earlier and using that as the click position? Maybe you could show this position as a second cursor? I've always wondered why Apple doesn't do this for their "eye moves faster than hands" issue as well.
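That "click where the cursor was a moment ago" idea can be sketched with a small ring buffer of timestamped positions. A rough illustration (the class name and the lookback value are made up, not from the article):

```python
import time
from collections import deque

class ClickAnchor:
    """Remembers recent cursor positions so a pinch can 'click' where the
    cursor was just before the gesture itself started jostling it."""

    def __init__(self, lookback=0.3, maxlen=256):
        self.lookback = lookback              # seconds to look back on click
        self.history = deque(maxlen=maxlen)   # (timestamp, (x, y)) pairs

    def update(self, pos, now=None):
        """Call every frame with the current cursor position."""
        self.history.append((now if now is not None else time.monotonic(), pos))

    def position_at_click(self, now=None):
        """Newest position at least `lookback` seconds old; falls back to
        the oldest sample if the buffer is too short."""
        now = now if now is not None else time.monotonic()
        best = self.history[0][1] if self.history else None
        for t, pos in self.history:           # oldest -> newest
            if now - t >= self.lookback:
                best = pos
            else:
                break
        return best

# Demo with synthetic timestamps: the cursor drifts rightward while pinching.
anchor = ClickAnchor(lookback=0.25)
for i in range(10):
    anchor.update((100 + i, 200), now=i * 0.1)   # one sample every 100 ms
print(anchor.position_at_click(now=0.9))
```

The second-cursor idea would then just mean drawing a marker at `position_at_click()` every frame, so the user can see where the pinch will land.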
I did a very similar project a few months back. My goal was to help alleviate some of the RSI issues I have, and give myself a different input device.
The precision was always tricky, and while it was fun, I eventually abandoned the project and switched to face tracking and blinking so I didn't have to hold up my hand.
For some reason, the idea of pointing my webcam down never dawned on me. I then discovered Project Gameface and just started using that.
Happy programming! Thank you for the excellent write-up; it was a great read!
> Python version is super laggy, something to do with OpenCV
I'm most probably wrong, but I wonder if it has anything to do with all the text being written to stdout. On the off chance that it happens on the same thread, it might be blocking.
Could it then be resolved by using the no-GIL version of Python they just released?
I'm not sure what your reasoning is, but note that blocking I/O, including print(), releases the GIL. (So a seemingly innocent debugging print can be far from harmless under the wrong circumstances.)
MediaPipe is a lot of fun to play with, and I'm surprised how little it seems to be used.
You might also be interested in Project Gameface, open source Windows and Android software for face input: https://github.com/google/project-gameface
Also https://github.com/takeyamayuki/NonMouse
Some problems in life can be easily fixed with crimson red nail polish.
Very nice! The sort of thing I expect to see on HN. Do you currently use it? Maybe it's not perfect as a mouse replacement, but as a remote movie control, as shown in one of the last videos, it's definitely a legit use case. Congrats!
I'm glad it is up to the HN standard :) No, I don't currently use it; I'm back on a mouse and touchpad. But I can definitely see what you mean by remote movie control. I would love to control my movie projector with my hand.
I've been thinking on and off about how to improve the forward-facing mode. Since having the hand pointed straight at the camera messes with the readings, I suspect MediaPipe is trained on seeing the hand from above or below (and maybe the sides), but not straight on.
Ideally, the camera should sit somewhere above the hand (pointing downwards) to get the best results. But in the current version of downward-facing mode, you move the cursor by moving the whole hand around (the hand's x and y position translate to the cursor's x and y). If the camera's FOV is very big (capturing from far away), you would have to move your hand very far to move the cursor, which is probably not ideal.
I later got an idea for improving this while playing around with a smart TV, where the remote controls a cursor: you tilt the remote up and down or left and right, and I think it uses a gyroscope or accelerometer (I don't know which is which). I wish I had a video to show it better, but I don't. I think the same concept could be applied to the hand tracking, using the tilt of the hand to control the cursor. That way, we wouldn't have to rely on the hand position captured by the camera. Plus, it would work even if the camera is far away, since it only needs to detect the hand's tilt. Still thinking about this.
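The tilt idea could be prototyped from the landmarks alone: take the vector from the wrist to a knuckle, and treat its deviation from a "neutral" pose as a cursor velocity, joystick-style. A rough sketch (the landmark choice, dead zone, and gain are my guesses, not from the article):

```python
import math

def tilt_to_velocity(wrist, knuckle, dead_zone=0.1, gain=600.0):
    """Map hand tilt to a cursor velocity, joystick-style.

    `wrist` and `knuckle` are (x, y) landmarks in normalized image
    coordinates (e.g. MediaPipe Hands' wrist and middle-finger MCP).
    A hand pointing straight up the frame is neutral; tilting it moves
    the cursor at a speed proportional to the tilt, so the hand's
    absolute position in the frame no longer matters.
    """
    dx = knuckle[0] - wrist[0]
    dy = knuckle[1] - wrist[1]           # negative = pointing up in image coords
    length = math.hypot(dx, dy) or 1e-9
    ux, uy = dx / length, dy / length    # unit direction of the hand

    # Deviation from the neutral "straight up" direction (0, -1).
    ex, ey = ux - 0.0, uy - (-1.0)

    def shaped(e):
        # Dead zone so a slightly shaky neutral hand doesn't drift the cursor.
        if abs(e) < dead_zone:
            return 0.0
        return gain * (e - math.copysign(dead_zone, e))

    return shaped(ex), shaped(ey)

# Hand pointing straight up: inside the dead zone, so no movement.
print(tilt_to_velocity((0.5, 0.8), (0.5, 0.6)))  # → (0.0, 0.0)
# Hand tilted to the right: positive x velocity, cursor moves right.
print(tilt_to_velocity((0.5, 0.8), (0.64, 0.66)))
```

Integrating the velocity each frame (`cursor += v * dt`) would give the smart-TV-remote feel; the dead zone plays the same role as a joystick's neutral detent.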
Anyway, I'm glad you found the article interesting!
Man, I feel like making diagrams or writing handwritten notes would be great with this!
If it's compelling enough, I don't mind setting up a downward-facing camera. I would like to see some more examples, though, where it shows a clear advantage over just using a mouse. I'm sure there are some scenarios where it does.
It's projects like this that really make me want to start on a virtual theremin. Wish I had the time :(
Oh that's an awesome idea!
Very impressive! This opens up a whole new set of usages for this headset