I'm... not sure those are the kind of Friday-at-2AM embarrassing mistakes you want to put on your blog if you sell yourself as an elite consultant?
Not using `-iname`, using `-print0` and being surprised to see NULs appear, the weird pipe + xargs instead of just `-exec`, using some hyper-convoluted way of replacing the NULs instead of just reading `man find`... that's probably not the best advertisement for “decades of consulting experience”.
> the weird pipe + xargs instead of just `-exec`
I think that part makes sense though, as it's simply the older idiom from before `-exec ... +` existed. (That's the reason why both find and xargs have specific flags for \0-delimited filenames that are basically counterparts to each other.)
Also, shouldn't the two be roughly equal in efficiency? To my knowledge, xargs (without -i) does the same command aggregation that -exec ... + does.
So the number of "grep" processes spawned by the two commands should be roughly the same, I think.
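If it helps, here's a minimal throwaway sketch (temp directory, file contents, and pattern all invented for the demo) showing the two idioms yield the same matches, since both batch many filenames into few grep invocations:

```shell
set -eu
d=$(mktemp -d)
printf 'import os\n'   > "$d/a.py"
printf 'x = 1\n'       > "$d/b.py"

# Older idiom: NUL-delimited names piped to xargs, which batches them
out1=$(find "$d" -name '*.py' -print0 | xargs -0 grep -l 'import')

# Newer idiom: find batches the arguments itself with -exec ... +
out2=$(find "$d" -name '*.py' -exec grep -l 'import' {} +)

[ "$out1" = "$out2" ] && echo "same results"
rm -rf "$d"
```

Per-file spawning only happens with `xargs -i`/`-I` or `-exec ... \;`, which neither command above uses.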
To be fair, he is a mathematical consultant who uses computer tools, not a specialist in computer tools.
I thought exactly the same. Those who can do, do. Those who cannot do, teach. Those who cannot teach, consult.
But instead of dwelling on prejudices I decided to try my own solution. See https://news.ycombinator.com/item?id=42163286
Why insult teachers?
You're using -print0 and surprised that its output has NUL characters between the filenames?
Was puzzled about that too, especially since his solution "find ... -print0 | strings" undoes the advantage that -print0 gives you, i.e. safe handling of filenames with newlines in them (and his "sed" solution straight-up undoes the -print0 completely).
So with all due respect to the author, I wonder if he was just using -print0 after rote-learning it as part of the find command (or having had some tutor implore "ALWAYS use -print0"), without knowing what it does.
> There may be better solutions [1], but my solution was to insert a call to strings in the pipeline
The "right" answer is to switch to using -print rather than -print0
-print delimits the values with a newline character (\n); -print0 delimits them with a null character (\0).
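To make the difference concrete, a throwaway demo (directory and file names invented):

```shell
d=$(mktemp -d)
touch "$d/a.py" "$d/b.py"

# -print: one filename per line, so line-based tools can count them
find "$d" -name '*.py' -print | wc -l

# -print0: no newlines at all; the separators are NUL bytes
find "$d" -name '*.py' -print0 | tr -dc '\000' | wc -c

rm -rf "$d"
```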
Not always perfectly right, because a filename containing a space character will be interpreted as 2 arguments.
No it won’t, because none of the output is interpreted as an argument; it's passed as lines to grep. The second invocation correctly uses -print0 and pairs it with xargs -0, which understands this.
Now, it does fail with filenames that have newlines in them, but who would do such a thing!
Certainly, which is why I put quotes around "right", but for this usage it's not an issue: find prints the whole path on a single line (including the spaces), and grep (by default) prints the full matched line, so you'll still get the full file path regardless of how many spaces are in it.
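A quick way to check that (file name invented for the demo):

```shell
d=$(mktemp -d)
touch "$d/my file.py"

# grep passes the whole matching line through, spaces included
find "$d" -name '*.py' | grep 'my file'

rm -rf "$d"
```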
Am I the only one who has gone all in on using "-exec +"?
I've switched away from find entirely, and now use "fd" whose exec functionality is quite straightforward to use.
That solves only the second part of their task, the part which they actually had no problem with. But I agree the -exec + solution feels better than the xargs -0 solution.
Agreed. The first part of that task just seemed to be a misunderstanding of what -print0 is, and using `strings` as the fix is weird. I'm surprised they didn't suggest `tr '\0' '\n'`... :-)
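For what it's worth, a tiny sketch of that tr alternative (throwaway names):

```shell
d=$(mktemp -d)
touch "$d/a.py" "$d/b.py"

# Turn the NUL separators back into newlines so line-based tools cope
find "$d" -name '*.py' -print0 | tr '\0' '\n' | grep -c '\.py$'

rm -rf "$d"
```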
I recommend giving ripgrep a try. (It's been around a while now.) https://github.com/BurntSushi/ripgrep
It's not compatible with grep though. How do you search for a square bracket?
And why does it search the current directory when its input is redirected from /dev/null? What other surprises are there?

It's compatible, or close enough, with more modern regex syntaxes, which are probably familiar to a lot more people than grep's. Want to search for square brackets? Then escape them (or do a string literal search with -F).
https://manpages.debian.org/testing/ripgrep/rg.1.en.html
So much faster than grep for these things! Love ripgrep! I also use it to rip apart directories of log files. Super convenient
The first line they start with is utter nonsense. find -print0 will not produce lines but records (or strings) separated by NUL, while grep is a tool that works with lines (separated by LF). No mystery that it cannot work.
Using -print0 is necessary if you have filenames containing LF chars. Otherwise just use -print and grep and everything should be fine.
Now how do we handle NUL-separated records? That required a bit of thinking; the Unix world is based so much on lines. Without extensive testing, the following awk program seems to work:

    BEGIN { RS = "\0" }
    $0 ~ regexp

Call with

    awk -v 'regexp=what I search for'

In their script that would be

    awk -v "regexp=$1"

Edit: Credits for s/whitespace/LF chars/ go to user hnfong

When grepping for filenames, -print0 is needed only when the files have newlines in them. (Which is quite degenerate.) grep works fine with spaces and tabs on stdin.
Thanks! Updated.
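A quick exercise of that awk approach above (GNU awk assumed, since a NUL record separator is an extension; directory and names invented):

```shell
d=$(mktemp -d)
touch "$d/match_me.py" "$d/other.py"

# NUL-separated records in, matching records out
find "$d" -name '*.py' -print0 |
  awk -v "regexp=match" 'BEGIN { RS = "\0" } $0 ~ regexp'

rm -rf "$d"
```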
You keep using that -print0, I do not think it means what you think it means
Pretty convoluted, no?
I would likely use -exec:
Edit: Ah, right, he's filtering on filenames. That's what -iname is for. The man page is quite good.

Instead of `find -name '*.py' | grep -i "$PATTERN"` you can use `find -iname "*${PATTERN}*.py"` for case-insensitive glob-matched filenames, or mess around with regexes on the whole path with `find -iregex "$REGEX"`.
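A minimal sketch of the -iname route (directory and names invented for the demo):

```shell
d=$(mktemp -d)
touch "$d/SpamHam.py" "$d/unrelated.py"

# Case-insensitive glob on the filename itself; no grep needed
find "$d" -iname '*ham*.py'

rm -rf "$d"
```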
And yeah, why would you ASCII NUL terminate each filename output by `find` by using `-print0`? I mean, who adds quotes, backslashes or whitespace to their Python source file names?
Why not just globstar in the first place? grep foo **/*ham*py