Resolving a mysterious problem with find

(johndcook.com)

21 points | by chmaynard 5 days ago

27 comments

  • shakow 3 hours ago

    I'm... not sure that's the kind of Friday-at-2AM embarrassing mistake you want to put on your blog if you sell yourself as an elite consultant?

    Not using `-iname`, using `-print0` and being surprised to see NULs appearing, the weird pipe + xargs instead of just `-exec`, using some hyper-convoluted way of replacing the NULs instead of just man find... that's probably not the best advertisement for “decades of consulting experience”.

    • xg15 21 minutes ago

      > the weird pipe + xargs instead of just `-exec`

      I think that part makes sense though as it's simply the older idiom before -exec existed. (That's the reason why both find and xargs have specific flags related to \0-delimited filenames that are basically counterparts to each other)

      Also, shouldn't the two be roughly equal in efficiency? To my knowledge, xargs (without -i) does the same command aggregation that -exec ... + does.

      So the number of "grep" processes spawned by the two commands should be roughly the same, I think.
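
      A rough way to check this, assuming a find and xargs that support -print0 and -0 (GNU and BSD do): swap grep for a tiny shell command that prints one line per launch, then count the lines. The scratch directory and file names below are made up for the demo.

```shell
# Scratch directory with a few throwaway files (hypothetical names).
demo=$(mktemp -d)
touch "$demo/a.py" "$demo/b.py" "$demo/c.py"

# The stand-in command prints one "run" line each time it is started,
# so counting lines counts process launches.
exec_plus=$(find "$demo" -name '*.py' -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')
xargs_nul=$(find "$demo" -name '*.py' -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')
exec_each=$(find "$demo" -name '*.py' -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

printf 'exec-plus: %s  xargs-0: %s  exec-semicolon: %s\n' "$exec_plus" "$xargs_nul" "$exec_each"
rm -rf "$demo"
```

      With only three short paths, both batching forms should fit everything into a single launch, while the `\;` form starts one process per file.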

    • EdwardCoffin 25 minutes ago

      To be fair, he is a mathematical consultant who uses computer tools, not a specialist in computer tools.

    • usr1106 2 hours ago

      I thought exactly the same. Those who can do, do. Those who cannot do, teach. Those who cannot teach, consult.

      But instead of dwelling on prejudices I decided to try my own solution. See https://news.ycombinator.com/item?id=42163286

  • NikkiA 3 days ago

    You're using -print0 and surprised that its output has NUL characters between them?

    • xg15 3 hours ago

      Was puzzled about that too, especially since his solution "find ... -print0 | strings" undoes the advantage that -print0 gives you, i.e. safe handling of filenames with newlines in them (and his "sed" solution straight-up undoes the -print0 completely).

      So with all due respect to the author, I wonder if he was just using -print0 after rote-learning it as part of the find command (or having had some tutor implore "ALWAYS use -print0"), without knowing what it does.

  • unsnap_biceps 9 hours ago

    > There may be better solutions [1], but my solution was to insert a call to strings in the pipeline

    The "right" answer is to switch to using -print rather than -print0

    -print delimits the values with a newline character (\n); -print0 delimits them with a null character (\0).

    • natmaka 8 hours ago

      Not always perfectly right, because a filename containing a space character will be interpreted as 2 arguments.

      • CGamesPlay 5 hours ago

        No it won’t, because none of the output is interpreted as an argument. It’s passed as lines to grep. The second invocation correctly uses print0 and pairs with xargs to understand this.

        Now, it does fail with filenames that have newlines in them, but who would do such a thing!
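
        For anyone curious what that failure looks like, here is a small sketch (scratch directory and file names are invented for the demo; assumes a find with -print0):

```shell
demo=$(mktemp -d)
# One ordinary file and one whose name contains a newline.
touch "$demo/plain.py"
touch "$demo/odd
name.py"

# -print terminates each path with a newline, so the odd name is
# spread over two lines and a line-based tool sees three "files".
print_lines=$(find "$demo" -name '*.py' -print | wc -l | tr -d ' ')

# -print0 terminates each path with a NUL byte; counting the NULs
# gives the real number of files.
print0_records=$(find "$demo" -name '*.py' -print0 | tr -dc '\000' | wc -c | tr -d ' ')

printf 'lines from -print: %s, records from -print0: %s\n' "$print_lines" "$print0_records"
rm -rf "$demo"
```

        Spaces never cause this, since neither -print nor grep treats them as separators.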

      • unsnap_biceps 8 hours ago

        Certainly, which is why I put quotes around right, but for this usage, it's not an issue. Find prints the whole path on a single line (including the spaces) and grep (by default) puts the full matched line, so you'll still get the full file path regardless of how many spaces are in it.

  • linsomniac 3 hours ago

    Am I the only one who has gone all in on using "-exec +"?

        find . -name '*.py' -type f -exec grep -il PATTERN {} +
    • nicoburns an hour ago

      I've switched away from find entirely, and now use "fd" whose exec functionality is quite straightforward to use.

    • usr1106 3 hours ago

      That solves only the second part of their task. The part which they actually had no problem with. But I agree the exec + solution feels better than the xargs -0 solution.

      • linsomniac 2 hours ago

        Agreed. The first part of that task just seemed to be a misunderstanding of what -print0 is, and using `strings` as the fix is weird. I'm surprised they didn't suggest `tr '\0' '\n'`... :-)
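
        A minimal sketch of that `tr` variant, with made-up file names in a scratch directory. It gives up the newline safety of -print0 again, so it is only an illustration of the conversion:

```shell
demo=$(mktemp -d)
touch "$demo/Report.py" "$demo/notes.txt"

# Turn the NUL terminators back into newlines so that grep, which
# works line by line, can filter the paths case-insensitively.
matches=$(find "$demo" -type f -print0 | tr '\0' '\n' | grep -i 'report\.py$')

printf '%s\n' "$matches"
rm -rf "$demo"
```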

  • TheGrassyKnoll 4 hours ago

    I recommend giving ripgrep a try. (it's been around a while now) https://github.com/BurntSushi/ripgrep

    • oguz-ismail 2 hours ago

      It's not compatible with grep though. How do you search for a square bracket?

          $ grep '[][]' </dev/null
          $ rg '[][]' </dev/null
          rg: regex parse error:
              (?:[][])
                   ^^
          error: unclosed character class
          $
      
      And why does it search the current directory when its input is redirected from /dev/null? What other surprises are there?
    • pletnes 3 hours ago

      So much faster than grep for these things! Love ripgrep! I also use it to rip apart directories of log files. Super convenient

  • usr1106 2 hours ago

    The first line they start with is utter nonsense. find -print0 will not produce lines, but records (or strings) separated by NUL. But grep is a tool working with lines (separated by LF). No mystery that it cannot work.

    Using -print0 is necessary if you have filenames containing LF chars. Otherwise just use -print and grep and everything should be fine.

    Now how do we handle NUL separated records? That required a bit of thinking, since the Unix world is based so much on lines. Without extensive testing the following awk program seems to work:

        BEGIN       { RS = "\0" }
        $0 ~ regexp
    
    Call with

        awk -v 'regexp=what I search for'
    
    In their script that would be

        awk -v "regexp=$1"
    
    
    Edit: Credits for s/whitespace/LF chars/ go to user hnfong
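
    A different route from the awk program above, and an assumption about the environment rather than anything from the article: GNU grep has a -z (--null-data) option that makes it read and write NUL-terminated records, so it can sit directly behind -print0. A sketch with invented file names:

```shell
demo=$(mktemp -d)
touch "$demo/plain.py" "$demo/other.txt"
touch "$demo/odd
name.py"

# grep -z (GNU extension) treats each NUL-terminated path as one
# record, so the name containing a newline stays in one piece.
# Counting the NULs in the output counts the matching paths.
matched=$(find "$demo" -type f -print0 | grep -z '\.py$' | tr -dc '\000' | wc -c | tr -d ' ')

printf 'matching paths: %s\n' "$matched"
rm -rf "$demo"
```

    The output is still NUL-separated, so it can be handed on to `xargs -0` unchanged.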
    • hnfong 2 hours ago

      When grepping for filenames print0 is needed only when the files have new lines in them. (Which is quite degenerate.) grep works fine with spaces and tabs in the stdin

      • usr1106 2 hours ago

        Thanks! Updated.

  • creaktive 32 minutes ago

    You keep using that -print0, I do not think it means what you think it means

  • cess11 5 hours ago

    Pretty convoluted, no?

    I would likely use -exec:

       $ mkdir dir.py
       $ echo blah >> blah.py
       $ find . -type f -name "*.py" -exec grep -i BLAH {} \;
       blah
       $
    
    Edit: Ah, right, he's filtering on filenames. That's what -iname is for. The man page is quite good.
  • naruhodo 9 hours ago

    Instead of `find -name '*.py' | grep -i "$PATTERN"` you can use `find -iname "*${PATTERN}*.py"` for case-insensitive glob-matched filenames, or mess around with regexes on the whole path with `find -iregex "$REGEX"`.
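
    A quick side-by-side of the two forms, as a sketch with made-up file names (assumes a find that supports -iname, which GNU and BSD find do):

```shell
demo=$(mktemp -d)
touch "$demo/Report_utils.py" "$demo/other.py" "$demo/report.txt"
PATTERN=report

# Pipe form: grep filters the whole printed path, case-insensitively.
via_grep=$(find "$demo" -name '*.py' | grep -i "$PATTERN")

# -iname form: find itself matches the file name, case-insensitively.
via_iname=$(find "$demo" -iname "*${PATTERN}*.py")

printf 'grep:   %s\niname:  %s\n' "$via_grep" "$via_iname"
rm -rf "$demo"
```

    One difference to keep in mind: grep sees the directory part of the path as well, while -iname only looks at the last component.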

    And yeah, why would you ASCII NUL terminate each filename output by `find` by using `-print0`? I mean, who adds quotes, backslashes or whitespace to their Python source file names?

    • pletnes 3 hours ago

      Why not just globstar in the first place? grep foo **/*ham*py