Introduction

Searching, whether for files and directories by name or path or for files by what they contain, is one of the most common tasks in software engineering and programming. You need to get very good and fast at it.

Two common things you want to search for:

  1. Get all files/directories that have a certain pattern in their path (the last component of which is the name).
  2. Check the content of one or more files to see if a certain pattern occurs in the content itself (not the path).

If you are on Linux, locate and find are very good for getting all files/directories that have a certain pattern in their path. grep is used for searching the content of one or more files for a certain pattern.

Searching File Names/Paths

locate and find are similar in the sense that they both search file/directory paths, but they differ in a couple of ways:

  1. locate is much faster, because it searches a prebuilt index (a database of paths) instead of the file system. The downside is that if you have created or deleted files/directories since the index was last built, you will get out-of-date results. You use the sudo updatedb command to rebuild the index. Many systems call updatedb periodically (via a cron job, for example).
  2. find can filter on much more than the path: owner, size, type, modification time, and so on. locate only matches against the file/directory path. find is also a lot slower, since it walks the actual file system instead of an index, but this means that its results are always current.
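
To make the second point concrete, here is a minimal sketch of find filtering on metadata rather than on the name. The scratch directory and the file names are made up for illustration; -size and -user are standard find tests.

```shell
# Build a throwaway directory with one tiny and one larger file (hypothetical names).
demo=$(mktemp -d)
printf 'tiny' > "$demo/small.txt"
head -c 4096 /dev/zero > "$demo/big.bin"

# -size +2k : files larger than 2 KiB (sizes are rounded up to whole KiB)
# -user     : files owned by the given user; a numeric user ID is accepted too
big_files=$(find "$demo" -type f -size +2k -user "$(id -u)")
echo "$big_files"

rm -rf "$demo"
```

locate has no equivalent of these tests, which is usually the reason to accept find's slower file system walk.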

Locate

Notes on locate:

  • get all files/directories that have a certain pattern in their full path
  • typical usage is like this: locate <something>, which will spit out any files that have <something> in their full path
  • general syntax is locate [--regex] [-i] [-b] [--all] <pattern1> <pattern2> <pattern3>
    • the patterns are by default shell glob
      • by default, if you include no globbing characters in a pattern, it is surrounded by *, i.e. *<pattern>*
    • pass --regex option to make patterns a regular expression instead of a shell glob
    • pass -i to make them case insensitive
    • pass -b to only search the “basename” (the last part of the path), not the full path
    • pass --all to only show results that match all patterns (by default, results that match any pattern are shown)
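
The options above can be tried without touching the system-wide index. The sketch below assumes the mlocate or plocate implementation of locate and updatedb (other implementations take different updatedb options); the directory and file names are hypothetical. It builds a private database for a scratch tree and then queries it.

```shell
# Scratch tree to index (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/tree/docs"
touch "$demo/tree/docs/Report.txt" "$demo/tree/notes.md"

locate_status=skipped
# Build a private index of the scratch tree; no sudo is needed for a private database.
#   -l 0 : turn off the per-file visibility check
#   -o   : where to write the database
#   -U   : root of the tree to scan
if command -v locate >/dev/null 2>&1 \
   && updatedb -l 0 -o "$demo/demo.db" -U "$demo/tree" 2>/dev/null; then
  # -d picks the database, -i ignores case, -b matches only the basename.
  hits=$(locate -d "$demo/demo.db" -i -b 'report')
  echo "$hits"
  locate_status=ok
else
  echo "mlocate/plocate style locate and updatedb not available; skipping"
fi

rm -rf "$demo"
```

Because the pattern contains no globbing characters, it is treated as *report*, and -i lets it match Report.txt.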

Find

Notes on find:

  • get all files that have a certain pattern in their name or path, are owned by a certain user, have a certain size, etc
  • typical usage is find [root_dir] -name <pattern>
    • root_dir is by default cwd
    • use -path <pattern> instead of -name <pattern> to match against the whole path as find prints it (starting from root_dir) and not just the file name
    • use -regex <pattern> instead of -name <pattern> to 1) match against the whole path and 2) make it a regex search (not a glob search); the regex has to match the entire path, so you usually want .* at both ends
  • will actually search the file system (not an index) thus is always accurate, but slower than locate
  • searches the root dir recursively
  • pass -type f to only show file results (not directory results)
  • pass -type d to only show directory results
  • pass -maxdepth 1 to only recurse 1 level down (or 2 to go two levels down, etc); put it before tests such as -name, otherwise GNU find prints a warning
  • pass -executable to only show files/directories that the current user has permission to execute (important note: this does not necessarily mean they are executable programs, just that the permissions allow execution!)
    • if you pass both -type f and -executable then you only see files (not directories) that you are allowed to execute
  • pass -iname instead of -name to search by file name, but ignore casing
  • similarly, you can use -ipath and -iregex to ignore casing as well
  • notice that the options come after the root_dir argument!
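
The notes above can be tried on a scratch tree. This is a small sketch, assuming GNU find (for -executable); all directory and file names are invented for the example.

```shell
# Scratch tree with hypothetical names, just to have something to search.
demo=$(mktemp -d)
mkdir -p "$demo/src/util" "$demo/docs"
touch "$demo/src/Main.cpp" "$demo/src/util/helper.cpp" "$demo/docs/main.txt"
printf '#!/bin/sh\necho hi\n' > "$demo/src/run.sh"
chmod +x "$demo/src/run.sh"

# -name matches only the last path component; quote the glob so the shell leaves it alone.
by_name=$(find "$demo" -type f -name '*.cpp' | sort)

# -iname is the case-insensitive variant: matches Main.cpp and main.txt.
by_iname=$(find "$demo" -type f -iname 'main*' | sort)

# -path matches against the whole path as find prints it.
by_path=$(find "$demo" -path '*/util/*')

# -regex must match the entire path, so a bare 'helper' finds nothing.
by_regex=$(find "$demo" -regex '.*helper.*')
no_regex=$(find "$demo" -regex 'helper')

# -maxdepth 2 stops two levels below the starting directory, so util/helper.cpp is skipped.
shallow=$(find "$demo" -maxdepth 2 -type f | sort)

# Regular files the current user is allowed to execute.
exec_files=$(find "$demo" -type f -executable)

printf '%s\n' "$by_name" "$by_iname" "$by_path" "$by_regex" "$shallow" "$exec_files"
rm -rf "$demo"
```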

Here is a nice way to represent all the various options of find:

find [dir:cwd by default] [-maxdepth 2] -name '*pattern*' [-type d] [-executable] [-print0]
                          [-maxdepth 3] -iname            [-type f]
                          [etc]         -path
                                        -ipath
                                        -regex
                                        -iregex

Searching File Content (grep)

Notes on grep:

  • search file content
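  • typical usage is grep -E 'for something' <file>
    • search for lines matching for something (any regular expression) in the specified file
    • we use grep -E (extended regular expressions; egrep is the older, now deprecated spelling) because in plain grep's basic regular expressions, characters such as +, ?, |, ( and ) are literal unless backslash-escaped
  • another typical usage is grep -E -r 'for something' <dir>
    • recursively search all files in <dir> for lines matching for something
    • pass --include='*.cpp' to only search files that have .cpp extension
    • pass --include more than once, e.g. --include='*.cpp' --include='*.c', to only search files that have either a .cpp or .c extension (a quoted '*.{cpp,c}' does not work, because neither the shell nor grep expands the braces)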
  • pass -i if you want case insensitive search
  • pass -C some_number if you want the output to include some context lines (i.e. surrounding lines); you often want this so you have an idea of where the results are relative to other stuff in the files

Searching The Content Of Certain Files

Often, you want to search file content, but only for certain files. You can combine find, grep and xargs to achieve this task.

Example: find -name '*.cpp' | xargs grep 'cout'

What we’re saying is, first, give me a list of all .cpp files, then grep each file for the string cout.

Alternatively, we could have done grep -r --include='*.cpp' 'cout' <dir>, which would achieve the same exact thing.
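
As a quick check that the two approaches agree, the sketch below runs both on a scratch tree (hypothetical file names, GNU grep assumed) and compares the lists of matching files. -l is added only so the outputs are easy to compare.

```shell
# Scratch tree with two .cpp files and one text file (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
printf 'std::cout << 1;\n' > "$demo/a/one.cpp"
printf 'std::cout << 2;\n' > "$demo/a/b/two.cpp"
printf 'cout in a text file\n' > "$demo/a/readme.txt"

# Pipeline version: find selects the files, xargs hands them to grep.
via_find=$(find "$demo" -name '*.cpp' | xargs grep -l 'cout' | sort)

# Single-command version: grep does its own recursion and file filtering.
via_grep=$(grep -r -l --include='*.cpp' 'cout' "$demo" | sort)

echo "$via_find"
echo "$via_grep"
rm -rf "$demo"
```

The find version is worth keeping in mind anyway, because find can select files by criteria that --include cannot express, such as size or owner.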

One note: when combining find and grep to only search certain files, if find spits out files with spaces in their name, you have a problem, because xargs separates its arguments based on whitespace. To make this work we need 2 changes.

  1. Make find separate files by a null character instead of a newline. You can do this by passing the -print0 option.
  2. Make xargs expect its arguments to be separated by a null character instead of whitespace. You can do this with the -0 option.

So something like:

find -name '*pattern*' -print0 | xargs -0 grep "what to grep for"
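
The difference is easy to see with a file whose name contains a space. This is a minimal sketch; the file name is made up for the example.

```shell
# One file with a space in its name (hypothetical).
demo=$(mktemp -d)
printf 'std::cout << 1;\n' > "$demo/my file.cpp"

# Whitespace splitting: xargs passes ".../my" and "file.cpp" as two separate
# arguments, so grep cannot open either and reports no match.
broken=$(find "$demo" -name '*.cpp' | xargs grep -l 'cout' 2>/dev/null || true)

# Null-separated: the file name reaches grep in one piece.
fixed=$(find "$demo" -name '*.cpp' -print0 | xargs -0 grep -l 'cout')

echo "without -print0/-0: '$broken'"
echo "with -print0/-0:    '$fixed'"
rm -rf "$demo"
```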

Another note: xargs will run the command once even if no arguments are given. If you don’t want this behavior, use the -r flag of xargs, like so: <command> | xargs -r ...
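
A tiny sketch of the difference, assuming GNU xargs (some other implementations already skip the command on empty input):

```shell
# With empty input, GNU xargs still runs the command once, with no extra arguments.
without_r=$(printf '' | xargs echo "ran")

# With -r (long form: --no-run-if-empty) the command is not run at all.
with_r=$(printf '' | xargs -r echo "ran")

echo "without -r: '$without_r'"
echo "with -r:    '$with_r'"
```

This matters most with grep: run with a pattern but no file arguments, grep falls back to reading standard input instead of searching files.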

Other Searching Software

Don’t forget about the GUI search that most file managers offer.

  • in some Linux file managers, you can open the file manager and simply start typing to search the directory recursively
  • on Windows, you can open File Explorer and use the search box on the top right

Additionally, most IDEs (and code editors) offer very good searching facilities. While the operating system level search facilities operate on a textual basis, IDE search facilities can actually understand the code. For example, you can search for functions that match a certain pattern, classes that match a certain pattern, etc.

  • most IDEs offer a facility for searching for
    • functions/methods that have a certain text or regular expression in them
    • classes that have a certain text or regular expression in them
    • class fields that have a certain text or regular expression in them
    • global variables that have a certain text or regular expression in them
    • files (in the project/solution) that have a certain text or regular expression in their file name
  • most IDEs have a facility to search for all of the above in one search box, which is often useful
  • most IDEs offer a facility to search for any symbol (function, method, field, global variable, etc) in the current open/active file; I use this one quite often!

The End

Searching (both file/directory name and file content) is an extremely important skill for a software engineer. You need to be very good at finding things using these various tools, and you need to be fast at it. Make sure you spend the time to understand your search tools, and practice them to ensure they become second nature to you.

I also suggest that you make yourself a little “searching summary” reference sheet (like this article!) and keep it handy somewhere.