Basics of Searching for Software Engineers
Introduction
Searching, whether for files/directories that have a certain name/path or for files that have something specific in their content, is extremely common in software engineering. You need to get very good and fast at it.
Two common things you want to search for:
- Get all files/directories that have a certain pattern in their path (the last component of which is the name).
- Check the content of one or more files to see if a certain pattern occurs in the content itself (not the path).
If you are on Linux, locate and find are very good for getting all files/directories that have a certain pattern in their path. grep is used for searching the content of one or more files for a certain pattern.
Searching File Names/Paths
locate and find are similar in the sense that they both search file/directory paths, but they differ in a couple of ways:
- locate is way faster, because it searches an indexed database. The downside is that if you have created or deleted files/directories since the last time you indexed, you will get out-of-date results. You use the sudo updatedb command to update your index. Some people make it so that sudo updatedb is automatically called periodically (via a cron job).
- find lets you get all files/directories that have a certain owner, a certain size, and many other properties. locate simply searches the file/directory path.
- find is also a lot slower, since it searches the actual file system, not an index, but this means that the results are always accurate!
Locate
Notes on locate:
- get all files/directories that have a certain pattern in their full path
- typical usage is locate <something>, which will spit out any files that have <something> in their full path
- general syntax is locate [--regex] [-i] [-b] [--all] <pattern1> <pattern2> <pattern3>
  - the patterns are by default shell globs
    - by default, if you include no globbing characters in a pattern, it is surrounded by *, i.e. *<pattern>*
  - pass --regex to make the patterns regular expressions instead of shell globs
  - pass -i to make them case insensitive
  - pass -b to only search the “basename” (the last part of the path), not the full path
  - pass --all to only show results that match all patterns (by default, results that match any pattern are shown)
Find
Notes on find:
- get all files that have a certain pattern in their name or path, are owned by a certain user, have a certain size, etc.
- typical usage is find [root_dir] -name <pattern>
  - root_dir is by default cwd
  - use -path <pattern> instead of -name <pattern> to match the whole path as find prints it (starting with root_dir) and not just the file name
  - use -regex <pattern> instead of -name <pattern> to 1) match the whole path and 2) make it a regex search (not a glob search)
- will actually search the file system (not an index), thus is always accurate, but slower than locate
- searches the root dir recursively
- pass -type f to only show file results (not directory results)
- pass -type d to only show directory results
- pass -maxdepth 1 to only descend 1 level below root_dir (or 2 to go two levels down, etc.)
- pass -executable to only show files/directories that the current user has execute permission on (important note: this does not necessarily mean they are executable programs, just that the execute permission is set!)
  - if you pass both -type f and -executable then you only see files (not directories) that have the execute permission set
- pass -iname instead of -name to search by file name, but ignore casing
  - similarly, you can use -ipath and -iregex to ignore casing as well
- notice that the options come after the argument (root_dir)!
Here is a nice way to represent all the various options of find
find [dir:cwd by default] -name '*pattern*' [-type d] [-executable] -maxdepth 2 [-print0]
                          -iname            -type f                 -maxdepth 3
                          -path                                     -etc
                          -ipath
                          -regex
                          -iregex
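The find options above can be tried on a throwaway directory tree. This is a minimal sketch, assuming GNU find on Linux; the directory layout and file names are made up for illustration.

```shell
# Build a small scratch tree so the commands have something to match.
demo=$(mktemp -d)
mkdir -p "$demo/src/util" "$demo/docs"
touch "$demo/src/main.cpp" "$demo/src/util/Helper.CPP" "$demo/docs/notes.txt"
printf '#!/bin/sh\necho hi\n' > "$demo/src/run.sh"
chmod +x "$demo/src/run.sh"
cd "$demo" || exit 1

# -name matches the last path component against a glob (case sensitive).
by_name=$(find . -name '*.cpp' | LC_ALL=C sort)

# -iname is the same test, ignoring case.
by_iname=$(find . -iname '*.cpp' | LC_ALL=C sort)

# -path matches the whole path as find prints it, so the glob includes "./".
by_path=$(find . -path './src/util/*' | LC_ALL=C sort)

# -maxdepth 1 with -type d: the start directory and its direct subdirectories.
top_dirs=$(find . -maxdepth 1 -type d | LC_ALL=C sort)

# -type f with -executable: regular files the current user may execute.
exec_files=$(find . -type f -executable | LC_ALL=C sort)

printf '%s\n' "$by_name" "$by_iname" "$by_path" "$top_dirs" "$exec_files"
```

Quoting the pattern matters: without the quotes the shell may expand *.cpp itself before find ever sees it.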
Searching File Content (grep)
Notes on grep:
- search file content
- typical usage is egrep 'for something' <file>
  - search for lines containing for something (any regular expression) in the specified file
  - we use egrep (equivalent to grep -E) because plain grep uses basic regular expressions, in which characters such as +, ?, | and () must be backslash-escaped in GNU grep to get their special meaning; recent GNU grep releases print a warning for egrep, so grep -E is the safer spelling
- another typical usage is egrep -r 'for something' <dir>
  - recursively search all files in <dir> for lines matching for something
  - pass --include='*.cpp' to only search files that have the .cpp extension
  - pass --include='*.cpp' --include='*.c' to only search files that have either a .cpp or .c extension (a quoted '*.{cpp,c}' does not work: the quotes stop the shell from expanding the braces, and grep's globs do not understand them)
- pass -i if you want case insensitive search
- pass -C some_number if you want the output to include some context lines (i.e. surrounding lines); you often want this so you have an idea of where the results are relative to other stuff in the files
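The grep options above can be tried the same way. This is a minimal sketch, assuming GNU grep; the files and their contents are invented for illustration, and grep -E is used in place of egrep since the two are equivalent.

```shell
# Scratch files: a C++ file, a C file, and a text file that all mention "cout".
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'int main() {\n  std::cout << "hi";\n  return 0;\n}\n' > "$demo/src/main.cpp"
printf 'void log() {\n  printf("COUT is not used here");\n}\n' > "$demo/src/log.c"
printf 'remember to remove cout calls\n' > "$demo/notes.txt"
cd "$demo" || exit 1

# Extended regex: alternation with | needs no backslash.
single=$(grep -E 'cout|cerr' src/main.cpp)

# -r recurses; --include limits the search to file names matching the glob.
cpp_only=$(grep -rE --include='*.cpp' 'cout' .)

# Repeat --include for several extensions; -i ignores case, so "COUT" matches too.
c_family_hits=$(grep -riE --include='*.cpp' --include='*.c' 'cout' . | wc -l)

# -C 1 prints one line of context before and after each matching line.
context=$(grep -E -C 1 'cout' src/main.cpp)

printf '%s\n' "$single" "$cpp_only" "$c_family_hits" "$context"
```

Note that notes.txt never shows up in the recursive searches, because no --include glob matches its name.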
Searching The Content Of Certain Files
Often, you want to search file content, but only for certain files. You can combine find, grep and xargs to achieve this task.
Example: find -name '*.cpp' | xargs grep 'cout'.
What we’re saying is, first, give me a list of all .cpp files, then grep each file for the string cout.
Alternatively, we could have done grep -r --include='*.cpp' 'cout' <dir>, which would achieve the same exact thing.
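To see that the two approaches agree, here is a small sketch that runs both on the same scratch tree and compares the results. It assumes GNU find, xargs and grep; the app directory and its files are hypothetical.

```shell
# Two .cpp files that contain "cout" and one .txt file that should be ignored.
demo=$(mktemp -d)
mkdir -p "$demo/app/core"
printf 'std::cout << 1;\n' > "$demo/app/a.cpp"
printf 'std::cout << 2;\n' > "$demo/app/core/b.cpp"
printf 'cout mentioned in docs\n' > "$demo/app/readme.txt"
cd "$demo" || exit 1

# find selects the files; xargs passes them to grep as arguments.
via_find=$(find app -name '*.cpp' | xargs grep 'cout' | LC_ALL=C sort)

# grep does the file selection itself.
via_grep=$(grep -r --include='*.cpp' 'cout' app | LC_ALL=C sort)

if [ "$via_find" = "$via_grep" ]; then
    echo "both approaches found the same lines"
fi
printf '%s\n' "$via_find"
```

One small difference to be aware of: when find hands grep only a single file, grep leaves out the file name prefix, while grep -r always prints it.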
One note: When combining find and grep to only search certain files, if find spits out files with spaces in their name, you have a problem, because xargs separates its arguments based on whitespace. To make this work we need 2 changes.
- Make find separate files by null character instead of newline; you can do this by passing the -print0 option
- Make xargs expect its arguments to be separated by null character instead of whitespace; you can do this with the -0 option
So something like:
find -name '*pattern*' -print0 | xargs -0 grep "what to grep for"
Another note: xargs will run the command once even if no arguments are given. If you don’t want this behavior, use the -r flag of xargs (a GNU extension), like so: <command> | xargs -r ...
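The effect of -print0, -0 and -r can be seen with a file name that contains a space. This is a minimal sketch, assuming GNU find, xargs and grep; the file names are made up.

```shell
# One file name with a space in it, one without.
demo=$(mktemp -d)
printf 'std::cout << "spaced";\n' > "$demo/my file.cpp"
printf 'std::cout << "plain";\n' > "$demo/plain.cpp"
cd "$demo" || exit 1

# Null-separated names survive the trip through xargs intact.
safe=$(find . -name '*.cpp' -print0 | xargs -0 grep 'cout' | LC_ALL=C sort)

# Whitespace splitting turns "./my file.cpp" into "./my" and "file.cpp",
# which grep cannot open, so only plain.cpp is searched.
# xargs reports the failure in its exit status; "|| true" keeps a strict shell going.
broken=$(find . -name '*.cpp' | xargs grep 'cout' 2>/dev/null | LC_ALL=C sort) || true

# No .java files exist, so with -r grep is never started and nothing is printed.
none=$(find . -name '*.java' -print0 | xargs -0 -r grep 'cout')

printf 'safe:\n%s\nbroken:\n%s\n' "$safe" "$broken"
```

The safe variant reports a match in both files, while the broken variant silently misses the file with the space in its name.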
Other Searching Software
File Manager Search
Don’t forget about the GUI search that most file managers offer.
- in certain Linux file managers, you can open the file manager and simply start typing to search the directory recursively
- on Windows, you can open File Explorer and use the search box at the top right
IDE Search
Additionally, most IDEs (and code editors) offer very good searching facilities. While the operating-system-level search facilities operate on a textual basis, IDE search facilities can actually understand the code. For example, you can search for functions or classes whose names match a certain pattern.
- most IDEs offer a facility for searching for
  - functions/methods that have a certain text or regular expression in them
  - classes that have a certain text or regular expression in them
  - class fields that have a certain text or regular expression in them
  - global variables that have a certain text or regular expression in them
  - files (in the project/solution) that have a certain text or regular expression in their file name
- most IDEs have a facility to search for all of the above in one search box, which is often useful
- most IDEs offer a facility to search for any symbol (function, method, field, global variable, etc.) in the current open/active file; I use this one quite often!
The End
Searching (both file/directory name and file content) is an extremely important skill for a software engineer. You need to be very good at finding things using these various tools, and you need to be fast at it. Make sure you spend the time to understand your search tools, and practice them to ensure they become second nature to you.
I also suggest that you make yourself a little “searching summary” reference sheet (like this article!) and keep it handy somewhere.