Basics of Searching for Software Engineers
Introduction
Searching (for files/directories with a certain name or path, or for files that contain something specific) is one of the most common tasks in software engineering/programming. You need to get very good and very fast at it.
Two common things you want to search for:
- Get all files/directories that have a certain pattern in their path (the last component of which is the name).
- Check the content of one or more files to see if a certain pattern occurs in the content itself (not the path).
If you are on Linux, locate and find are very good for getting all files/directories that have a certain pattern in their path. grep is used for searching the content of file(s) for a certain pattern.
Searching File Names/Paths
locate and find are similar in the sense that they both search file/directory paths, but they differ in a couple of ways:
- locate is much faster, because it searches an index (a database) instead of the file system. The downside is that if you have created or deleted files/directories since the index was last updated, you will get out-of-date results. Run sudo updatedb to update the index; some people have sudo updatedb run automatically at regular intervals (via a cron job).
- find can select files/directories by owner, size, modification time, and much more, while locate only matches against the file/directory path.
- find is also much slower, since it walks the actual file system rather than an index, but this means its results are always up to date!
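As a rough illustration of find's non-path filters, here is a minimal sketch that builds a throwaway directory (the file names are hypothetical) and selects files by size, owner and modification time. It assumes GNU find, where -size +1k means "more than 1 KiB, rounded up to whole KiB units".

```shell
set -eu
# Throwaway directory with two hypothetical files: one 2 KiB, one empty.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
head -c 2048 /dev/zero > "$root/big.bin"
touch "$root/empty.txt"

# Regular files larger than 1 KiB that are owned by the current user.
find "$root" -type f -size +1k -user "$(id -un)"

# Regular files modified within the last 60 minutes (both files qualify).
find "$root" -type f -mmin -60
```

None of these tests look at the path at all, which is exactly the kind of query locate cannot answer.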
Locate
Notes on locate:
- get all files/directories that have a certain pattern in their full path
- typical usage is locate <something>, which prints every file/directory that has <something> in its full path
- general syntax is locate [--regex] [-i] [-b] [--all] <pattern1> <pattern2> <pattern3>
- the patterns are shell globs by default
  - if a pattern contains no globbing characters, it is treated as if it were surrounded by *, i.e. *<pattern>*
- pass --regex to make the patterns regular expressions instead of shell globs
- pass -i to make them case insensitive
- pass -b to only search the “basename” (the last part of the path), not the full path
- pass --all to only show results that match all patterns (by default, results that match any pattern are shown)
Find
Notes on find:
- get all files/directories that have a certain pattern in their name or path, are owned by a certain user, have a certain size, etc.
- typical usage is find [root_dir] -name <pattern>
  - root_dir defaults to the current working directory (in GNU find; BSD/macOS find requires it)
  - use -path <pattern> instead of -name <pattern> to match against the whole path as find prints it (which starts with root_dir), not just the file name
  - use -regex <pattern> instead of -name <pattern> to 1) match the whole path and 2) use a regular expression instead of a glob; note that the regex must match the entire path, e.g. '.*\.cpp' rather than '\.cpp'
- it actually searches the file system (not an index), so it is always accurate, but slower than locate
- it searches root_dir recursively
- pass -type f to only show files (not directories)
- pass -type d to only show directories
- pass -maxdepth 1 to only descend 1 level (i.e. only show entries directly inside root_dir), or 2 to go two levels down, etc. Put it before tests such as -name; GNU find prints a warning otherwise
- pass -executable to only show files/directories that the current user is allowed to execute (for directories: search). Important: this is a permission check, so it does not necessarily mean the results are executable programs!
  - if you pass both -type f and -executable, you only get files (not directories) that pass that check
- pass -iname instead of -name to match the file name while ignoring case
  - similarly, use -ipath and -iregex to ignore case
- notice that, unlike with most commands, these options come after the directory argument!
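Here is a minimal sketch that tries the options above on a throwaway tree. All file names are hypothetical, and it assumes GNU find (for -executable and the default regex flavour).

```shell
set -eu
# Throwaway tree with hypothetical names, to try the find options above on.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/src/util"
touch "$root/src/main.cpp" "$root/src/util/Helper.CPP" "$root/README.md"
printf '#!/bin/sh\necho hi\n' > "$root/build.sh"
chmod +x "$root/build.sh"
cd "$root"

# Quote the glob so the shell passes it to find unexpanded.
find . -name '*.cpp'              # ./src/main.cpp
find . -iname '*.cpp'             # also ./src/util/Helper.CPP
# -path matches the whole path as printed, which starts with "./" here.
find . -path './src/util/*'       # ./src/util/Helper.CPP
# -regex must match the entire path, hence the leading ".*".
find . -regex '.*\.cpp'           # ./src/main.cpp
# -maxdepth goes before the tests; 1 = direct children of "." only.
find . -maxdepth 1 -type f        # ./README.md and ./build.sh
find . -type f -executable        # ./build.sh
```

The order in which find prints results is not guaranteed, so pipe through sort if you need a stable order.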
Here is a compact way to represent the various options of find (| separates alternatives, square brackets mark optional parts, N is a number):
find [dir, cwd by default] [-maxdepth N] -name|-iname|-path|-ipath|-regex|-iregex '<pattern>' [-type f|-type d] [-executable] [-print0]
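As one concrete instance of that template, here is a minimal sketch on a throwaway tree (hypothetical names) showing how -maxdepth, -iname and -type f combine. It assumes GNU find.

```shell
set -eu
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/a/b" "$root/a/old.cpp"   # old.cpp is a directory
touch "$root/top.cpp" "$root/a/Mid.CPP" "$root/a/b/deep.cpp"

# Depth counts from $root (depth 0): top.cpp is at depth 1, a/Mid.CPP at 2,
# a/b/deep.cpp at 3, so -maxdepth 2 leaves deep.cpp out.
# -iname also accepts Mid.CPP; -type f drops the directory a/old.cpp.
find "$root" -maxdepth 2 -iname '*.cpp' -type f
```

Raising the limit to -maxdepth 3 would also pick up deep.cpp, and dropping -type f would add the a/old.cpp directory.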
Searching File Content (grep)
Notes on grep:
- search file content
- typical usage is egrep 'for something' <file>
  - searches for lines containing for something (any regular expression) in the specified file
  - we use egrep (equivalent to grep -E) because plain grep uses basic regular expressions, in which characters like +, ?, | and ( ) are treated literally unless backslash-escaped; egrep uses extended regular expressions, which behave the way most people expect (recent GNU grep versions warn that egrep is obsolescent, so you may prefer typing grep -E)
- another typical usage is egrep -r 'for something' <dir>
  - recursively searches all files in <dir> for lines matching for something
  - pass --include='*.cpp' to only search files that have a .cpp extension
  - pass --include='*.cpp' --include='*.c' to only search files that have either a .cpp or a .c extension (a quoted brace pattern like '*.{cpp,c}' does not work, because --include does not understand braces; brace expansion is done by the shell, and only on unquoted text)
- pass -i if you want a case-insensitive search
- pass -C some_number if you want the output to include some context lines (i.e. that many lines before and after each match); you often want this so you can see where the results are relative to other things in the file
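Here is a minimal sketch exercising these grep options on a throwaway directory with hypothetical files. It assumes GNU grep, and uses grep -E in place of egrep (they behave the same) so that newer grep versions do not print an obsolescence warning.

```shell
set -eu
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
printf 'int main() {\n  std::cout << "hi";\n  return 0;\n}\n' > "$root/main.cpp"
printf 'printf("colour");\n' > "$root/util.c"
printf 'colour notes\n' > "$root/notes.txt"

# Basic regex: "?" is a literal character here, so nothing matches.
grep 'colou?r' "$root/util.c" || echo "no match with plain grep"
# Extended regex (grep -E, same as egrep): "?" makes the "u" optional.
grep -E 'colou?r' "$root/util.c"

# Recursive search limited to .cpp and .c files; notes.txt is skipped.
grep -rE --include='*.cpp' --include='*.c' 'cout|colou?r' "$root"

# Case-insensitive, with 1 line of context before and after the match.
grep -i -C 1 'COUT' "$root/main.cpp"
```

Without the two --include options, the recursive search would also report the line in notes.txt.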
Searching The Content Of Certain Files
Often, you want to search file content, but only in certain files. You can combine find, grep and xargs to achieve this.
Example: find -name '*.cpp' | xargs grep 'cout'
What we’re saying is: first, give me a list of all .cpp files, then run grep on those files, looking for the string cout.
Alternatively, we could have run grep -r --include='*.cpp' 'cout' <dir>, which achieves essentially the same thing.
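A small sketch comparing the two approaches on a throwaway tree (hypothetical file names; assumes GNU find and GNU grep).

```shell
set -eu
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/src"
printf 'int main() { std::cout << 1; }\n' > "$root/src/a.cpp"
printf 'int f() { return 2; }\n' > "$root/src/b.cpp"
printf 'cout in a text file\n' > "$root/notes.txt"
cd "$root"

# find lists ./src/a.cpp and ./src/b.cpp; xargs passes them to grep as arguments.
find . -name '*.cpp' | xargs grep 'cout'
# grep walks the tree itself and only reads files whose name matches *.cpp.
grep -r --include='*.cpp' 'cout' .
# Both print: ./src/a.cpp:int main() { std::cout << 1; }
```

One difference to be aware of: if find returns exactly one file, grep does not prefix its output with the file name; adding -H (e.g. xargs grep -H 'cout') forces the prefix so the output looks the same either way.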
One note: when combining find and grep to only search certain files, you have a problem if find outputs file names that contain spaces, because xargs splits its input into arguments on whitespace. To make this work, we need 2 changes:
- Make find separate file names with a null character instead of a newline, by passing the -print0 option.
- Make xargs expect its arguments to be separated by null characters instead of whitespace, by passing the -0 option.
So something like:
find -name '*pattern*' -print0 | xargs -0 grep "what to grep for"
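To see why the -print0/-0 pair matters, here is a minimal sketch with one hypothetical file whose name contains a space.

```shell
set -eu
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
printf 'TODO: fix\n' > "$root/my notes.txt"
cd "$root"

# Broken: xargs splits "./my notes.txt" into "./my" and "notes.txt",
# so grep complains about two files that do not exist.
find . -name '*notes*' | xargs grep 'TODO' || echo "whitespace split broke it"

# Works: NUL-separated names keep the space intact; prints "TODO: fix".
find . -name '*notes*' -print0 | xargs -0 grep 'TODO'
```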
Another note: (GNU) xargs will run the command once even if it receives no arguments; if you don’t want this behavior, use xargs’s -r flag, like so: <command> | xargs -r ...
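A tiny sketch of the difference, assuming GNU xargs (the usual one on Linux).

```shell
set -eu
# Empty input: GNU xargs still runs the command once, with no extra arguments.
printf '' | xargs echo "ran"          # prints: ran
# -r (--no-run-if-empty): the command is skipped entirely on empty input.
printf '' | xargs -r echo "ran"       # prints nothing
```

With grep as the command, the first case would start a grep with no file arguments, which is rarely what you want when find matched nothing.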
Other Searching Software
File Manager Search
Don’t forget about the GUI search that most file managers offer.
- in some Linux file managers, you can open a folder and simply start typing to search it recursively
- on Windows, you can open File Explorer and use the search box at the top right
IDE Search
In addition, most IDEs (and code editors) offer very good search facilities. While the operating-system-level tools above work on plain text, IDE search actually understands the code, i.e. you can search for functions whose names match a certain pattern, classes whose names match a certain pattern, etc.
- most IDEs offer a facility for searching for
  - functions/methods that have a certain text or regular expression in their name
  - classes that have a certain text or regular expression in their name
  - class fields that have a certain text or regular expression in their name
  - global variables that have a certain text or regular expression in their name
  - files (in the project/solution) that have a certain text or regular expression in their file name
- most IDEs have a facility to search for all of the above in one search box, which is often useful
- most IDEs offer a facility to search for any symbol (function, method, field, global variable, etc) in the current open/active file; I use this one quite often!
The End
Searching (both file/directory name and file content) is an extremely important skill for a software engineer. You need to be very good at finding things using these various tools, and you need to be fast at it. Make sure you spend the time to understand your search tools, and practice them to ensure they become second nature to you.
I also suggest that you make yourself a little “searching summary” reference sheet (like this article!) and keep it handy somewhere.