What are the most useful Python functions to know?

“Most useful” is subjective and domain dependent, but these are the functions I find most useful for general-purpose Python scripting (i.e. small Python scripts that do the things we used to do with shell scripts).

Working With Files/Directories

Reading From a File

# open a file for reading and in binary mode (returns a "file object")
file_object = open("path/to/file.ext",mode='rb')

# Note: In "mode='rb'", the r means open the file for reading; the b means
# open the file in binary mode. When you read from a file that was opened in
# binary mode, *bytes* are returned, whereas when you read from a file that
# was opened in text mode, the bytes are decoded according to some encoding
# and a *string* is returned. You can explicitly specify an encoding by passing
# an "encoding='utf-8'" parameter to open(). If you don't specify one, a platform-dependent
# default is used (on Linux this is usually utf-8).

# open a file for reading and in text mode (i.e. read methods will return strings, not bytes)
file_object = open("path/to/file.ext",mode='rt')

# read a single line from the file and advance the position 
# - returns the line as bytes if the file was opened in binary mode
# - returns the line as a string if the file was opened in text mode
line_as_a_string = file_object.readline()

# read all the remaining lines in the file (returns a list of lines)
# - again, the lines will either be bytes or strings (depending on what mode the file was opened in)
# - to read lazily, one line at a time, iterate over the file object itself (for line in file_object)
all_lines_as_a_list = file_object.readlines()

# close the file
file_object.close()

# see if the file has already been closed
file_object.closed

# see if the file is seekable (not every file supports seeking, e.g. sockets, pipes, etc.)
file_object.seekable()

# seek 10 bytes from the beginning of the file (raises an exception if the file is not seekable)
file_object.seek(10,0)

# returns the current position in the file
file_object.tell()

# get the path of the file (as it was passed to open())
file_object.name
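
The calls above can be combined into one runnable sketch. It is only an illustration: the file name and contents are made up, a temporary directory is used so nothing is left behind, and the `with` statement (not shown above) closes each file automatically when its block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample file, written in binary mode so the bytes on disk
    # are identical on every platform.
    path = os.path.join(tmp_dir, "example.txt")
    with open(path, mode="wb") as f:
        f.write(b"first line\nsecond line\n")

    # Text mode: the read methods return strings.
    with open(path, mode="rt", encoding="utf-8") as f:
        first_line = f.readline()        # one line, including its newline
        remaining_lines = f.readlines()  # a list with every line that is left

    # Binary mode: the read methods return bytes.
    with open(path, mode="rb") as f:
        f.seek(6, 0)                 # move 6 bytes from the start of the file
        position = f.tell()          # current position in the file
        rest_of_line = f.readline()  # reads from the new position to the end of the line

    # Leaving the "with" block closed the file for us.
    closed_after_with = f.closed
```

Using `with` means you do not have to remember to call close(), even if an exception is raised halfway through.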

Writing to a File

# open a file for writing and in binary mode
file_object = open("path/to/file.ext",mode="wb") # or mode="ab" to append instead of overwriting the existing contents

# note: You can open a file for writing in binary or text mode. If you open it
# in binary mode, you write bytes to the file. If you open it in text mode, you
# write strings to the file (the strings are encoded with the chosen encoding
# and ultimately written as bytes anyway).

# open a file for writing and in text mode
file_object = open("path/to/file.ext",mode="wt")

# write some stuff to the file
file_object.write("some stuff here")

# note: In the above line, I am writing a string to the file, thus 
# I am assuming the file was opened in text mode. If you opened
# the file in binary mode, instead of passing a string to write(),
# pass a bytes object, like so
file_object.write(b'these are bytes') # notice the b prefix!

# write a list of lines to the file
# note: again, if the file was opened in binary mode, the list of
# lines better be a list of bytes objects! If the file was opened
# in text mode, they better be a list of strings! I hope you are seeing a
# pattern here!
# note: writelines() does not add line endings, so each item needs its own "\n"
file_object.writelines(list_of_lines)
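
Here is a small sketch that puts the write calls together. The file name and contents are made up, and passing `newline=""` is my own choice for the example: it turns off newline translation so the bytes on disk are the same on every platform.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.txt")  # hypothetical file name

    # "wt" creates the file (or empties an existing one) and expects strings.
    with open(path, mode="wt", encoding="utf-8", newline="") as f:
        f.write("header\n")
        # writelines() writes the items back to back; it adds no newlines.
        f.writelines(["one\n", "two\n"])

    # "ab" appends to the end of the file and expects bytes.
    with open(path, mode="ab") as f:
        f.write(b"three\n")

    # Read everything back as bytes to see what actually landed on disk.
    with open(path, mode="rb") as f:
        written_bytes = f.read()
```

Note that opening with "w" always throws away whatever was in the file before; use "a" when you want to keep it.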

Working With Directories

import os
import glob

# list all the files/directories in a specified directory
# - the directory can be specified as an absolute or relative path (relative to the cwd of course)
os.listdir("/some/path")

# walk a directory tree: for each folder at or below some top level directory, get the folders and files directly inside it
for folder, folders, files in os.walk("/some/directory"): # the top directory can be specified as an absolute or a relative path
    # - folder is the path of the folder currently being visited
    # - folders is a list of the names of the folders in the folder 'folder'
    # - files is a list of the names of the files in the folder 'folder'
    pass

# get a list of files that match a unix shell glob pattern
glob.glob("*.cpp") # get a list of files that end in .cpp
glob.glob("/home/abdullah/*.c") # get a list of all files in /home/abdullah that end in .c
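
To see what these calls return, here is a sketch that builds a tiny throwaway directory tree and lists it three ways. The file and folder names are invented for the example, and the recursive `**` pattern is an extra that is not covered above.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as top_dir:
    # Hypothetical layout: main.cpp, notes.txt, src/util.cpp
    os.mkdir(os.path.join(top_dir, "src"))
    for relative_path in ("main.cpp", "notes.txt", os.path.join("src", "util.cpp")):
        with open(os.path.join(top_dir, relative_path), "wb"):
            pass  # create an empty file

    # listdir() only looks at the top level and returns names, not full paths.
    top_level_entries = sorted(os.listdir(top_dir))

    # walk() visits every folder; join 'folder' with each name to get a usable path.
    all_files = []
    for folder, folders, files in os.walk(top_dir):
        for file_name in files:
            full_path = os.path.join(folder, file_name)
            all_files.append(os.path.relpath(full_path, top_dir))
    all_files.sort()

    # glob.escape() protects any special characters in the directory name itself.
    pattern_root = glob.escape(top_dir)
    cpp_files = [os.path.basename(p) for p in glob.glob(os.path.join(pattern_root, "*.cpp"))]

    # With recursive=True, "**" matches zero or more nested directories.
    recursive_matches = glob.glob(os.path.join(pattern_root, "**", "*.cpp"), recursive=True)
    all_cpp_files = sorted(os.path.basename(p) for p in recursive_matches)
```

The order of the results from listdir(), walk(), and glob() is not guaranteed, which is why the sketch sorts them.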

Working With Paths

import os

# join 2 or more paths (can pass more than 2 args); the paths are joined from left to right
os.path.join(left_path,right_path)

# find out if a path is absolute (i.e. starts with a leading slash on Linux, or with a drive letter such as C:\ on Windows)
os.path.isabs(path)

# find out if a path is a directory
os.path.isdir(path)

# find out if a path is a file
os.path.isfile(path)

# split a path between the final component after the last slash and everything before
# - if you use this on a file path, you are getting the directory and the actual file name
# - if you use this on a directory path, you are getting the parent directory and the actual directory name
os.path.split(path)

# get the basename of the path (i.e. the final component of the path)
os.path.basename(path)

# get the directory name of the path (i.e. everything before the final component)
os.path.dirname(path)
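
A quick sketch of how these functions relate to each other. The path components are made up, and the paths do not need to exist because these particular functions only manipulate strings. `os.path.abspath()` is an extra helper not listed above.

```python
import os

# Hypothetical relative path, built with join() so it uses the right separator.
directory = os.path.join("projects", "demo")
path = os.path.join(directory, "report.txt")

# split() returns (everything before the final component, the final component).
head, tail = os.path.split(path)

# basename() and dirname() are the two halves of split().
basename = os.path.basename(path)
dirname = os.path.dirname(path)

# A relative path is not absolute; abspath() resolves it against the cwd.
is_absolute = os.path.isabs(path)
absolute_path = os.path.abspath(path)

# Gotcha: if a later argument to join() is absolute, everything before it is dropped.
kept = os.path.abspath("kept")
joined_with_absolute = os.path.join("ignored", kept)
```

The join() gotcha is worth remembering when one of the pieces comes from user input.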

Working With Text

# see if a string starts with something specific
some_string.startswith("sub_string")

# see if a string starts with any one of several prefixes (pass them as a tuple)
some_string.startswith(("string1","string2"))

# see if a string ends in something specific
some_string.endswith("some_string")

# find where another string occurs inside a string
some_string.find("sub_string") # returns the index of the first occurrence of "sub_string" in some_string, or -1 if it does not occur
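
A short sketch with a made-up file name shows the return values. The `in` operator at the end is an addition of mine; it is usually the clearer choice when you only need a yes/no answer.

```python
file_name = "report_2024.csv"  # hypothetical example string

starts_with_report = file_name.startswith("report")
starts_with_any = file_name.startswith(("summary", "report"))  # tuple of prefixes
ends_with_csv = file_name.endswith(".csv")

# find() returns an index, or -1 when the substring is missing.
index_of_year = file_name.find("2024")
index_of_missing = file_name.find("xlsx")

# For a plain yes/no test, "in" avoids the -1 pitfall.
contains_year = "2024" in file_name
```

Be careful with `if some_string.find(x):`. A match at index 0 is falsy and a miss (-1) is truthy, so the test does the opposite of what it looks like.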

Working With Regular Expressions

import re

# search a string for a particular regex pattern
# - returns a match object telling you where (start/stop index) the pattern first occurred in the string, or None if the pattern did not occur in the string
re.search(pattern,string)

# usually you use it like this:
if re.search(pattern,string):
    # do something
    pass

# find all the places the pattern matches in a string
for match in re.finditer(pattern,string):
    pass # match.start() tells you where this match starts, match.end() tells you where this match stops
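
Here is a sketch of both calls on a made-up log line. The pattern and the use of a capture group with `match.group(1)` are my own additions for illustration.

```python
import re

log_line = "error at line 12, warning at line 345"  # hypothetical input
pattern = r"line (\d+)"  # raw string; the parentheses capture the digits

# search() finds the first match only.
first_match = re.search(pattern, log_line)
if first_match:
    first_span = (first_match.start(), first_match.end())
    first_number = first_match.group(1)  # text captured by the first group

# finditer() yields one match object per non-overlapping match.
all_numbers = [match.group(1) for match in re.finditer(pattern, log_line)]

# No match means None, which is why the "if re.search(...)" idiom works.
no_match = re.search(r"fatal", log_line)
```

Writing patterns as raw strings (the `r` prefix) saves you from doubling every backslash.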

Working With Processes

import subprocess

# create a process from an executable file, and pass it arguments
# - the subprocess will inherit your stdin/stdout/stderr
process_object = subprocess.Popen(["ls","-l"])
assert process_object.wait() == 0 # wait() waits until the process exits, then returns its exit code

# you can specify the stdin, stdout, and stderr that the process should use (as keyword arguments, outside the argument list)
# - here we redirect stderr to stdout, and use a pipe for stdout
process_object = subprocess.Popen(["ls","-l"],stderr=subprocess.STDOUT,stdout=subprocess.PIPE)

# you can redirect stderr and/or stdout to the null device
# - here we redirect stderr to stdout, then stdout to the null device
process_object = subprocess.Popen(["ls","-l"],stderr=subprocess.STDOUT,stdout=subprocess.DEVNULL)

# if you redirect stdout to a pipe, you can read from that pipe (just treat it as a file object)
process_object.stdout.readline() # read a line of the stdout pipe
process_object.stdout.readlines() # read all lines of the stdout pipe

subprocess.run()

subprocess.run() is usually the most convenient way to run a subprocess, and it is the one to reach for first.

completed_process_object = subprocess.run("ls -l",stdout=subprocess.PIPE,check=True,shell=True)

# launches the process, waits for it to complete, then returns information about the completed process
# - by default, the output of the process isn't captured; to capture it, pass the stdout=subprocess.PIPE argument
# - check=True raises an exception (subprocess.CalledProcessError) if the process returns a non-zero exit code
# - shell=True runs the command through a shell, so the command is passed as a single string
# - without shell=True, pass a list of arguments instead (e.g. ["ls","-l"]); avoid shell=True with untrusted input

completed_process_object.returncode # check the return code of the completed process
completed_process_object.stdout # 'bytes' object containing the output of the completed process
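
A sketch of run() without a shell, including what happens when check=True sees a failure. As an assumption for portability it runs the current Python interpreter (`sys.executable`) rather than `ls`. The `capture_output=True` and `text=True` arguments are not mentioned above; they need Python 3.7 or newer.

```python
import subprocess
import sys

# capture_output=True pipes both stdout and stderr;
# text=True decodes the captured output into strings instead of bytes.
completed_process_object = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,
    text=True,
    check=True,
)
captured_text = completed_process_object.stdout

# With check=True, a non-zero exit code raises CalledProcessError.
failed_return_code = None
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
except subprocess.CalledProcessError as error:
    failed_return_code = error.returncode
```

Passing the command as a list avoids shell quoting problems entirely, so it is a reasonable default unless you really need shell features such as pipes or wildcards.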

I will most likely edit this document in the future (as my idea of “most useful” changes).