Performance & efficiency

Before we begin, I would like to clarify that the power of shell scripting stems from the standard Unix utilities rather than from the shell itself. In this post, I will focus on what I believe to be the most crucial aspect of any programming language: efficiency.

One should do as little as possible in the shell script itself and use it mainly to connect the logic already available in the rich set of utilities on a Unix system.

It is worth noting that even ChatGPT, while powerful in its own right, does not reliably write efficient shell code, likely because of limitations in its training data. There is therefore a risk of ending up with suboptimal code.

To illustrate why efficiency is critical, let us consider a straightforward shell function that counts the number of lines in a file:

  • Inefficient Approach:
count_lines_in_file() {
    lines=$(cat "$1" | wc -l)
    echo "Number of lines: $lines"
}

In this approach, cat reads the entire file and pipes it to wc, which counts the lines. This starts an extra process and copies the data through a pipe for no benefit, so it can be improved.

  • Efficient Approach:
count_lines_in_file() {
    # Redirect the file into wc so it prints only the count, without the file name
    lines=$(wc -l < "$1")
    echo "Number of lines: $lines"
}

Seems very simple, right? Yet this is the core idea of writing efficient scripts. The same applies to longer pipelines such as this one:

cat FILE.txt | grep DO_Something | grep SomethingElse | sed 's/SomethingElse/Replacement/'

You can achieve the same result more efficiently with a single sed command, which can both filter the lines and perform the replacement. By avoiding unnecessary tools and combining commands, we simplify the script and improve its performance.
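As a rough sketch of that consolidation, the pipeline above can be collapsed into one sed invocation. The function name filter_and_replace is my own, made up for illustration; the patterns are the placeholders from the pipeline.

```shell
# filter_and_replace is a hypothetical name used only for this illustration.
# -n suppresses the default output; the address /DO_Something/ selects lines,
# and the p flag prints a line only when the substitution actually happened,
# so lines without SomethingElse are dropped just as the second grep did.
filter_and_replace() {
    sed -n '/DO_Something/s/SomethingElse/Replacement/p' "$1"
}
```

Calling filter_and_replace FILE.txt starts one process instead of four.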

  • cut is generally much faster than awk for simple field extraction, so don’t reach for awk unless you really need it!
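To make the comparison concrete, here is a minimal sketch of the same field extraction written both ways. Both function names are hypothetical, and the colon-separated input is just an assumed example format.

```shell
# Both helper names are hypothetical; each prints the first colon-separated field.
first_field_cut() {
    cut -d: -f1 "$1"
}

first_field_awk() {
    awk -F: '{ print $1 }' "$1"
}
```

Both produce the same output for this simple case; running each under time on a large file is the easiest way to see the difference on your own system.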

Stop using cat if you don’t need it!

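# Bad practice
cat file.txt | cut -d' ' -f1
cat file.txt | grep "Search For Something"

# Good practice
cut -d' ' -f1 file.txt
grep "Search For Something" file.txt

  • The same is true for other utilities such as grep and sed, which accept file names as arguments. For tr, which reads only standard input, use input redirection (tr a b < file.txt) instead of cat.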

Use Streams

  • Using streams instead of writing to files can be more efficient and helps avoid unnecessary disk I/O. When you write to a file, the data has to be written to disk, which can slow down your script if you are writing a lot of data.
  • Use variables (or arrays, where your shell supports them) instead of storing data in a file.

command1 | command2

This sends the output of “command1” to “command2” without having to write it to a file first. This can be especially useful when dealing with large amounts of data or when working with sensitive information that you don’t want to save to disk.
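A small sketch of the idea, assuming we want the number of distinct lines in a file: the data flows from one command to the next and the result ends up in a variable, with no scratch file created by the script. The name count_unique_lines is hypothetical.

```shell
# count_unique_lines is a hypothetical helper name.
# sort -u streams its result straight into wc; the script writes no
# intermediate file. The arithmetic expansion strips the padding that
# some wc implementations put in front of the number.
count_unique_lines() {
    echo $(( $(sort -u "$1" | wc -l) ))
}
```

A caller can keep the result in a variable, for example total=$(count_unique_lines access.log), where access.log is only a placeholder file name.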

  • When you use a temporary file, make sure you clean up!

some_function() {
    cleanup() {
        # Remove the temporary file when something goes wrong (or when the script finishes)
        [ -f "$tmpfile" ] && rm -f "$tmpfile"
    }
    # Register the handler before creating or using the temporary file
    trap cleanup INT QUIT TERM EXIT

    # Doing something with "$tmpfile"
}
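For a fuller picture, here is one possible sketch of the same pattern with an actual temporary file. It assumes mktemp is available (it is widespread but not required by POSIX), and strip_blank_lines is a hypothetical helper that removes empty lines from a file in place.

```shell
# strip_blank_lines is a hypothetical helper name.
# The body is wrapped in ( ) so it runs in a subshell: the EXIT trap fires
# as soon as the function finishes, and the traps do not leak to the caller.
strip_blank_lines() (
    tmpfile=$(mktemp) || exit 1      # assumption: mktemp exists on this system
    trap 'rm -f "$tmpfile"' EXIT
    trap 'exit 1' INT QUIT TERM      # turn signals into a normal exit so the EXIT trap runs

    grep -v '^$' "$1" > "$tmpfile"   # keep only the non-empty lines
    cp "$tmpfile" "$1"               # copy the result back over the original
)
```

Whether the function ends normally or is interrupted, the temporary file is removed.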

Stop using sed for simple stuff

Use ${a// /_} (supported by bash, ksh and zsh, but not by POSIX sh) to replace spaces in the value of a variable with underscores, instead of spawning sed:

# Bad practice
printf '%s\n' "$a" | sed 's/ /_/g'
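If the script has to run under plain sh, where ${a// /_} is not available, the same replacement can still be done without an external process. The following is only a sketch using POSIX parameter expansion; replace_spaces is a hypothetical name, and it uses the global variables rest and result because POSIX sh has no local.

```shell
# replace_spaces is a hypothetical helper name.
# It walks through the string, copying everything up to the next space
# and appending an underscore, until no space is left.
replace_spaces() {
    rest=$1
    result=
    while :; do
        case $rest in
            *" "*)
                result="${result}${rest%% *}_"   # text before the first space, plus "_"
                rest=${rest#* }                  # drop that text and the space itself
                ;;
            *)
                result="${result}${rest}"
                break
                ;;
        esac
    done
    printf '%s\n' "$result"
}
```

For example, replace_spaces "my file name.txt" prints my_file_name.txt.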

Best Practices for File Naming

Use “./*.pdf” instead of “*.pdf”

To improve safety, prefix glob patterns with ./ when passing files to a command. A bare *.pdf expands to the names of the PDF files in the current directory exactly as they are, so a file whose name begins with a dash (for example -rf.pdf) is handed to the command looking like an option rather than a file name.

Using ./*.pdf matches the same files (neither pattern descends into subdirectories), but every expanded name now starts with ./ and can never be mistaken for an option. This is an important measure to prevent accidental or malicious behavior caused by unusual file names.
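A quick way to see the difference is a directory that contains a file named -l.pdf. The sketch below uses a hypothetical helper, list_pdfs_safely, which lists the PDF files of a given directory.

```shell
# list_pdfs_safely is a hypothetical helper name.
# The body runs in a subshell so the cd does not affect the caller.
# With a file called "-l.pdf" present, "ls *.pdf" would hand ls the word
# "-l.pdf", which ls tries to parse as options. With the ./ prefix the
# word becomes "./-l.pdf", which can only be a file name.
list_pdfs_safely() (
    cd "$1" || exit 1
    ls ./*.pdf
)
```

If no PDF files exist, the pattern is passed to ls unexpanded and ls reports an error, so real scripts may want to check for that case first.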

Why use sh over bash?

While bash is a more powerful shell language than sh, it also has more complexity and features that can make scripts more difficult to read and maintain.

Here are some reasons why you might choose sh over bash:

  • Portability: sh is more widely available on different Unix-like systems than bash, which may not be installed by default on some systems. This means that scripts written in sh are more likely to work on different systems without modifications.

  • Efficiency: on systems where sh is a small shell such as dash (the default /bin/sh on Debian and Ubuntu), scripts start faster and use fewer system resources than under bash.

  • Simplicity: sh has a simpler syntax and fewer features than bash, which can make scripts easier to read and maintain.
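As one small illustration of staying within sh, pattern matching that a bash script might write as [[ $1 == *.pdf ]] can be expressed with a plain case statement. The function name is_pdf is hypothetical.

```shell
# is_pdf is a hypothetical helper name.
# case does glob-style pattern matching in every POSIX shell,
# so no bash-only [[ ... ]] test is needed.
is_pdf() {
    case $1 in
        *.pdf) return 0 ;;
        *)     return 1 ;;
    esac
}
```

It can then be used like any other command, for example is_pdf "$file" && echo "PDF file".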

Use set -e to exit on errors

Add set -e at the top of your script to exit immediately if any command returns a non-zero status code. This can help catch errors early and prevent your script from continuing in an invalid state. Note that commands whose status is being tested (in an if condition, or as part of an && or || list) do not trigger the exit.
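Here is a minimal sketch of the effect. The helper demo_set_e is hypothetical; it runs a tiny script in a separate sh process so that the failure does not terminate the calling shell.

```shell
# demo_set_e is a hypothetical helper name used only for this illustration.
demo_set_e() {
    sh -c '
        set -e
        echo "step 1"
        false            # fails, so the script stops here
        echo "step 2"    # never reached
    '
}

# Prints "step 1", then reports the non-zero exit status of the inner script.
demo_set_e || echo "script exited with status $?"
```

Without set -e, the inner script would carry on and print step 2 as if nothing had gone wrong.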

Use $(command) instead of backticks

Note: backticks look like this: `someCommand`

Use $(command) instead of backticks to execute commands and capture their output. Backticks can be difficult to read and can cause syntax errors in some cases.
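Nesting is where the difference shows most clearly. In this sketch the path is only a placeholder; dirname and basename work on the string and do not need the file to exist.

```shell
# Nested command substitution reads naturally with $( ).
parent_name=$(basename "$(dirname "/var/log/app/error.log")")
echo "$parent_name"    # prints: app

# The backtick version needs the inner backticks escaped.
parent_name_bt=`basename \`dirname /var/log/app/error.log\``
echo "$parent_name_bt" # prints: app
```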

Debugging tricks

set -x			# activate debugging from here
# Some Logic 
set +x			# stop debugging from here
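As a small sketch, tracing can be limited to the part of a function you are investigating. The name traced_sum is hypothetical; the trace lines go to standard error, so the normal output stays clean.

```shell
# traced_sum is a hypothetical helper name.
traced_sum() {
    set -x                  # start tracing: each command is echoed to stderr
    total=$(( $1 + $2 ))
    set +x                  # stop tracing
    echo "$total"
}
```

Calling traced_sum 2 3 prints 5 on standard output and the traced commands on standard error. To trace a whole script without editing it, run it as sh -x script.sh.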

Use printf instead of echo

Use printf instead of echo for more consistent and portable output formatting: the way echo handles options such as -n and backslash escapes differs between shells. printf also supports more advanced formatting options.
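Two short examples, as a sketch: aligned columns, and printing a string that looks like an option. The helper name print_row is hypothetical.

```shell
# print_row is a hypothetical helper that left-aligns a label in 10 columns
# and right-aligns a number in 5 columns.
print_row() {
    printf '%-10s|%5d\n' "$1" "$2"
}

print_row "name" 42      # prints: name      |   42

# printf prints its argument literally, even when it looks like an option.
printf '%s\n' "-n"       # prints: -n
```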

Styling and readability

# Use this to conditionally execute a command based on the value of a variable
[ "$var" ] && command1     # If var is empty, command1 will not execute

# Instead of this, which can lead to unexpected behavior if var contains whitespace or special characters
[ ! -z $var ] && something

# Use this to check if the value of a variable is equal to a specific string
[ "$var" = "find" ] && echo found

# Instead of this, which is longer and less readable
# (note: strings are compared with =, while -eq is for integers only)
if [ "$var" = 'find' ]; then
    echo found
fi

# Use this to set a default value for a variable that is unset or empty
: "${var:=value}"

# Instead of this, which is longer
[ "$var" ] || var="value"

If you have any insights or suggestions, I would love to hear them 🙂.