How to convert to lowercase and uppercase in bash, awk, sed, and tr, and the benchmark results for each

This article shows how to convert text to lowercase and uppercase with bash, awk, sed, and tr, and compares the performance of each with benchmarks.

Versions

The versions are as follows:

  • OS
    • Ubuntu 20.04.1 LTS
  • bash
    • GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
  • gawk
    • GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
  • mawk
    • 1.3.4 20200120
  • sed
    • sed (GNU sed) 4.7
  • tr
    • tr (GNU coreutils) 8.30

Convert to lowercase/uppercase

bash

The documentation says the following:

The ‘^’ operator converts lowercase letters matching pattern to uppercase; the ‘,’ operator converts matching uppercase letters to lowercase. The ‘^^’ and ‘,,’ expansions convert each matched character in the expanded value;

    # to lower
    $ val="AAA"
    $ echo ${val,,}
    aaa

    # to upper
    $ val="aaa"
    $ echo ${val^^}
    AAA
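The quoted documentation also mentions the single-character operators ‘^’ and ‘,’ and an optional pattern, which the example above does not exercise. The following is a minimal sketch of those forms, assuming bash 4.0 or later; the function names capitalize, uncapitalize, and upper_vowels are made up for illustration and are not part of bash.

```shell
#!/usr/bin/env bash
# Requires bash 4.0 or later for the case-modification expansions.

# Uppercase only the first character of the argument.
capitalize() {
  local s=$1
  printf '%s\n' "${s^}"
}

# Lowercase only the first character of the argument.
uncapitalize() {
  local s=$1
  printf '%s\n' "${s,}"
}

# Uppercase every vowel: the pattern after ^^ selects which characters change.
upper_vowels() {
  local s=$1
  printf '%s\n' "${s^^[aeiou]}"
}

capitalize "hello world"      # Hello world
uncapitalize "HELLO WORLD"    # hELLO WORLD
upper_vowels "hello world"    # hEllO wOrld
```

With no pattern, every character is a candidate, which is why ${val^^} and ${val,,} convert the whole string.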

awk

According to the documentation, awk provides the functions tolower and toupper.

    # to lower
    $ echo AAA | awk '{print tolower($0)}'
    aaa

    # to upper
    $ echo aaa | awk '{print toupper($0)}'
    AAA
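Because tolower and toupper are ordinary functions, they can also be applied to a single field or to part of a line rather than to $0 as a whole. The sketch below illustrates this; the function names upper_first_field and capitalize_line are hypothetical helpers, not awk built-ins.

```shell
# Uppercase only the first field of each line.
# Assigning to $1 makes awk rebuild the line with single spaces between fields.
upper_first_field() {
  awk '{ $1 = toupper($1); print }'
}

# Uppercase the first character of each line and lowercase the rest.
capitalize_line() {
  awk '{ print toupper(substr($0, 1, 1)) tolower(substr($0, 2)) }'
}

printf '%s\n' 'alice Engineer' 'bob Designer' | upper_first_field
# ALICE Engineer
# BOB Designer

printf '%s\n' 'hELLO wORLD' | capitalize_line
# Hello world
```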

sed

The documentation says the following:

\L
Turn the replacement to lowercase until a \U or \E is found,

\U
Turn the replacement to uppercase until a \L or \E is found,

    # to lower
    $ echo AAA | sed -e 's/\(.*\)/\L\1/'
    aaa

    # to upper
    $ echo aaa | sed -e 's/\(.*\)/\U\1/'
    AAA
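The quoted documentation mentions \E as the terminator for \L and \U, which matters when only part of the replacement should change case. A small sketch, assuming GNU sed; upper_first_word is a hypothetical name used only for this example.

```shell
# GNU sed only: \U starts uppercasing the replacement and \E ends it.
# \1 is the first word (everything up to the first space), \2 is the rest.
upper_first_word() {
  sed -e 's/^\([^ ]*\)\(.*\)$/\U\1\E\2/'
}

echo "hello world" | upper_first_word   # HELLO world
```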

However, \L and \U are GNU extensions and cannot be used in BSD sed.
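Where \L and \U are unavailable, one possible fallback is the POSIX y command, which maps each character in the first list to the character at the same position in the second list. This is a sketch limited to ASCII letters; sed_upper and sed_lower are illustrative names, not sed features.

```shell
# POSIX sed: y/source/dest/ transliterates character by character.
# Only the 26 ASCII letters are handled here.
sed_upper() {
  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'
}

sed_lower() {
  sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
}

echo "aaa Bbb" | sed_upper   # AAA BBB
echo "AAA Bbb" | sed_lower   # aaa bbb
```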

tr

The documentation says the following:

[:lower:]
all lower case letters

[:upper:]
all upper case letters

    # to lower
    $ echo AAA | tr '[:upper:]' '[:lower:]'
    aaa

    # to upper
    $ echo aaa | tr '[:lower:]' '[:upper:]'
    AAA
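Unlike awk and sed, tr does not accept file names as arguments and reads only standard input, so converting a file requires redirection. A minimal sketch follows; upper_file is a hypothetical helper and the temporary file exists only for the demonstration.

```shell
# tr reads only standard input, so the file is passed with input redirection.
upper_file() {
  tr '[:lower:]' '[:upper:]' < "$1"
}

tmp=$(mktemp)
printf 'aaa\nbbb\n' > "$tmp"
upper_file "$tmp"
# AAA
# BBB
rm -f "$tmp"
```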

The above is how to convert to lowercase and uppercase.

Benchmark

Next, benchmark the performance of each method. hyperfine (https://github.com/sharkdp/hyperfine) is used for benchmarking.

Generate script

To benchmark strings of various lengths, the following script takes a length as its argument and generates one script per command for a string of that length, plus a script that benchmarks them with hyperfine.

    #!/bin/bash
    # Usage: gen-scripts.sh <string length>
    n=${1}
    str=$(perl -e "print 'a' x ${n}")

    cat <<EOF > bash-n${n}.sh
    v=${str}; echo \${v^^}
    EOF

    cat <<EOF > tr-n${n}.sh
    echo ${str} | tr '[:lower:]' '[:upper:]'
    EOF

    cat <<EOF > sed-n${n}.sh
    echo ${str} | sed -e 's/\(.*\)/\U\1/'
    EOF

    cat <<EOF > gawk-n${n}.sh
    echo ${str} | gawk '{print toupper(\$0)}'
    EOF

    cat <<EOF > mawk-n${n}.sh
    echo ${str} | mawk '{print toupper(\$0)}'
    EOF

    cat <<EOF > benchmark-n${n}.sh
    hyperfine --warmup 3 --min-runs 1000 "bash bash-n${n}.sh" "bash tr-n${n}.sh" "bash sed-n${n}.sh" "bash gawk-n${n}.sh" "bash mawk-n${n}.sh" --export-markdown benchmark-result-n${n}.md
    EOF

If the above script is saved as gen-scripts.sh, running gen-scripts.sh 10 generates scripts that run each command against aaaaaaaaaa (length = 10), along with a script that benchmarks them with hyperfine. Since the results are unlikely to differ between lowercase and uppercase conversion, only uppercase conversion is benchmarked.
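To cover several lengths in one go, the generate-and-benchmark step can be wrapped in a loop. The following is a sketch only: it assumes gen-scripts.sh is in the current directory and hyperfine is installed, and the names BENCH_LENGTHS and run_all are made up for this example.

```shell
# String lengths to benchmark.
BENCH_LENGTHS="10 100 1000 10000 100000 1000000"

# Generate the scripts for each length, then run the generated benchmark.
# Stops at the first step that fails.
run_all() {
  for n in $BENCH_LENGTHS; do
    bash gen-scripts.sh "$n" || return 1
    bash "benchmark-n${n}.sh" || return 1
  done
}
```

Calling run_all from the directory that contains gen-scripts.sh leaves one benchmark-result-n*.md file per length.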

Results

The benchmarks were run for string lengths of 10, 100, 1,000, 10,000, 100,000, and 1,000,000. *-n10.sh uses a string of length 10, *-n100.sh a string of length 100, and so on.
The benchmark was run on Ubuntu 20.04.1. Because the awk installed by default was mawk, gawk was measured as well.

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| bash bash-n10.sh | 2.2 ± 0.5 | 1.5 | 6.3 | 1.00 |
| bash tr-n10.sh | 3.1 ± 0.6 | 2.4 | 9.6 | 1.43 ± 0.44 |
| bash sed-n10.sh | 3.3 ± 0.5 | 2.7 | 6.7 | 1.55 ± 0.44 |
| bash gawk-n10.sh | 3.9 ± 0.7 | 3.1 | 8.5 | 1.81 ± 0.54 |
| bash mawk-n10.sh | 3.3 ± 0.6 | 2.5 | 6.7 | 1.52 ± 0.46 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| bash bash-n100.sh | 1.6 ± 0.3 | 1.1 | 3.3 | 1.00 |
| bash tr-n100.sh | 2.6 ± 0.5 | 1.8 | 6.1 | 1.70 ± 0.45 |
| bash sed-n100.sh | 2.8 ± 0.5 | 2.2 | 5.6 | 1.79 ± 0.46 |
| bash gawk-n100.sh | 3.4 ± 0.6 | 2.6 | 6.8 | 2.16 ± 0.55 |
| bash mawk-n100.sh | 2.7 ± 0.5 | 2.0 | 6.4 | 1.70 ± 0.45 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| bash bash-n1000.sh | 2.3 ± 0.5 | 1.6 | 7.2 | 1.00 |
| bash tr-n1000.sh | 3.2 ± 1.3 | 2.4 | 19.0 | 1.44 ± 0.66 |
| bash sed-n1000.sh | 4.6 ± 2.2 | 2.8 | 24.1 | 2.02 ± 1.08 |
| bash gawk-n1000.sh | 4.1 ± 0.9 | 3.1 | 8.9 | 1.83 ± 0.57 |
| bash mawk-n1000.sh | 3.2 ± 0.4 | 2.6 | 6.3 | 1.40 ± 0.37 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| bash bash-n10000.sh | 3.1 ± 0.4 | 2.4 | 7.3 | 1.00 |
| bash tr-n10000.sh | 3.4 ± 0.6 | 2.6 | 6.1 | 1.11 ± 0.25 |
| bash sed-n10000.sh | 5.7 ± 26.5 | 3.8 | 842.7 | 1.84 ± 8.62 |
| bash gawk-n10000.sh | 4.4 ± 0.7 | 3.4 | 7.6 | 1.44 ± 0.31 |
| bash mawk-n10000.sh | 3.4 ± 0.4 | 2.7 | 5.5 | 1.10 ± 0.21 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| bash bash-n100000.sh | 17.1 ± 1.4 | 14.8 | 29.4 | 1.63 ± 0.20 |
| bash tr-n100000.sh | 10.5 ± 1.0 | 8.9 | 15.8 | 1.00 |
| bash sed-n100000.sh | 21.2 ± 1.6 | 19.0 | 39.2 | 2.02 ± 0.24 |
| bash gawk-n100000.sh | 12.8 ± 1.0 | 11.0 | 19.1 | 1.22 ± 0.15 |
| bash mawk-n100000.sh | 11.9 ± 0.7 | 10.7 | 18.3 | 1.13 ± 0.13 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| bash bash-n1000000.sh | 144.8 ± 11.2 | 126.3 | 206.8 | 1.89 ± 0.19 |
| bash tr-n1000000.sh | 77.4 ± 6.9 | 67.1 | 143.5 | 1.01 ± 0.11 |
| bash sed-n1000000.sh | 172.3 ± 13.6 | 151.1 | 276.5 | 2.24 ± 0.23 |
| bash gawk-n1000000.sh | 84.3 ± 5.4 | 75.4 | 103.7 | 1.10 ± 0.10 |
| bash mawk-n1000000.sh | 76.8 ± 5.0 | 68.8 | 128.3 | 1.00 |
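The Relative column is each command's mean divided by the mean of the fastest command in the same run. As a quick check of that arithmetic, the sketch below recomputes two values from the means reported for the 1,000,000-character run; relative is a hypothetical helper name, and the inputs are the rounded means, so the last digit could differ from hyperfine's own output in other cases.

```shell
# Relative = mean of this command / mean of the fastest command, to two decimals.
relative() {
  awk -v mean="$1" -v best="$2" 'BEGIN { printf "%.2f\n", mean / best }'
}

# Means (ms) from the 1,000,000-character run; mawk (76.8 ms) was the fastest.
relative 144.8 76.8   # bash: 1.89
relative 172.3 76.8   # sed:  2.24
```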

The results are summarized as follows:

  • For string lengths of 1,000 or less, conversion by bash’s variable expansion is the fastest.
  • At a string length of 10,000, bash’s variable expansion, tr, and mawk are almost equally fast.
  • Bash’s variable expansion slows down at string lengths of 100,000 or more, where tr, mawk, and gawk are all faster.
  • tr and mawk perform about the same at every string length.
  • sed is the slowest at string lengths of 1,000 and above; at 10 and 100, gawk is the slowest.
  • Perhaps because the processing was simple, there was not much difference in performance between gawk and mawk.

Conclusion

This article showed how to convert text to lowercase and uppercase with bash, awk, sed, and tr, and presented benchmark results for each.
However, since the difference between the fastest and the slowest is only about 2-fold, I think any of these commands is fine as long as it is not executed very frequently.
Also, CPU and memory usage were not measured in this benchmark, so the choice may change once those factors are taken into account.
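For the memory side, one possible approach is GNU time, which reports the peak resident set size of a command when run with -v. This is only a sketch under assumptions: it requires GNU time to be installed at /usr/bin/time (the -v flag is GNU-specific and this was not part of the benchmark above), and peak_rss_kb is a made-up helper name.

```shell
# Print the peak resident set size (in kilobytes) of a command, as reported by GNU time.
# Assumes GNU time is installed as /usr/bin/time; -v is not available in the shell builtin.
peak_rss_kb() {
  # time writes its report to stderr, so stderr goes to the pipe and stdout is discarded.
  /usr/bin/time -v "$@" 2>&1 >/dev/null |
    awk -F': ' '/Maximum resident set size/ { print $2 }'
}
```

For example, peak_rss_kb bash tr-n1000000.sh prints a single number that can be compared with the same measurement for the other generated scripts.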
