How to convert to lowercase and uppercase in bash, awk, sed, and tr, and the benchmark results for each

This article shows how to convert to lowercase and uppercase in bash, awk, sed, and tr, and presents benchmark results for each.

Versions⌗

The versions are as follows:

OS
- Ubuntu 20.04.1 LTS
bash
- GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
gawk
- GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
mawk
- 1.3.4 20200120
sed
- sed (GNU sed) 4.7
tr
- tr (GNU coreutils) 8.30

Convert to lowercase/uppercase⌗

bash⌗

The documentation says the following:

The ‘^’ operator converts lowercase letters matching pattern to uppercase; the ‘,’ operator converts matching uppercase letters to lowercase. The ‘^^’ and ‘,,’ expansions convert each matched character in the expanded value;

# to lower
$ val="AAA"
$ echo ${val,,}
aaa

# to upper
$ val="aaa"
$ echo ${val^^}
AAA
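The quoted passage also mentions the single-character forms ^ and , and an optional pattern, which the examples above do not show. The following is a minimal sketch, assuming bash 4.0 or later; the variable names are illustrative only.

```shell
val="hello world"

# ^ uppercases only the first character of the value
first_upper=${val^}

# ^^ with a pattern uppercases only the characters matching it (here: vowels)
vowels_upper=${val^^[aeiou]}

val="HELLO WORLD"

# , lowercases only the first character of the value
first_lower=${val,}

echo "$first_upper"   # Hello world
echo "$vowels_upper"  # hEllO wOrld
echo "$first_lower"   # hELLO WORLD
```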

awk⌗

The documentation describes two functions, tolower and toupper.

# to lower
$ echo AAA | awk '{print tolower($0)}'
aaa

# to upper
$ echo aaa | awk '{print toupper($0)}'
AAA
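Because tolower and toupper are ordinary string functions, they can also be applied to part of a line. The sketch below uses a made-up input line and only standard awk features (field assignment and substr), so it should behave the same in gawk and mawk.

```shell
# Uppercase only the first field; assigning to $1 makes awk rebuild the record
name_upper=$(echo "alice smith" | awk '{ $1 = toupper($1); print }')

# Capitalize only the first character of the line
capitalized=$(echo "alice smith" | awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }')

echo "$name_upper"   # ALICE smith
echo "$capitalized"  # Alice smith
```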

sed⌗

The documentation says the following:

\L Turn the replacement to lowercase until a \U or \E is found,

\U Turn the replacement to uppercase until a \L or \E is found,

# to lower
$ echo AAA | sed -e 's/\(.*\)/\L\1/'
aaa

# to upper
$ echo aaa | sed -e 's/\(.*\)/\U\1/'
AAA

However, \L and \U are GNU extensions and cannot be used in BSD sed.
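Where \L and \U are unavailable, one portable alternative is the POSIX y command, which maps characters one to one. The sketch below is not part of the benchmark; to_upper_portable is a hypothetical helper name, and it only handles the 26 ASCII letters listed in the command.

```shell
# y/source/dest/ maps each character in the first list to the character at
# the same position in the second list (a POSIX sed command, no GNU extension)
to_upper_portable() {
  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'
}

echo "aaa bbb" | to_upper_portable   # AAA BBB
```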

tr⌗

The documentation says the following:

[:lower:] all lower case letters

[:upper:] all upper case letters

# to lower
$ echo AAA | tr '[:upper:]' '[:lower:]'
aaa

# to upper
$ echo aaa | tr '[:lower:]' '[:upper:]'
AAA
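tr only reads standard input, so converting the contents of a shell variable needs either a pipe, as above, or a here-string in bash. A small sketch with an illustrative variable:

```shell
val="Hello World"

# A here-string (<<<) passes the variable to tr on standard input
lower=$(tr '[:upper:]' '[:lower:]' <<< "$val")
upper=$(tr '[:lower:]' '[:upper:]' <<< "$val")

echo "$lower"  # hello world
echo "$upper"  # HELLO WORLD
```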

The above is how to convert to lowercase and uppercase.

Benchmark⌗

Next, I benchmark the performance of each method, using hyperfine.

Generate script⌗

To benchmark strings of various lengths, I wrote the following script, which generates scripts that run each command on a string of a given length.

#!/bin/bash

n=${1}

str=$(perl -e "print 'a' x ${n}")

cat <<EOF > bash-n${n}.sh
v=${str}; echo \${v^^}
EOF

cat <<EOF > tr-n${n}.sh
echo ${str} | tr '[:lower:]' '[:upper:]'
EOF

cat <<EOF > sed-n${n}.sh
echo ${str} | sed -e 's/\(.*\)/\U\1/'
EOF

cat <<EOF > gawk-n${n}.sh
echo ${str} | gawk '{print toupper(\$0)}'
EOF

cat <<EOF > mawk-n${n}.sh
echo ${str} | mawk '{print toupper(\$0)}'
EOF

cat <<EOF > benchmark-n${n}.sh
hyperfine --warmup 3 --min-runs 1000 "bash bash-n${n}.sh" "bash tr-n${n}.sh" "bash sed-n${n}.sh" "bash gawk-n${n}.sh" "bash mawk-n${n}.sh" --export-markdown benchmark-result-n${1}.md
EOF

If the above script is saved as gen-scripts.sh, running gen-scripts.sh 10 generates scripts that run each command against aaaaaaaaaa (length = 10), plus a script that benchmarks them with hyperfine. Since the results are unlikely to differ between lowercase and uppercase conversion, I benchmarked only the uppercase conversion.
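Generating and running the benchmarks for several lengths can be wrapped in a small loop. The following is only a sketch: run_all and DRY_RUN are hypothetical names, the lengths are the ones used in the results, and it assumes gen-scripts.sh is in the current directory and hyperfine is installed.

```shell
# String lengths to benchmark
lengths=(10 100 1000 10000 100000 1000000)

run_all() {
  local n
  for n in "${lengths[@]}"; do
    # When DRY_RUN is set, ${DRY_RUN:+echo} expands to "echo", so the
    # commands are printed instead of executed.
    ${DRY_RUN:+echo} bash gen-scripts.sh "$n"
    ${DRY_RUN:+echo} bash "benchmark-n${n}.sh"
  done
}
```

Calling DRY_RUN=1 run_all prints the commands that would be run, and calling run_all on its own executes them.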

Results⌗

I benchmarked string lengths of 10, 100, 1,000, 10,000, 100,000, and 1,000,000. *-n10.sh uses a string of length 10, *-n100.sh a string of length 100, and so on. The benchmark was run on Ubuntu 20.04.1, where the awk installed by default is mawk, so I also measured gawk.

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bash bash-n10.sh`	2.2 ± 0.5	1.5	6.3	1.00
`bash tr-n10.sh`	3.1 ± 0.6	2.4	9.6	1.43 ± 0.44
`bash sed-n10.sh`	3.3 ± 0.5	2.7	6.7	1.55 ± 0.44
`bash gawk-n10.sh`	3.9 ± 0.7	3.1	8.5	1.81 ± 0.54
`bash mawk-n10.sh`	3.3 ± 0.6	2.5	6.7	1.52 ± 0.46

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bash bash-n100.sh`	1.6 ± 0.3	1.1	3.3	1.00
`bash tr-n100.sh`	2.6 ± 0.5	1.8	6.1	1.70 ± 0.45
`bash sed-n100.sh`	2.8 ± 0.5	2.2	5.6	1.79 ± 0.46
`bash gawk-n100.sh`	3.4 ± 0.6	2.6	6.8	2.16 ± 0.55
`bash mawk-n100.sh`	2.7 ± 0.5	2.0	6.4	1.70 ± 0.45

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bash bash-n1000.sh`	2.3 ± 0.5	1.6	7.2	1.00
`bash tr-n1000.sh`	3.2 ± 1.3	2.4	19.0	1.44 ± 0.66
`bash sed-n1000.sh`	4.6 ± 2.2	2.8	24.1	2.02 ± 1.08
`bash gawk-n1000.sh`	4.1 ± 0.9	3.1	8.9	1.83 ± 0.57
`bash mawk-n1000.sh`	3.2 ± 0.4	2.6	6.3	1.40 ± 0.37

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bash bash-n10000.sh`	3.1 ± 0.4	2.4	7.3	1.00
`bash tr-n10000.sh`	3.4 ± 0.6	2.6	6.1	1.11 ± 0.25
`bash sed-n10000.sh`	5.7 ± 26.5	3.8	842.7	1.84 ± 8.62
`bash gawk-n10000.sh`	4.4 ± 0.7	3.4	7.6	1.44 ± 0.31
`bash mawk-n10000.sh`	3.4 ± 0.4	2.7	5.5	1.10 ± 0.21

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bash bash-n100000.sh`	17.1 ± 1.4	14.8	29.4	1.63 ± 0.20
`bash tr-n100000.sh`	10.5 ± 1.0	8.9	15.8	1.00
`bash sed-n100000.sh`	21.2 ± 1.6	19.0	39.2	2.02 ± 0.24
`bash gawk-n100000.sh`	12.8 ± 1.0	11.0	19.1	1.22 ± 0.15
`bash mawk-n100000.sh`	11.9 ± 0.7	10.7	18.3	1.13 ± 0.13

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bash bash-n1000000.sh`	144.8 ± 11.2	126.3	206.8	1.89 ± 0.19
`bash tr-n1000000.sh`	77.4 ± 6.9	67.1	143.5	1.01 ± 0.11
`bash sed-n1000000.sh`	172.3 ± 13.6	151.1	276.5	2.24 ± 0.23
`bash gawk-n1000000.sh`	84.3 ± 5.4	75.4	103.7	1.10 ± 0.10
`bash mawk-n1000000.sh`	76.8 ± 5.0	68.8	128.3	1.00

The results are summarized as follows:

For string lengths of 1,000 or less, conversion by bash’s variable expansion is the fastest.
At a string length of 10,000, bash’s variable expansion, tr, and mawk are almost equally fast.
Bash’s variable expansion slows down when the string length is 100,000 or higher, where tr, mawk, and gawk are all faster.
tr and mawk performed about the same at every length.
sed is the slowest at string lengths of 1,000 or more; at lengths of 10 and 100, gawk is the slowest.
Perhaps because the processing was simple, there was not much difference in performance between gawk and mawk, although gawk was slightly slower at every length.
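The slowdown on long strings can be made more concrete by estimating the extra time each additional character costs between the two largest runs. The sketch below only does arithmetic on the mean times from the tables above. The figures cover everything the generated script does (parsing the long line, echo, the pipe), not the conversion alone, so treat them as rough estimates.

```shell
# Mean times in ms copied from the tables above:
# name, mean at n=100,000, mean at n=1,000,000
per_char=$(awk '
  # (t2 - t1) ms spread over the 900,000 extra characters, expressed in ns
  { printf "%s %.0f\n", $1, ($3 - $2) * 1e6 / 900000 }
' <<'EOF'
bash 17.1 144.8
tr 10.5 77.4
sed 21.2 172.3
gawk 12.8 84.3
mawk 11.9 76.8
EOF
)

echo "$per_char"
```

By this estimate, bash and sed spend roughly twice as many nanoseconds per additional character as tr, gawk, and mawk, which matches the ordering in the last two tables.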

Conclusion⌗

This article showed how to convert to lowercase and uppercase in bash, awk, sed, and tr, and presented benchmark results for each. Since there is only about a 2-fold difference in performance between the fastest and the slowest, I think you can use whichever command you like, as long as it is not executed very often. Also, since CPU and memory usage were not measured in these benchmarks, the best choice may change once you consider those factors as well.
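As a starting point for looking at CPU usage, the bash time keyword reports user and system CPU time alongside wall-clock time. The following is a sketch only: cpu_time is a hypothetical helper, it was not used for the results above, and it does not cover memory usage.

```shell
# Format used by the bash time keyword:
# wall-clock, user CPU, and system CPU time in seconds
TIMEFORMAT='real=%R user=%U sys=%S'

# cpu_time is a hypothetical helper: it runs a command, discards its
# standard output, and prints the timing line that time writes to stderr
cpu_time() {
  { time "$@" > /dev/null; } 2>&1
}

cpu_time bash -c 'v=aaa; echo "${v^^}"'
```

To time one of the generated scripts instead, the last line could be replaced with, for example, cpu_time bash tr-n1000000.sh.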