How to convert to lowercase and uppercase in bash, awk, sed, and tr, and the benchmark results for each
This article shows how to convert strings to lowercase and uppercase in bash, awk, sed, and tr, and presents benchmark results for each.
Versions⌗
The versions are as follows:
- OS: Ubuntu 20.04.1 LTS
- bash: GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
- gawk: GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
- mawk: 1.3.4 20200120
- sed: sed (GNU sed) 4.7
- tr: tr (GNU coreutils) 8.30
Convert to lowercase/uppercase⌗
bash⌗
The documentation says the following:
The ‘^’ operator converts lowercase letters matching pattern to uppercase; the ‘,’ operator converts matching uppercase letters to lowercase. The ‘^^’ and ‘,,’ expansions convert each matched character in the expanded value;
# to lower
$ val="AAA"
$ echo ${val,,}
aaa
# to upper
$ val="aaa"
$ echo ${val^^}
AAA
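The quoted documentation also mentions the single ‘^’ and ‘,’ operators, which are not shown above. The following is a minimal sketch of how they differ from ‘^^’ and ‘,,’, assuming bash 4.0 or later; the variable names are only for illustration.

```shell
#!/bin/bash
# Requires bash 4.0 or later (case-modifying parameter expansion).
lower_val="hello world"
upper_val="HELLO WORLD"

first_upper=${lower_val^}    # '^' converts only the first character
all_upper=${lower_val^^}     # '^^' converts every character
first_lower=${upper_val,}    # ',' converts only the first character
all_lower=${upper_val,,}     # ',,' converts every character

echo "$first_upper"   # Hello world
echo "$all_upper"     # HELLO WORLD
echo "$first_lower"   # hELLO WORLD
echo "$all_lower"     # hello world
```

The single-character forms are handy for capitalizing a word without touching the rest of the string.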
awk⌗
The documentation says that awk provides the functions tolower and toupper.
# to lower
$ echo AAA | awk '{print tolower($0)}'
aaa
# to upper
$ echo aaa | awk '{print toupper($0)}'
AAA
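Because tolower and toupper are ordinary functions, they can also be applied to part of a line rather than to $0 as a whole. The following is a small sketch of that idea; the sample input and variable names are made up for illustration.

```shell
#!/bin/bash
# Uppercase only the first field. Assigning to $1 makes awk rebuild $0
# using the output field separator (a single space by default).
first_field_upper=$(echo "alice smith" | awk '{ $1 = toupper($1); print }')
echo "$first_field_upper"   # ALICE smith

# Capitalize only the first character of the line.
capitalized=$(echo "alice smith" | awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }')
echo "$capitalized"         # Alice smith
```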
sed⌗
The documentation says the following:
\L Turn the replacement to lowercase until a \U or \E is found,
\U Turn the replacement to uppercase until a \L or \E is found,
# to lower
$ echo AAA | sed -e 's/\(.*\)/\L\1/'
aaa
# to upper
$ echo aaa | sed -e 's/\(.*\)/\U\1/'
AAA
However, \L and \U are GNU sed extensions and cannot be used in BSD sed.
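If the script also has to run on BSD sed, one possible workaround (not benchmarked in this article) is the POSIX y command, which maps characters one to one. The sketch below assumes only ASCII letters need converting; the variable names are for illustration.

```shell
#!/bin/bash
# y/source/dest/ replaces each character in source with the character at
# the same position in dest. It is part of POSIX sed, so it does not
# depend on the GNU-only \L and \U.
lower='abcdefghijklmnopqrstuvwxyz'
upper='ABCDEFGHIJKLMNOPQRSTUVWXYZ'

to_upper=$(echo "aaa bbb" | sed "y/${lower}/${upper}/")
to_lower=$(echo "AAA BBB" | sed "y/${upper}/${lower}/")

echo "$to_upper"   # AAA BBB
echo "$to_lower"   # aaa bbb
```

Characters outside the two lists, such as accented letters, are left unchanged.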
tr⌗
The documentation says the following:
[:lower:] all lower case letters
[:upper:] all upper case letters
# to lower
$ echo AAA | tr '[:upper:]' '[:lower:]'
aaa
# to upper
$ echo aaa | tr '[:lower:]' '[:upper:]'
AAA
The above is how to convert to lowercase and uppercase.
Benchmark⌗
This section benchmarks the performance of each method, using hyperfine.
Generate script⌗
To test strings of various lengths, the following script takes a string length and generates one script per command, plus a script that benchmarks them with hyperfine.
#!/bin/bash
n=${1}
str=$(perl -e "print 'a' x ${n}")
cat <<EOF > bash-n${n}.sh
v=${str}; echo \${v^^}
EOF
cat <<EOF > tr-n${n}.sh
echo ${str} | tr '[:lower:]' '[:upper:]'
EOF
cat <<EOF > sed-n${n}.sh
echo ${str} | sed -e 's/\(.*\)/\U\1/'
EOF
cat <<EOF > gawk-n${n}.sh
echo ${str} | gawk '{print toupper(\$0)}'
EOF
cat <<EOF > mawk-n${n}.sh
echo ${str} | mawk '{print toupper(\$0)}'
EOF
cat <<EOF > benchmark-n${n}.sh
hyperfine --warmup 3 --min-runs 1000 "bash bash-n${n}.sh" "bash tr-n${n}.sh" "bash sed-n${n}.sh" "bash gawk-n${n}.sh" "bash mawk-n${n}.sh" --export-markdown benchmark-result-n${n}.md
EOF
If the above script is saved as gen-scripts.sh, running gen-scripts.sh 10 generates scripts that run each command against aaaaaaaaaa (length = 10), plus a script that benchmarks them with hyperfine. Since the results are unlikely to differ between lowercase and uppercase conversion, only uppercase conversion is benchmarked.
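To cover several string lengths, the generator and the generated benchmark script have to be run once per length. The following is a hypothetical driver sketch, not part of the original setup: it assumes the generator is saved as gen-scripts.sh in the current directory, and it only prints the commands so they can be reviewed first. The function name plan_for_length is made up for illustration.

```shell
#!/bin/bash
# Hypothetical driver: assumes gen-scripts.sh (the generator above) is in
# the current directory, and that perl and hyperfine are installed.

# Print the two commands needed for one string length.
plan_for_length() {
  printf 'bash gen-scripts.sh %s\n' "$1"
  printf 'bash benchmark-n%s.sh\n' "$1"
}

# Print the commands for every string length to benchmark.
for n in 10 100 1000 10000 100000 1000000; do
  plan_for_length "$n"
done
```

If this is saved as, say, run-all.sh, the printed commands can be executed by piping them to bash: bash run-all.sh | bash.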
Results⌗
The benchmarks were run for string lengths of 10, 100, 1,000, 10,000, 100,000, and 1,000,000. *-n10.sh uses a string of length 10, *-n100.sh uses a string of length 100, and so on.
The benchmark was run on Ubuntu 20.04.1, where the awk installed by default was mawk, so gawk was measured as well.
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
bash bash-n10.sh | 2.2 ± 0.5 | 1.5 | 6.3 | 1.00 |
bash tr-n10.sh | 3.1 ± 0.6 | 2.4 | 9.6 | 1.43 ± 0.44 |
bash sed-n10.sh | 3.3 ± 0.5 | 2.7 | 6.7 | 1.55 ± 0.44 |
bash gawk-n10.sh | 3.9 ± 0.7 | 3.1 | 8.5 | 1.81 ± 0.54 |
bash mawk-n10.sh | 3.3 ± 0.6 | 2.5 | 6.7 | 1.52 ± 0.46 |
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
bash bash-n100.sh | 1.6 ± 0.3 | 1.1 | 3.3 | 1.00 |
bash tr-n100.sh | 2.6 ± 0.5 | 1.8 | 6.1 | 1.70 ± 0.45 |
bash sed-n100.sh | 2.8 ± 0.5 | 2.2 | 5.6 | 1.79 ± 0.46 |
bash gawk-n100.sh | 3.4 ± 0.6 | 2.6 | 6.8 | 2.16 ± 0.55 |
bash mawk-n100.sh | 2.7 ± 0.5 | 2.0 | 6.4 | 1.70 ± 0.45 |
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
bash bash-n1000.sh | 2.3 ± 0.5 | 1.6 | 7.2 | 1.00 |
bash tr-n1000.sh | 3.2 ± 1.3 | 2.4 | 19.0 | 1.44 ± 0.66 |
bash sed-n1000.sh | 4.6 ± 2.2 | 2.8 | 24.1 | 2.02 ± 1.08 |
bash gawk-n1000.sh | 4.1 ± 0.9 | 3.1 | 8.9 | 1.83 ± 0.57 |
bash mawk-n1000.sh | 3.2 ± 0.4 | 2.6 | 6.3 | 1.40 ± 0.37 |
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
bash bash-n10000.sh | 3.1 ± 0.4 | 2.4 | 7.3 | 1.00 |
bash tr-n10000.sh | 3.4 ± 0.6 | 2.6 | 6.1 | 1.11 ± 0.25 |
bash sed-n10000.sh | 5.7 ± 26.5 | 3.8 | 842.7 | 1.84 ± 8.62 |
bash gawk-n10000.sh | 4.4 ± 0.7 | 3.4 | 7.6 | 1.44 ± 0.31 |
bash mawk-n10000.sh | 3.4 ± 0.4 | 2.7 | 5.5 | 1.10 ± 0.21 |
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
bash bash-n100000.sh | 17.1 ± 1.4 | 14.8 | 29.4 | 1.63 ± 0.20 |
bash tr-n100000.sh | 10.5 ± 1.0 | 8.9 | 15.8 | 1.00 |
bash sed-n100000.sh | 21.2 ± 1.6 | 19.0 | 39.2 | 2.02 ± 0.24 |
bash gawk-n100000.sh | 12.8 ± 1.0 | 11.0 | 19.1 | 1.22 ± 0.15 |
bash mawk-n100000.sh | 11.9 ± 0.7 | 10.7 | 18.3 | 1.13 ± 0.13 |
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
bash bash-n1000000.sh | 144.8 ± 11.2 | 126.3 | 206.8 | 1.89 ± 0.19 |
bash tr-n1000000.sh | 77.4 ± 6.9 | 67.1 | 143.5 | 1.01 ± 0.11 |
bash sed-n1000000.sh | 172.3 ± 13.6 | 151.1 | 276.5 | 2.24 ± 0.23 |
bash gawk-n1000000.sh | 84.3 ± 5.4 | 75.4 | 103.7 | 1.10 ± 0.10 |
bash mawk-n1000000.sh | 76.8 ± 5.0 | 68.8 | 128.3 | 1.00 |
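The Relative column is each command's mean divided by the fastest mean in the same table. As a quick sanity check, the sketch below recomputes two of the ratios from the 1,000,000-character table; the helper name relative is made up for illustration, and the ± part is not reproduced.

```shell
#!/bin/bash
# relative MEAN FASTEST: print MEAN / FASTEST rounded to two decimals.
relative() {
  awk -v mean="$1" -v fastest="$2" 'BEGIN { printf "%.2f\n", mean / fastest }'
}

# Means in ms from the 1,000,000-character table; mawk (76.8) is the fastest.
relative 144.8 76.8   # bash: prints 1.89
relative 172.3 76.8   # sed:  prints 2.24
```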
The results are summarized as follows:
- For string lengths of 1,000 or less, conversion by bash’s variable expansion is the fastest.
- At a string length of 10,000, bash’s variable expansion, tr, and mawk are almost equally fast.
- Bash’s variable expansion slows down when the string length is 100,000 or higher, where tr, mawk, and gawk are faster.
- The performance of tr and mawk is about the same at every string length.
- sed is the slowest for string lengths of 1,000 or more; at 10 and 100, gawk is the slowest.
- Perhaps because the process was simple, there was not much difference in performance between gawk and mawk.
Conclusion⌗
This article showed how to convert to lowercase and uppercase in bash, awk, sed, and tr, and presented benchmark results for each. Since there is only a roughly 2-fold difference in performance between the fastest and the slowest, I think there is no problem using whichever command you like as long as it is not executed too often. Also, since these benchmarks did not measure CPU or memory usage, the best choice may change when you take those factors into account.