Regular Expression Performance Comparison



The following tables compare these regular expression libraries:

The Boost regex library.
The Chiyuki regex library.

Details

Machine: Intel Core i7 mobile 2.80GHz.

Compiler: GNU C++ version 4.9.2.

OS: Ubuntu 14.04 LTS.

Boost version: 1.58.0.

Chiyuki version: 1.0.0 stable (updated 17 July 2015).


As ever, care should be taken in interpreting the results. Only sensible regular expressions (rather than pathological cases) are given; most are taken from the Boost regex examples or from the Library of Regular Expressions.


Compile options

Boost Regex: default build with bjam

Chiyuki Regex: -std=c++11 -O2 -finline-functions -march=native


Averages

The following are the average relative match times across all the tests, given in parentheses: a library that was fastest on every test would score 1, and in practice anything less than 2 is pretty good. The percentages take Boost Regex as the 100% baseline.

The last test case in the HTML Document Search is excluded from these averages.

Boost Regex         Chiyuki Regex
100.0% (1.30)       112.1% (1.16)
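The page does not spell out how the averages are computed. One reading that reproduces the Boost Regex figure is a weighted mean: for each test, divide each library's time by the faster time, then average those relative times using the section weights (0.5 for Pattern Matches, 1 for HTML Document Search), leaving out the last HTML test as stated above. The C++ sketch below encodes that assumed formula; the helper names relative_times, weighted_average and percent_of_baseline are hypothetical, not part of either library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: divide every library's time on one test by the
// fastest time, so the fastest library scores exactly 1.
std::vector<double> relative_times(const std::vector<double>& seconds) {
    assert(!seconds.empty());
    const double fastest = *std::min_element(seconds.begin(), seconds.end());
    std::vector<double> rel;
    rel.reserve(seconds.size());
    for (double s : seconds) {
        rel.push_back(s / fastest);
    }
    return rel;
}

// Hypothetical helper: weighted mean of one library's relative times, where
// each test carries the weight of its section (assumed: 0.5 or 1).
double weighted_average(const std::vector<double>& rel,
                        const std::vector<double>& weights) {
    assert(!rel.empty() && rel.size() == weights.size());
    double sum = 0.0;
    double total_weight = 0.0;
    for (std::size_t i = 0; i < rel.size(); ++i) {
        sum += rel[i] * weights[i];
        total_weight += weights[i];
    }
    return sum / total_weight;
}

// Assumed reading of the percentage row: the baseline library's average
// divided by another library's average, times 100 (higher means faster).
double percent_of_baseline(double baseline_average, double average) {
    assert(average > 0.0);
    return baseline_average / average * 100.0;
}
```

Feeding in the Boost Regex relative times from the two comparison tables (the thirteen pattern-match values at weight 0.5 and the first three HTML values at weight 1) gives about 1.296, which rounds to the 1.30 shown, and percent_of_baseline(1.30, 1.16) is about 112.07, in line with the 112.1% shown for Chiyuki Regex.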

Comparison: Pattern Matches (Weight: 0.5)

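For each of the following regular expressions, the time taken to match it against the text shown was measured. Each result gives the relative time (the faster library scores 1), followed by the measured time in parentheses.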

Expression
    Text                                              Boost Regex         Chiyuki Regex

abcdefghijklnmopqrstuvwxyz
    abcdefghijklnmopqrstuvwxyz                        3.26 (2.32e-07s)    1 (1.02e-07s)

^(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]{1,2})(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]{1,2})){3}$
    191.236.34.27                                     1.74 (1.06e-06s)    1 (6.05e-07s)
    222.34.4.5                                        1.23 (9.40e-07s)    1 (7.65e-07s)

(\d{4}[- ]){3}\d{3,4}
    1234-5678-1234-456                                1.56 (4.84e-07s)    1 (3.11e-07s)

^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
    john@johnmaddock.co.uk                            1.06 (9.86e-07s)    1 (9.34e-07s)
    foo12@foo.edu                                     1.03 (7.84e-07s)    1 (7.58e-07s)
    bob.smith@foo.tv                                  1 (7.82e-07s)       1.01 (7.90e-07s)

(\w+):\/\/([^\/:]+)(:\d*)?([^# ]*)
    http://www.trilines.com:80/scripting/default.htm  1.41 (1.55e-06s)    1 (1.11e-06s)
    http://www.thelinuxshow.com/main.php3             1.41 (1.25e-06s)    1 (8.87e-07s)

^\d{1,2}/\d{1,2}/\d{4}$
    4/1/2001                                          1.19 (2.84e-07s)    1 (2.38e-07s)
    12/12/2001                                        1.35 (2.79e-07s)    1 (2.07e-07s)

^[-+]?\d*\.?\d*$
    1234                                              1.15 (3.19e-07s)    1 (2.78e-07s)
    +3.1415926                                        1 (3.65e-07s)       1.23 (4.49e-07s)
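A measurement of this kind (compile the expression once, then time repeated full-string matches) can be sketched with the C++11 standard std::regex standing in for both libraries. This is not the harness used for the tables, and neither Boost's nor Chiyuki's API is shown; the names MatchTiming and time_match, and the choice to leave compilation out of the timed loop, are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <regex>
#include <string>

// Hypothetical result type: whether the text matched, plus the mean time.
struct MatchTiming {
    bool matched;         // result of the last std::regex_match call
    double seconds_each;  // mean wall-clock seconds per call
};

// Hypothetical helper: compile `pattern` once, then time `iterations`
// full-string matches of `text`. std::regex_match must consume the whole
// text, like the ^...$ anchors in the table. Compilation is not timed.
MatchTiming time_match(const std::string& pattern, const std::string& text,
                       int iterations) {
    assert(iterations > 0);
    const std::regex re(pattern);  // default grammar: ECMAScript
    bool matched = false;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        matched = std::regex_match(text, re);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return MatchTiming{matched, elapsed.count() / iterations};
}
```

For example, time_match with the date expression ^\d{1,2}/\d{1,2}/\d{4}$ (written as a raw string literal in C++) and the text 12/12/2001 reports a match and a mean time per call. Running the same pairs through each library and dividing by the faster time would yield relative figures of the kind tabulated above, though std::regex timings are not comparable to the numbers measured here.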

Comparison: HTML Document Search (Weight: 1)

For each of the following regular expressions, the time taken to find all occurrences of the expression within the Ubuntu.com home page was measured.

Note: in the last test case, the Boost Regex run involves frequent swapping of string objects, which makes that search considerably slower.

Expression                              Boost Regex         Chiyuki Regex
<h[12345678][^>]*>.*?</h[12345678]>     1.12 (8.27e-04s)    1 (7.36e-04s)
<p>.*?</p>                              1 (3.24e-04s)       2.24 (7.26e-04s)
<img[^>]+src=("[^"]*"|\S+)[^>]*>        1 (5.81e-04s)       1.24 (7.19e-04s)
<a[^>]+href=("[^"]*"|\S+)[^>]*>         14.9 (1.29e-02s)    1 (8.62e-04s)
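A "find all occurrences" pass like the one timed above can be sketched with std::sregex_iterator from the C++11 standard library, again standing in for the benchmarked libraries. The helper count_occurrences is hypothetical. One caveat of this sketch: in std::regex's default ECMAScript grammar, . does not match a newline, so a pattern such as <p>.*?</p> only finds elements whose content stays on one line; the benchmarked libraries may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: count every non-overlapping occurrence of `pattern`
// in `document`. Each step of std::sregex_iterator resumes the search where
// the previous match ended, as a "find all" pass over a page does.
std::size_t count_occurrences(const std::string& document,
                              const std::string& pattern) {
    assert(!pattern.empty());
    const std::regex re(pattern);  // compiled once for the whole document
    std::sregex_iterator it(document.begin(), document.end(), re);
    std::sregex_iterator end;      // end-of-sequence iterator
    return static_cast<std::size_t>(std::distance(it, end));
}
```

For instance, on a short page containing <p>one</p>, a newline, and <p>two</p>, count_occurrences reports two matches of <p>.*?</p>. Timing this function per expression over the saved home page would give figures of the same kind as the table.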

Hingchung Chu | Trilines 2015