    regexp2 - full featured regular expressions for Go

    Regexp2 is a feature-rich RegExp engine for Go. Unlike the built-in regexp package, it does not guarantee that matching time is linear in the size of the input, but it allows backtracking and is compatible with Perl5 and .NET. You'll likely be better off with the RE2 engine from the regexp package and should only use this if you need to write very complex patterns or require compatibility with .NET.

    Basis of the engine

    The engine is ported from the .NET framework's System.Text.RegularExpressions.Regex engine. That engine was open sourced in 2015 under the MIT license. There are some fundamental differences between .NET strings and Go strings that required a bit of borrowing from the Go standard library's regexp engine as well. I cleaned up a couple of the dirtier bits during the port (regexcharclass.cs was terrible), but the parse tree, the code emitted, and therefore the patterns matched should be identical.

    Installing

    This is a go-gettable library, so install is easy:

    go get github.com/dlclark/regexp2/...

    Usage

    Usage is similar to the Go regexp package. Just like in regexp, you start by converting a regex into a state machine via the Compile or MustCompile functions. They ultimately do the same thing, but Compile returns an error if the regex is invalid, while MustCompile panics. You can then use the returned Regexp struct to find matches repeatedly. A Regexp struct is safe to use across goroutines.

    re := regexp2.MustCompile(`Your pattern`, 0)
    if isMatch, _ := re.MatchString(`Something to match`); isMatch {
        //do something
    }

    The only error that the *Match* methods should return is a Timeout if you set the re.MatchTimeout field. Any other error is a bug in the regexp2 package. If you need more details about capture groups in a match then use the FindStringMatch method, like so:

    if m, _ := re.FindStringMatch(`Something to match`); m != nil {
        // the whole match is always group 0
        fmt.Printf("Group 0: %v\n", m.String())
    
        // you can get all the groups too
        gps := m.Groups()
    
        // a group can be captured multiple times, so each cap is separately addressable
        fmt.Printf("Group 1, first capture: %v\n", gps[1].Captures[0].String())
        fmt.Printf("Group 1, second capture: %v\n", gps[1].Captures[1].String())
    }

    Group 0 is embedded in the Match. It is an automatically assigned group that encompasses the whole pattern, so m.String() is the same as m.Group.String() and m.Groups()[0].String().

    The last capture is embedded in each group, so g.String() will return the same thing as g.Capture.String() and g.Captures[len(g.Captures)-1].String().

    If you want to find multiple matches from a single input string you should use the FindNextMatch method. For example, to implement a function similar to regexp.FindAllString:

    func regexp2FindAllString(re *regexp2.Regexp, s string) []string {
    	var matches []string
    	m, _ := re.FindStringMatch(s)
    	for m != nil {
    		matches = append(matches, m.String())
    		m, _ = re.FindNextMatch(m)
    	}
    	return matches
    }

    FindNextMatch is optimized so that it re-uses the underlying string/rune slice.

    The internals of regexp2 always operate on []rune so Index and Length data in a Match always reference a position in runes rather than bytes (even if the input was given as a string). This is a dramatic difference between regexp and regexp2. It's advisable to use the provided String() methods to avoid having to work with indices.
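If you do need to slice the original string, the rune-based values have to be converted to byte offsets first. The sketch below uses only the standard library. byteRange is a hypothetical helper name, and the index and length passed in main are assumed values standing in for the Index and Length of a match.

```go
package main

import "fmt"

// byteRange is a hypothetical helper, not part of regexp2. It converts a
// rune-based index and length into a byte range [start, end) that can be
// used to slice s. Positions past the last rune map to len(s).
func byteRange(s string, runeIndex, runeLength int) (start, end int) {
	start, end = len(s), len(s)
	n := 0 // number of runes seen so far
	for i := range s {
		if n == runeIndex {
			start = i
		}
		if n == runeIndex+runeLength {
			end = i
			break
		}
		n++
	}
	return start, end
}

func main() {
	s := "日本語 regexp"
	// Assumed values: a match on "regexp" would start at rune 4 and span
	// 6 runes, while the same text starts at byte 10 in the string.
	start, end := byteRange(s, 4, 6)
	fmt.Println(start, end, s[start:end])
}
```

Slicing s[4:10] with the rune values directly would cut through the middle of a multi-byte character, which is why the conversion step matters for non-ASCII input.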

    Compare regexp and regexp2

    Category                                              regexp                     regexp2
    ----------------------------------------------------  -------------------------  --------------------------------------------------------------
    Catastrophic backtracking possible                    no, linear-time guarantee  yes; if your pattern is at risk, set the re.MatchTimeout field
    Python-style capture groups (?P<name>re)              yes                        no (yes in RE2 compat mode)
    .NET-style capture groups (?<name>re) or (?'name're)  no                         yes
    comments (?#comment)                                  no                         yes
    branch numbering reset (?|a|b)                        no                         no
    possessive match (?>re)                               no                         yes
    positive lookahead (?=re)                             no                         yes
    negative lookahead (?!re)                             no                         yes
    positive lookbehind (?<=re)                           no                         yes
    negative lookbehind (?<!re)                           no                         yes
    back reference \1                                     no                         yes
    named back reference \k'name'                         no                         yes
    named ascii character class [[:foo:]]                 yes                        no (yes in RE2 compat mode)
    conditionals (?(expr)yes|no)                          no                         yes
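A few rows of the regexp column can be checked against the standard library by trying to compile a sample pattern for each feature. This is a rough sketch: stdlibSupports is a hypothetical helper name and the sample patterns are illustrative. It does not exercise regexp2 at all.

```go
package main

import (
	"fmt"
	"regexp"
)

// stdlibSupports is a hypothetical helper. It reports whether the
// standard library regexp package accepts the pattern.
func stdlibSupports(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	samples := []struct {
		feature string
		pattern string
	}{
		{"Python-style capture group", `(?P<year>\d{4})`},
		{"named ascii character class", `[[:alpha:]]+`},
		{"positive lookahead", `foo(?=bar)`},
		{"negative lookbehind", `(?<!x)y`},
		{"back reference", `(\w)\1`},
	}
	for _, s := range samples {
		fmt.Printf("%-28s %-18s supported by regexp: %v\n",
			s.feature, s.pattern, stdlibSupports(s.pattern))
	}
}
```

A pattern that fails here is a candidate for regexp2. A pattern that compiles is usually better served by the standard package.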