Harnessing the power of regex with re-seq
This week I started doing Advent of Code for the first time ever. I looked at it last year but felt intimidated, and I was also really busy juggling five odd jobs at the time. This year, Clean Coders has left me well-equipped to tackle the problems and actually have fun doing them, so I’ve been waking up an hour earlier to knock out part 1 and doing part 2 before bed every night. I’ve been really enjoying it, because some of the puzzles push you a little out of your coding comfort zone.
Yesterday’s problem required me to get a little more comfortable with regular expressions (regex). While I’m still no expert on the subject, it definitely taught me quite a bit more than I knew before. Working in Clojure, one function turned out to be especially helpful with regex, so I figured it’s worth a blog post.
What is re-seq?
re-seq is a function in Clojure that scans a string for matches to a given regular expression and returns them as a lazy sequence. Its simplicity and flexibility make it a go-to tool for working with text.
(re-seq regex-pattern string)
- regex-pattern: A java.util.regex.Pattern, usually written with Clojure’s #"..." literal syntax (or built from a string with re-pattern).
- string: The input string to search.
The result is a lazy sequence of matches, or nil if the pattern matches nowhere in the string.
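Two consequences of that return type are worth seeing at the REPL: when nothing matches, re-seq gives back nil rather than an empty list, and because the sequence is lazy, you can take just the first few matches without processing the rest of the string.

```clojure
;; No match: re-seq returns nil rather than an empty list
(re-seq #"\d" "abc")
;; => nil

;; Laziness: take only the first two matches from a longer string
(take 2 (re-seq #"\d+" "10 20 30 40"))
;; => ("10" "20")
```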
How Does re-seq Work?
When re-seq processes a string, it scans from left to right and collects every non-overlapping substring that matches the given pattern. If the pattern contains capturing groups, re-seq returns a sequence of vectors, where the first element is the full match and subsequent elements are the contents of the capturing groups.
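To make the mechanics concrete, here is a hypothetical re-implementation (my-re-seq is a name made up for this post), modeled on how clojure.core defines re-seq. It repeatedly calls find on a java.util.regex.Matcher and turns each match into a string or vector with re-groups, which is exactly where the "strings vs. vectors" behavior comes from. Treat it as an illustration, not a replacement for the built-in.

```clojure
(defn my-re-seq
  "Hypothetical stand-in for clojure.core/re-seq, written out for illustration."
  [re s]
  (let [m (re-matcher re s)]            ; a java.util.regex.Matcher over s
    ((fn step []
       ;; .find advances to the next match, returning false when none remain
       (when (.find m)
         ;; re-groups yields a string (no groups) or a vector (with groups);
         ;; lazy-seq defers searching for the next match until it's needed
         (cons (re-groups m) (lazy-seq (step))))))))

(my-re-seq #"\d" "a1b2c3")
;; => ("1" "2" "3")
```

Because each find resumes where the previous match ended, matches never overlap: (my-re-seq #"aa" "aaaa") returns ("aa" "aa"), just like the real re-seq.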
Examples of re-seq in Action
1. Basic Matching
Let’s start with a simple example: finding all the digits in a string.
(re-seq #"\d" "a1b2c3")
;; => ("1" "2" "3")
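Since \d matches exactly one digit, it splits multi-digit numbers apart. Puzzle inputs like Advent of Code’s can contain multi-digit or negative numbers, so a quantifier (+) and an optional minus sign (-?) are usually what you want. The input strings below are made up for illustration.

```clojure
;; \d matches a single digit, so multi-digit numbers are split apart
(re-seq #"\d" "a12b345")
;; => ("1" "2" "3" "4" "5")

;; \d+ matches one or more digits, keeping each number whole
(re-seq #"\d+" "a12b345")
;; => ("12" "345")

;; An optional leading minus sign (-?) also picks up negative numbers
(re-seq #"-?\d+" "x=-3, y=14")
;; => ("-3" "14")
```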
2. Using Capturing Groups
Capturing groups let you extract specific parts of a match. For example, extracting key-value pairs:
(re-seq #"(\w+):(\d+)" "name:42 age:30")
;; => (["name:42" "name" "42"] ["age:30" "age" "30"])
Here, each match is a vector:
- The first element is the full match (“name:42”, “age:30”).
- The second and third elements are the contents of the capturing groups (“name” and “42”, etc.).
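Because each match is a vector, destructuring is a natural next step. The sketch below turns the key-value example into a map, skipping the full match and parsing the number with Long/parseLong. It also shows one detail worth knowing: an optional group that doesn’t take part in a particular match comes back as nil in that match’s vector.

```clojure
;; Build a map from the match vectors: destructure each one,
;; skip the full match (_) and parse the number group into a long
(into {}
      (map (fn [[_ k v]] [k (Long/parseLong v)]))
      (re-seq #"(\w+):(\d+)" "name:42 age:30"))
;; => {"name" 42, "age" 30}

;; An optional group that doesn't participate in a match shows up as nil
(re-seq #"(\d+)(px)?" "10px 20")
;; => (["10px" "10" "px"] ["20" "20" nil])
```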
3. Working Without Capturing Groups
If the pattern doesn’t use capturing groups, re-seq returns the full matches as strings.
(re-seq #"\w+" "Clojure is fun!")
;; => ("Clojure" "is" "fun")
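Sometimes you need parentheses for grouping (for example, to repeat a sequence of characters) but don’t want re-seq to switch to returning vectors. Java’s regex syntax has non-capturing groups, written (?:...), for exactly this. The comparison below uses a made-up input string.

```clojure
;; (?:...) groups without capturing, so results stay plain strings
(re-seq #"(?:ab)+" "ababxab")
;; => ("abab" "ab")

;; The same pattern with a capturing group returns vectors instead;
;; a repeated group keeps only its last repetition ("ab")
(re-seq #"(ab)+" "ababxab")
;; => (["abab" "ab"] ["ab" "ab"])
```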
Tips and Tricks
- Use Capturing Groups Wisely: If you don’t need capturing groups, avoid them to keep results simpler.
- Combine with Higher-Order Functions: map, filter, and reduce pair beautifully with re-seq.
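- Escape Special Characters: Regular expressions have special characters (e.g., ., *, +). Escape them with \ when needed; inside a #"..." literal a single backslash is enough, while a plain string passed to re-pattern needs it doubled (\\).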
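A few of these tips in action, using made-up input strings: map, filter and reduce working on re-seq’s output, plus two ways of matching a literal period. java.util.regex.Pattern/quote is a standard Java method that escapes an entire string, which helps when the literal text comes from somewhere else.

```clojure
;; Sum every integer in a string: re-seq finds them, map parses, reduce adds
(reduce + (map #(Long/parseLong %) (re-seq #"-?\d+" "x=5, y=-2, z=10")))
;; => 13

;; Keep only the words longer than two characters
(filter #(> (count %) 2) (re-seq #"\w+" "Clojure is fun!"))
;; => ("Clojure" "fun")

;; Escape "." with a backslash to match a literal period
(re-seq #"\d+\.\d+" "v1.5 and v2.25")
;; => ("1.5" "2.25")

;; Or let Pattern/quote escape a whole literal string;
;; an unescaped #"a.b" would also match "axb"
(re-seq (re-pattern (java.util.regex.Pattern/quote "a.b")) "a.b axb a.b")
;; => ("a.b" "a.b")
```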