Postgres regexp_replace() function
Replace substrings matching a regular expression pattern
The Postgres regexp_replace()
function replaces substrings that match a regular expression pattern with the specified replacement string.
This function is particularly useful for complex string manipulations, and data cleaning/formatting tasks. Consider scenarios where you'd want to remove or replace specific patterns in text or transform data to meet certain requirements. For instance, you might use it to format phone numbers consistently, remove HTML tags from text, or anonymize sensitive information in logs.
Function signature
The regexp_replace()
function has the following syntax:
source
: The input string to perform replacements on.pattern
: The regular expression pattern to match.replacement
: The string to replace matched substrings with.flags
(optional): A string of one or more single-letter flags that modify how the regex is interpreted.
It returns the input string with occurrence(s) of the pattern replaced by the replacement string.
More recent versions of Postgres (starting with Postgres 16) also support additional parameters to further control the replacement operation:
- start: The position in the source string to start searching for matches (default is 1).
- N: If specified, only the Nth occurrence of the pattern is replaced. If N is 0, or the
g
flag is used, all occurrences are replaced.
Example usage
Consider a customer_data
table with a phone_number
column containing phone numbers in different formats. We can use regexp_replace()
to standardize these numbers to a consistent format.
This query removes all non-digit characters from the phone numbers, standardizing them to a simple string of digits.
Advanced examples
regexp_replace()
with backreferences
Use You can use backreferences in the replacement string to include parts of the matched pattern in the replacement.
This query anonymizes email addresses in log entries by replacing them with [REDACTED] while preserving the rest of the log structure.
regexp_replace()
using flags
Modify the behavior of The flags
parameter allows you to modify how the function operates. Common flags include:
g
: Global replacement (replace all occurrences)i
: Case-insensitive matchingn
: Newline-sensitive matching
This query replaces all occurrences of "apple" (case-insensitive) with "pear" in the product descriptions.
regexp_replace()
for complex pattern matching and replacement
Use regexp_replace()
can handle complex patterns for sophisticated text processing tasks. For example, the query below removes all HTML tags from the given markup, producing plain text.
This query produces the following output:
Additional considerations
Performance implications
While regexp_replace()
is powerful, complex regular expressions or operations on large text fields can be computationally expensive. For frequently used operations, consider preprocessing the data or using simpler string functions if possible.
Alternative functions
replace()
: A simpler function for straightforward string replacements without regular expressions.translate()
: Useful for character-by-character replacements.regexp_matches()
: Returns an array of all substrings matching a regular expression pattern, which can be useful in conjunction with other functions for complex transformations.
Resources
Last updated on