Regex limit to match only once

There are times when you want to ensure that a pattern matches only once.

In a Site Variation, for example, SiteSpect is normally configured to insert a small script block in the page source to support EventTrack and Client Side Variations. The insertion point is </head> by default, but it could be any other pattern, such as the first <meta.. declaration.

Why be concerned with this?

If the pattern that you are matching on unexpectedly appears more than once in the page source, the code you are inserting gets inserted at each of those places too. When injecting a code block, for example, you typically want it injected only once. The following are a few actual examples we have encountered.

  • Your insertion point is </head> and your page has some script in it that injects HTML including an </head> tag.
  • Your insertion point is </title> and your page uses inline SVG code that also has <title></title> in it.
  • Your insertion point is </body> and your page has some script in it that injects HTML including an </body> tag.
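
The first case above can be reproduced with a minimal sketch in Python. It assumes Python's re module as a stand-in for the SiteSpect engine; the page and snippet values are hypothetical and only illustrate the double insertion.

```python
import re

# Hypothetical page: an inline script builds HTML that contains its own </head>.
page = (
    "<html><head><title>Demo</title>\n"
    "<script>var tpl = '<head></head>';</script>\n"
    "</head><body></body></html>"
)

# Hypothetical stand-in for the injected script block.
snippet = "<script>/* code.. */</script>"

# A plain search for </head> matches both occurrences, so the snippet lands twice.
result = re.sub(r"</head>", lambda m: snippet + m.group(0), page)

inserted = result.count(snippet)
print(inserted)
```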

Note: If non-HTML content, such as a JSON response containing HTML, could also be matched, see https://doc.sitespect.com/knowledge/good-practice-with-variation-triggers

Match on only the first pattern

Use a file anchor

Search (match from the absolute start of the page up to the first </head>)

\A(.*?)</head>

Replace

$1
<script>
// code..
</script>
</head>
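
As a rough check of this pattern outside SiteSpect, here is a minimal Python sketch. It assumes Python's re module behaves like the SiteSpect engine for this pattern, and it sets re.DOTALL (an assumption) so that . can cross line breaks. The page and snippet values are hypothetical.

```python
import re

# Hypothetical page: the real </head> comes first, a scripted one appears later.
page = (
    "<html><head><title>Demo</title></head>\n"
    "<body><script>var tpl = '<head></head>';</script>\n"
    "</body></html>"
)

# Hypothetical stand-in for the injected script block.
snippet = "<script>/* code.. */</script>"

# \A pins the match to the start of the page; the lazy (.*?) stops at the
# first </head>. re.DOTALL is assumed so that . also matches newlines.
pattern = re.compile(r"\A(.*?)</head>", re.DOTALL)

# Re-emit the captured prefix ($1 in the article), then the snippet and the tag.
result = pattern.sub(lambda m: m.group(1) + snippet + "</head>", page)

print(result)
```

Because \A can only succeed at position 0, the substitution cannot fire a second time even though a later </head> exists.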

Match on only the last pattern

There are two good ways to do this.

Use a negative lookahead. The lookahead only checks what follows the match without consuming it, so no capture group variable is needed.

Search (find </body> only when there is no other </body> after it)

</body>(?!.*</body>)

Replace

<script>
// code..
</script>
</body>
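
The lookahead can be tried with a minimal Python sketch. It assumes Python's re module as a stand-in for the SiteSpect engine and uses re.DOTALL (an assumption) so the lookahead can scan across line breaks. The page and snippet values are hypothetical.

```python
import re

# Hypothetical page: a scripted </body> appears before the real closing tag.
page = (
    "<html><body>\n"
    "<script>var tpl = '<body></body>';</script>\n"
    "<p>Hi</p>\n"
    "</body></html>"
)

# Hypothetical stand-in for the injected script block.
snippet = "<script>/* code.. */</script>"

# Match </body> only when no further </body> follows anywhere in the page.
pattern = re.compile(r"</body>(?!.*</body>)", re.DOTALL)

# Only the tag itself is matched, so the replacement just puts it back.
result = pattern.sub(lambda m: snippet + "</body>", page)

print(result)
```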


Use a file anchor with a greedy regex

Search (find the last </body> by using a greedy regex)

\A(.*)</body>

Replace

$1
<script>
// code..
</script>
</body>
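
The greedy variant can be sketched the same way in Python, again assuming Python's re module in place of the SiteSpect engine and re.DOTALL (an assumption) so that . matches newlines. The page and snippet values are hypothetical.

```python
import re

# Hypothetical page: a scripted </body> appears before the real closing tag.
page = (
    "<html><body>\n"
    "<script>var tpl = '<body></body>';</script>\n"
    "<p>Hi</p>\n"
    "</body></html>"
)

# Hypothetical stand-in for the injected script block.
snippet = "<script>/* code.. */</script>"

# The greedy (.*) runs to the end of the page and backtracks to the last </body>.
pattern = re.compile(r"\A(.*)</body>", re.DOTALL)

# Re-emit the captured prefix ($1 in the article), then the snippet and the tag.
result = pattern.sub(lambda m: m.group(1) + snippet + "</body>", page)

print(result)
```

On this sample the result is the same as with the negative lookahead; the difference is that this form captures everything before the tag and has to write it back.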

Match at the absolute end of the page

This matches a position rather than any text, so it requires no capture group, and nothing needs to be put back in the replacement.

Search (use a lowercase z for the absolute end of the page)

\z

Replace

<script>
// code..
</script>
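
A minimal Python sketch of the same idea follows. Note the naming difference: Python's re module spells the absolute-end anchor \Z, which corresponds to the lowercase \z used above. The page and snippet values are hypothetical.

```python
import re

# Hypothetical page that ends with a trailing newline.
page = "<html><body><p>Hi</p></body></html>\n"

# Hypothetical stand-in for the injected script block.
snippet = "<script>/* code.. */</script>"

# In Python, \Z matches only at the very end of the string (after the final
# newline), which is the behavior the article describes for lowercase \z.
result = re.sub(r"\Z", lambda m: snippet, page)

print(result)
```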

Note: Modern browsers permit script code outside of the body and html tags. HTML5 technically does not require body and html tags but looks at the doctype declaration. Page sources that do not contain these tags will not match the SiteSpect default insertion points, so the insertion point would need to be updated to match another pattern.

Match only within a section

https://doc.sitespect.com/knowledge/regex-search-and-replace-only-within-a-section-of-html