Forcing backreference on regex


Question :

A few days ago I asked here about a regex that validates dates and about how to force the separators to be consistent; that regex validated the format dd/mm/yyyy.

Based on that, I have been trying to force the separators with a backreference in a regex that validates the format yyyy/mm/dd, but I can't get it to work. Could anyone explain how to find the group numbers for the backreference and how to apply it? The regex that validates yyyy/mm/dd, where I need to force the backreferences, is this:



Answer :


My answer complements Paulo RF Amorim's answer by showing how what he said applies to the regex in the question.

In a regex, you create groups with parentheses: whatever is inside the parentheses is captured by the group, and each group has a numeric id so that it can be referenced. Id 0 corresponds to the entire match; the ids of the other groups follow the order, from left to right, in which their parentheses are opened:

In the regex a(b|c)(apartamento|ca(sa|rro)) :

  • The group of id 1 is (b|c), which will return b or c;
  • The group of id 2 is (apartamento|ca(sa|rro)), which will return apartamento, casa or carro;
  • The group of id 3 is (sa|rro), which will return sa or rro (or will be undefined if group 2 contains apartamento).
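The numbering above can be checked with a short sketch. It uses Python's `re` module as an assumption (the thread does not name a language); Python reports `None` where the text says a group is undefined. The helper name `show_groups` is made up for illustration.

```python
import re

# The example regex from the answer; groups are numbered by the order
# in which their opening parentheses appear, from left to right.
PATTERN = re.compile(r"a(b|c)(apartamento|ca(sa|rro))")

def show_groups(text):
    """Return (group 0, group 1, group 2, group 3) for a full match, or None."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    return (m.group(0), m.group(1), m.group(2), m.group(3))

print(show_groups("abcasa"))         # group 3 is "sa"
print(show_groups("accarro"))        # group 3 is "rro"
print(show_groups("abapartamento"))  # group 3 did not take part: None
```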

    If you do not want a group to have a reference, that is, an id, you can use (?:...), which creates a non-capturing group, as explained in this answer on SOen.
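As a small illustration of how a non-capturing group shifts the ids, the sketch below (again assuming Python's `re`) turns the first group of the earlier example into `(?:b|c)`. The variable names are hypothetical.

```python
import re

# Same example regex, with and without a capturing first group.
capturing = re.compile(r"a(b|c)(apartamento|ca(sa|rro))")
non_capturing = re.compile(r"a(?:b|c)(apartamento|ca(sa|rro))")

# .groups is the number of capturing groups in the pattern.
print(capturing.groups)
print(non_capturing.groups)

# With (?:b|c) the remaining groups move up: id 1 is now the
# (apartamento|ca(sa|rro)) group and id 2 is (sa|rro).
print(non_capturing.fullmatch("abcasa").groups())
```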

    Your regex starts like this: (^(?:\d{4}([-/.])... Note that the group ([-/.]) has id 2, because before it there is a non-capturing group, (?:\d{4}..., and the group of id 1, which is opened by the first parenthesis, (^... .

    The date separator can be obtained through id 2, which refers to the group ([-/.]) and will return -, / or . . To refer to this group you just have to use \2, as @Paulo explained.

    Currently your regex repeats the group ([-/.]) several times; we can simply keep the first one (which has id 2, as explained above) and replace the others with a reference to it, that is, \2. To make this easier, I decided to keep the references inside groups, so the before and after look like this:

    Before: (^(?:\d{4}([-/.])...([-/.])...([-/.])...([-/.])...
    After: (^(?:\d{4}([-/.])...(\2)...(\2)...(\2)...

    With this, the regex only matches if all the separators are the same as the one captured by group 2, so we only validate dates that do not mix different separators. There will be a match on a date like 2000-02-28 or 2000/02/28, but no match on a date like 2000/02-28.
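The effect of the backreference on the separators can be sketched with a much simpler pattern. This is not the regex from the question: `SAME_SEPARATOR` and `uses_one_separator` are hypothetical names, the pattern only checks digit counts (not calendar validity), and Python's `re` is assumed. The separator happens to be group 2 here as well.

```python
import re

# Simplified, hypothetical pattern: yyyy, separator, mm, SAME separator, dd.
# Group 2 captures the first separator; \2 requires the second to be identical.
SAME_SEPARATOR = re.compile(r"^(\d{4})([-/.])(\d{2})\2(\d{2})$")

def uses_one_separator(date_text):
    """True if the text looks like yyyy?mm?dd with a single, consistent separator."""
    return SAME_SEPARATOR.match(date_text) is not None

for candidate in ["2000-02-28", "2000/02/28", "2000/02-28"]:
    print(candidate, uses_one_separator(candidate))
```

Only the last candidate, which mixes / and -, is rejected.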

    In the end, your Regex looks like this:


    Note that at the end of the expression I used a reference to a different group id than in the rest of the expression; this was necessary because of a particularity of your regex, which treats the 29th of February differently. This is clearer in Debuggex:

    See in the image that the reference in ...()29)$) would not work if it pointed to Group 9, because Group 9 will be undefined when the date is February 29; instead, we make Ref 4 refer to Group 2, which will not be undefined on February 29 dates.
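Why a reference to an undefined group is a problem can be shown with a toy pattern. It is not the regex from the question; the names `SHARED_REF` and `PER_BRANCH_REF` are hypothetical. The sketch assumes Python's `re`, where a backreference to a group that did not take part in the match makes the match fail; other engines may treat this case differently.

```python
import re

# Two alternatives, each with its own separator group (ids 1 and 2).
# SHARED_REF always refers to group 1, which is unset when branch "B" matched.
SHARED_REF = re.compile(r"^(?:A([-/.])|B([-/.]))x\1y$")

# PER_BRANCH_REF refers, inside each branch, to the group captured in that branch.
PER_BRANCH_REF = re.compile(r"^(?:A([-/.])x\1y|B([-/.])x\2y)$")

print(SHARED_REF.match("A-x-y") is not None)      # group 1 is set
print(SHARED_REF.match("B-x-y") is not None)      # group 1 is unset, so no match
print(PER_BRANCH_REF.match("B-x-y") is not None)  # \2 points at the set group
print(PER_BRANCH_REF.match("B/x/y").groups())     # group 1 is None, group 2 is "/"
```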


  • The idea of a backreference in a regex is that you can reuse what was matched by a group. For example:

    In the regex ([a-c])x\1x\1, the block ([a-c]) is defined as group 1, which matches the letter “a”, “b” or “c”.

    The words “axaxa”, “bxbxb” and “cxcxc” are therefore valid, as explained in this link. The linked site is a useful tool for identifying groups and validating regexes.
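A quick check of this example, assuming the pattern is `([a-c])x\1x\1` (which is what the three valid words imply) and using Python's `re`. The name `REPEATED_LETTER` is made up for illustration.

```python
import re

# Group 1 captures one of a, b, c; each \1 must repeat that same letter.
REPEATED_LETTER = re.compile(r"([a-c])x\1x\1")

for word in ["axaxa", "bxbxb", "cxcxc", "axbxa"]:
    print(word, REPEATED_LETTER.fullmatch(word) is not None)
```

The last word is rejected because the letter changes from a to b in the middle.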

