Overriding Hash Code and Equals in a Multi-Representation Record
Certain models, like email addresses, admit multiple alternatives or synonyms of the same value: the representations differ, but the functionality is the same. That redundancy is noise that must be filtered out before the values can be compared.
Multiple Email Representations
An email address consists of a local name followed by an @ symbol and a domain name. While this is pretty simple, the local name might also contain noise like dots and + symbols for various purposes.
For example, joedoe@place.com is a canonical or base form of an email, while the (infinitely many) variants joe.doe@place.com, joe.doe+aksfnsfs@place.com, etc., are accepted as well.
In programming terms, an email model can be a record with multiple representations, that is, different email values used for readability or testing purposes that correspond to the same functionality. Therefore, emails can be equal (or not) depending on your abstraction.
Email Type
I created the Email record in Kotlin, thus using a data class and overriding toString, hashCode, and equals.
The function normalizedLocal provides the base form of the local field by removing the redundancy, which makes that value unique.
The equals implementation matches the other generic object against the Email type and then checks that the remaining fields match. In Java, this would be done via pattern matching for instanceof (JDK 16). The hashCode is computed from the normalized email value, that is, the main or canonical form without repetition.
Implementing hashCode is key (pun intended) for discerning among Email objects in hash-based collections, like a Set or Map.
The hashCode and equals methods have to be overridden in this case because the Email type has many representations of the same model: all the redundant emails are first reduced to the main form and then compared for equality.
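The description above could be sketched roughly as follows. This is a minimal illustration, not the original listing: the field names local and domain are assumptions, and normalizedLocal is written here as a compact member function so the block stands alone.

```kotlin
// Sketch only: field names `local` and `domain` are assumed for illustration.
data class Email(val local: String, val domain: String) {
    // Base form of the local part: drop the first '+' and everything after it,
    // then drop every dot.
    fun normalizedLocal(): String = local.substringBefore('+').replace(".", "")

    // Keeps the representation the value was created with.
    override fun toString(): String = "$local@$domain"

    // Two emails are equal when their normalized local parts and domains match.
    override fun equals(other: Any?): Boolean = when (other) {
        is Email -> normalizedLocal() == other.normalizedLocal() && domain == other.domain
        else -> false
    }

    // Hash of the canonical form, so equal emails always share a hash code.
    override fun hashCode(): Int = "${normalizedLocal()}@$domain".hashCode()
}
```

With this sketch, Email("joedoe", "place.com") == Email("joe.doe+aksfnsfs", "place.com") evaluates to true, and a Set built from both values holds a single element.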
Email Normalization Definitions
Some definitions are still required to finish the previous Email implementation. They concern the rules given earlier for the dot (.) and plus (+) symbols.
This way, normalizedLocalDot takes care of dots by removing them, normalizedLocalPlus filters out the first plus symbol and anything after it, and normalizedLocal composes both.
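One possible shape for these three definitions is shown below. The function names come from the text, but the signatures (plain String in, String out) are an assumption made for this sketch.

```kotlin
// Removes every dot from the local part: "joe.doe" becomes "joedoe".
fun normalizedLocalDot(local: String): String = local.replace(".", "")

// Keeps only what precedes the first plus: "joe+tag" becomes "joe".
// A local part without a plus is returned unchanged.
fun normalizedLocalPlus(local: String): String = local.substringBefore('+')

// Composition of both rules: strip the plus suffix, then the dots.
fun normalizedLocal(local: String): String =
    normalizedLocalDot(normalizedLocalPlus(local))
```

Keeping each rule as its own small function means a new rule can be added to the composition without touching the existing ones.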
Email Uniqueness Challenge
This problem gives you a list of strings supposed to be email addresses with the dot and plus constraints defined before. You have to return the number of unique emails in the list.
These kinds of toy (interview) problems don't care much about realistic requirements. For example, you can pass the tests even if an email is invalid, as long as the count matches. Their usual solutions are probably also full of imperative approaches that are hard to maintain under real conditions.
By working out the subproblems declaratively with mathematical definitions, you get a well-defined domain that can scale to support further requirements.
First, I needed to define a language to match any valid email address.
The regex captures two groups, local and domain, for matching expressions.
The set of accepted inputs is the language. Of course, the language just
defined is that of all valid emails we required above.
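As an illustration of such a language, here is a deliberately simplified pattern with two capturing groups. The pattern, the name emailRegex, and the helper parseEmail are assumptions for this sketch. Real email syntax is considerably richer than what this accepts.

```kotlin
// Hypothetical, simplified email language:
//   group 1 (local):  letters, digits, dots, and plus symbols
//   group 2 (domain): two or more dot-separated labels of letters, digits, hyphens
val emailRegex = Regex("""([A-Za-z0-9.+]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)""")

// Returns the (local, domain) pair when the whole input belongs to the language,
// or null otherwise.
fun parseEmail(input: String): Pair<String, String>? =
    emailRegex.matchEntire(input)?.destructured?.let { (local, domain) -> local to domain }
```

Using matchEntire rather than find ensures the whole string belongs to the language, not just a fragment of it.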
Notice that the email language defined by the regex could actually be integrated into the Email type for building a DSL, for example, by using refinements.
Finally, checking redundancy can boil down to counting a Set.
The solution maps the generic email list to matching expressions, representing strings belonging to the email language defined by the regex. By destructuring the two groups, it maps the original String to the Email domain type.
Subsequently, it converts the Email list to a Set to eliminate redundant entries, thus resolving the count required for uniqueness. This works because Email already has the implementation for equality.
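The pipeline just described might look like the sketch below. To keep the block self-contained, it carries a compact Email and a simplified regex of its own; both are assumptions, as is the choice to skip strings that do not match the language.

```kotlin
// Compact stand-in for the Email type: equality is based on the canonical form.
data class Email(val local: String, val domain: String) {
    private fun normalizedLocal(): String = local.substringBefore('+').replace(".", "")

    override fun equals(other: Any?): Boolean =
        other is Email && normalizedLocal() == other.normalizedLocal() && domain == other.domain

    override fun hashCode(): Int = "${normalizedLocal()}@$domain".hashCode()

    override fun toString(): String = "$local@$domain"
}

// Hypothetical, simplified email language with two groups: local and domain.
val emailRegex = Regex("""([A-Za-z0-9.+]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)""")

fun uniqueEmailsNum(emails: List<String>): Int =
    emails
        // Keep only the strings that belong to the email language.
        .mapNotNull { emailRegex.matchEntire(it) }
        // Destructure the two groups into the Email domain type.
        .map { it.destructured.let { (local, domain) -> Email(local, domain) } }
        // The Set relies on equals/hashCode to drop redundant representations.
        .toSet()
        .size
```

For the list joedoe@place.com, joe.doe@place.com, joe.doe+aksfnsfs@place.com, and jane@place.com, this sketch counts two unique emails.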
Testing Email Values
I generated and reviewed a bunch of tests to check my code.
With the given test suite, the uniqueEmailsNum function can be checked for many cases.
Reducing Multiple Representations to the Main One
Definitions can allow multiple representations of the same model, while the main form is clean and free of repetition. Alternative forms can be reduced to the main one, which simplifies the problems to be solved.
An email address is one example of a model that can hold infinitely many forms that point to the same address or owner.
Addressing these simplifications will often lead to a declarative mathematical approach that meets engineering standards like code maintainability and scalability. The approach given for an Email type can be further worked out to build a DSL.
One declarative approach to defining the language of all emails is to employ regular expressions, another formal concept, thus enriching the program's capabilities by accurately refining our domain.
The hash values of equal objects must match to keep object equality consistent with hash-based data structures. Therefore, equality and hashing are both implemented on the main form, so every redundant alternative compares equal to it.