Removing Record Redundancy with Sum and Refinement Types
Systems should initiate simplification by removing common duplications like records that only differ in their name, particular types already created as a universal definition, and runtime memory waste. The engineer should probably leverage algebraic and refinement types with composition for an optimal design.
Physical Redundancy
Creating repeating structures is a design flaw that should probably be removed via expression simplification.
From the definitions line shapes, I’ve been writing about a problem, so we have something like the following design example:
I’ll omit the fact that Line
is not a partition1 and just focus on the
repetition of the way the records were defined here.
We have the variants HSegment
and VSegment
sharing exactly the same data
representation. Therefore, this design is flawed since repetition is a common
trait of bad programs —mostly in domain-specific programs where expertise is the
key to reaching the engineering grade.
In the above case, we notice (as boilerplate) a physical redundancy since two records are defined the same way, so as a software engineer when reading that code, you must obviously see it factorizable to simplify the code.
Factorizing the Duplication with Sum Types
When something hard/physical is duplicated (thus inefficient), a common insight as an engineer is to create relations to simplify the problem. Sum types are relations since they create the disjoint union of variants, while product types or records are hard because they define the actual data to be held.
I can state a design fixing some of the issues described before. I’ll take the geometric language design I gave in Designing the Angle Geometry for an Oriented Segment as a base. The repetition will be removed by softening the problem via a sum type.
For a given concept, we can have a main definition out of all the equivalent ones in our domain2.
Then, we won’t have the overlap problem I stated in other articles, where
Segment
“eats” the other variants when defined as a sum type. So, notice I
broke the flawed Line
sum type, and now Segment
is
directly a product type.
For an API for segments with common orientations, an equivalent definition is what I did to fix the design issues of overlapping. You should immediately notice the sum type needed is for the orientation variants (i.e., horizontal or vertical).
The OrientedSegment
is an equivalent definition of Segment
storing the segment angle and ball in \(R^2\) for its length. Both are
isomorphic types.
I’m also taking the exact (-90, 90]deg
angle from the DSL to populate the
whole plane, so any possible line segment can be defined via
OrientedSegment
in only one way.
Any segment can be described with either data type I designed, and they don’t
overlap since they’re equivalent/independent definitions instead of belonging to
the same Line
sum type given in Physical Redundancy.
The sum type removing the redundancy of the previous HSegment
and
VSegment
records is QuadrantalOrientation
. That’s what I mean by
factorizing expressions to simplify them.
The introduction of a sum type for the segment orientation eliminated the two
redundant records for horizontal and vertical variants by factorizing them
from two redundant records into a cohesive OrientedSegment
record with sum
type variants.
Enriching the Domain Types with Refinements
One important case when redundancy is shown is when we have a superset already but want a smaller subset to create a specific type.
From the above example of the Line
sum type, we can
see that Segment
covers all the other possible options, so we don’t need to
over-engineer this with a sum type but create a simple subset or refinement type
of the universe set.
From the simplification above
that removed the duplication, we have our universe set as
OrientedSegment
since we want to take subsets like HSegment
and
VSegment
without creating redundant types. The simplification above is a
correct step to perform, but something else is still missing to fix the Line
design since we no longer have the HSegment
and VSegment
types.
Notice a sum type is a universe consisting of a partition of its subsets3 or variants, while a refinement type is a subset of some universe type. In both, it’s essential to know well the universe type to proceed with a design. You cannot create refinements of a variant since they’re not types45. They’re different abstractions, so use them wisely.
First, you have to notice you have a universe set. Then you’ll want specific types for important subsets, so you know you should create refinements reusing the universe type instead of creating more types again from scratch that’ll pollute the code with duplication. Therefore, we have a cohesive system, eliminating design duplications, yet a powerful type system to cover all the domain needs.
Removing Field Redundancy for Memory Optimization
The refinement applied to OrientedSegment
is a neat solution, but more
redundancy complications can arise when the data representation of the
refinement types gets simplified down in structure, thus leading to duplicated
fields created at runtime.
If we create the refinements as suggested
above, since each subtype is a
refinement of OrientedSegment
, they will store a redundant field for the
orientation
, thus inefficient in memory.
We know they’re unnecessary from the very refinement definition: A HSegment
will always be horizontal, etc. That information is already part of the type
system, so it’s redundant to store the orientation
field.
Since we’re in FP, we can trivially compose what we need. So, as I said above,
the OrientedSegment
consists of a ball that gives its radius and center point.
Therefore, we can define the data type for what we need, and exclude the redundant part that is now part of the type system.
Now that the redundant orientation
field is removed, we can define the
refinements in a memory-optimal layout.
Now, as an engineer, you have two options:
- Include
Ball
in the previous definitions, which might be over-engineering. - Keep the redundant fields between
OrientedSegment
andBall
in the design to trade the memory duplication off at runtime, although much better than the problem of Physical Redundancy.
Recall that records with the same fields in the same package are disallowed because (that’s redundant, thus inefficient) the compiler generates accessor functions with the names of the record fields.
Storing repeated values at runtime is also a Physical Redundancy. Hence, we have a tight relation between the product types we design and the physical data that occupies the hardware.
The Haskell compiler is a great ally when spotting “code smells” that are just bypassed in ordinary languages. Of course, it all comes from mathematics instead of made-up C++ or JS tricks; hence it’s so good.
So, you have to decide the tradeoff as the designer. Recall that the former solution is more mathematical, while the latter focuses on implementation details.
It’d be great if a compiler generated the refinements properly. For example, the
refinement made before should
not store the oriented
field at runtime, provided the compiler could infer
this valuable6 information. That’s what I mean by “implementation detail.”
When you deal with them, tradeoffs come across, and the math or high-level
concepts fade away 😐.
The previous math-like refinement could still be optimized not to remove a design redundancy but a memory one, meaning an implementation detail. As an engineer, you must understand your domain and technical tradeoffs since compilers, even the Haskell one, are far from perfect to give us the perfect solution to every situation.
Principles for Engineers
Simplifying expressions in functional programs is a basic/essential skill any software engineer must have since it’s how you were trained in the CS/math majors.
So, instead of simplifying polynomials and such with set theory, you now have to simplify programming expressions with type theory. That’s how you apply abstract skills developed during math courses, mostly if you’re self-taught since that shows you’re ahead.
Applying simplification techniques is mandatory for any engineer, regardless of your domain, whether it is mathematical software or not. If you work in other domains, you’ll always have expressions to simplify building the “business/app logic” into engineering-grade artifacts. Programmers and developers are not engineers because they don’t engineer anything and don’t care about the engineering grade. It doesn’t matter how they get called in their tech jobs.
These skills primarily concern engineers rather than apparently similar roles lacking rigor, like software development or programming.
Fixing Redundancy is a Major Key to Simplify Systems
Removing design flaws showing redundancy, or boilerplate, is mandatory for crafting engineering-grade software since the existence of repetition in the codebase suggests we need to apply simplification techniques like polymorphism to our expressions.
If we don’t simplify the code, we’ll have a far-from-correct design, resulting hard to test, document, etc. Therefore, it’s not an engineering-grade design.
The inefficient hard duplication in a program, like records that only differ in their name, can be simplified by factorizing the duplication into a cohesive unit with sum-type variants.
Refinement types are a complex feature that barely a few languages like Haskell extensions can support, if at all. They’re powerful abstractions since we can express the very same math definitions. Notice how types are the sets of set theory while subtypes are then subsets of set theory.
When tools like compilers are unable to help anymore, we have to “get our hands dirty” with implementation details, bringing tradeoffs with lower-level code. Therefore, you apply the “engineer” part of your “math software engineer” title.
It’s essential to have a strong mathematical and computer science background for being a software engineer.
The functional type system allows us to create simple expressions for our programs via abstractions like algebraic and refinement types, which are all about pure functions, thus functional programming.
FP also includes the concepts of types analogously to mathematical sets. So, we can have further abstractions like isomorphic types to enrich our domain with equivalent definitions of your choice.
-
Line is a sum of three product types with overlap (i.e.,
HSegment
andVSegment
are a refinement ofSegment
, which is redundant inLine
) ↩ -
This is like mathematical equivalent definitions where you prove the double implication among all of them, so you choose one at your will that makes the most sense for the underlying application ↩
-
So you can optimize for one of the disjoint subsets of your choice ↩
-
The type in the example above is
Line
, notSegment
, so you cannot refineSegment
with the given ADT ↩ -
In Java, you can because of what I said in my other article’s footnote, but of course, don’t do it ↩
-
Pun intended since
oriented
is a redundant “value” at runtime, but I also suggest it’s “valuable” information for the compiler to infer ↩