Building Slides from Screenshots App in JavaFX
Efficient, quality engineering communication is a daily challenge when creating value such as specs, documentation, content, or visualizations, which are essential for professional and educational standards. I started materializing this in JavaFX via graphical slides for building domain-specific presentations based on screenshots, code snippets, and basic shapes. The result is this blog with an example project providing in-depth development and documentation with advanced JavaFX and AI features for automation to reach the set goals.
Sharing a Story from Screenshots
I wanted to build a little presentation from my daily work, so I took screenshots to work them out in Photopea.
The idea was to build a carousel presentation like this one, which I was editing in a PSD file. It still didn't include captions, so besides the editing, that meant more off-topic work for me.
Needless to say, apps like Photoshop, Google Docs, Office, etc., are general-purpose (i.e., mediocre) and manual. Plus, you even have to pay a subscription or watch ads for a monolith you will barely use.
I need composition, so I engineer my own domain-specific systems instead.
I lately took this experience as the final motivation to start materializing the automation of these systems without requiring the monolithic products mentioned. I'm working this materialization out via my new (and first) blog with project, EP: Slides, here in this article.
These general-purpose software out there cannot be composed.
Even if they have (bloated) AI assistance, macros, or even APIs (if at all), they’re just products for profit. Photoshop can paint an image, but it’ll never understand source code or your specific system. M$ Word or LaTeX will never understand what an equation means, as in mathematics, etc.
They need bloated features like AI that take huge deep learning models to develop and train, plus heavy licensing and marketing, because they're general-purpose. For instance, products that suck need capitalism (marketing) to sell them, but they're not the solution to the problems.
If you’re a domain expert, you simplify matters to the specific domain, and the more you simplify, the less bloated AI, marketing, analytics and engineering you have because you address the problems instead of the symptoms.
AI can have its place to automate external systems, which I wanted to make clear in this development, but it'll never replace the underlying domain since they're independent tools.
That is, you need to understand how AI should be used to automate work for our domain language instead of buying mundane general-purpose software that uses AI as magic. Notice the difference between automation and magic.
For example, grammar checkers are not technical, so you get a lot of errors marked because they don't understand computer and math languages and idioms. They only want you to pay a subscription to fix all the "issues" they sell you. Their dream is to remove all the "issues 💸," so your original tone even changes and ends up sounding like a robotic agent or someone else.
They won't even tell you the real enhancements your technical text needs. They're not engineering-grade. They tell you to remove something for "clarity issues," then tell you to add it back again 🤯 for correctness.
In the end, you have to be proficient in English (i.e., the domain) to know what you’re doing because any “magic” tool was made just for the sake of a profit agenda, thus bloated under the hood. You have to compose your tools instead of paying for generic ones that turn into workarounds and will never return most of the investment.
Another clear example is ChatGPT, which can generate mundane Python or code in any popular mainstream language or framework but struggles with ultra-niche technologies like PureScript or even JavaFX. It makes up nonexistent APIs and code that wouldn't compile. These unpopular techs require domain expertise, and there will (hopefully) never be enough data to train or fine-tune those ultra-bloated magic-based (and IP-unethical) models with technology that requires actual engineering.
In the end, you must be proficient in your domain and realize that tools like ChatGPT are nothing but a faster way to automate what you otherwise have to search in Google results.
General-purpose software is useful but not engineering-grade. On the other hand, mathematical software must be engineering-grade by nature.
You should refrain from saying, "It was generated by AI" when, in fact, the reality of how AI should work comes down to "It was automated by AI." The exception would be only when the whole tool (as a final product) is named after AI, e.g., "AI-generated image."
There are likely ways to make AI work for our domain language (as said above).
I figured out one simple way to leverage an AI application for this project, so it serves as a good example.
Developing anything domain-specific here is natural but not the objective, though. Thus, the purpose is to blog a new example project to start taking action, devising standards, and proving concepts.
Recall the domain engineering automation concepts that this is about, even though my purpose this time is to develop these ideas conservatively in Java as an example project.
Domain Composition Versus Magic
It struck me as funny (after writing Domain Engineering) when I read "Adobe Photoshop API magic, now available in the cloud" 😂 on the Photoshop developer page. It tells you to remove the background via API, but you actually need composition, like in FP. You should "compose backgrounds" instead of removing them from a binary image. That contrasts a simple solution for the former with a complicated one for the latter.
Although you can compose layers in Photoshop or certain functionalities in general-purpose software, you just can’t take some 10% of the monolith and pay 10% of the price to actually compose it with a totally different domain like math or programming. As said before, it won’t return most of the investment since general-purpose software is not composable.
Hence, as math software engineers, we must commit to building composable software to remove both the need for "magic" and the need to duplicate expensive-to-engineer features.
The application consists of many advanced features in a JavaFX master-view-detail desktop implementation for creating slide images and presentations based on images like screenshots and code snippets. The types of artifacts generated are depicted here, so it's clear what the final app can build.
The three kinds of slides are:
- Code Snippet: Turns code into a slide. Many languages are supported.
- CodeShot: Turns a code screenshot into a language-centric slide.
- Screenshot: Uses a screenshot (or any image) to build a slide.
The code snippet slides are based on source code, and everything is rendered via Shape views, including the syntax highlighting.
Many languages are supported, which is visible from the language syntax and background color of each snippet slide.
The CodeShot slide takes a screenshot of code (probably from your IDE) so you can tell a story of what was going on. You leverage a screenshot this time instead of a code snippet to capture other screen elements like a Git diff or special IDE features.
Code shots are also language-centric, like code snippets, but they’re built upon code screenshots instead of source code.
Slides based on (general) screenshots should be about no-code, like IDE or tool screenshots showing special information. A common example is a Git message explaining what happened.
The slide above has underlined text (multiple words) thanks to both the JavaFX shapes and the AI automation tools integrated into the app. I say "automated" because the underlining of words in images is automatic via AI (you just have to click to underline the text).
Drawable shapes on a slide can be lines, rectangles, and circles. The green color is for good and red for errors.
The above slide was also underlined (single word) automatically by AI.
Finally, a powerful artifact out of all these features is effectively a presentation.
Presentation: Video Rendering Side Effect Fix
Domain-specific feature automation within the app slide implementations results in engineering-grade, high-quality artifacts like images and presentations to convey a technical story efficiently for both the author and consumer.
First, make sure to have Java 21+ installed on your development machine. You might need:
I highly suggest using the Zulu (FX) distribution to get the FX mods out of the box!
For creating a new JavaFX app, follow Beginning JavaFX Applications with IntelliJ IDE | foojay.io with one of the approaches given by the author. I suggest the "Plain" approach for this project.
Our app package name is
The following is the initial app project.
As given above, the app requires the JavaFX fxml module since I decided to use FXML with the Scene Builder. Then, the main app package has to be opened to the fxml module so it can use reflection on our app, as well as exported so JavaFX in general can see our app via reflection.
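The module descriptor can be sketched like this (the module and package names here are placeholders, not the project's actual ones):

```java
// module-info.java — a minimal sketch; slides.app and com.example.slides
// are hypothetical names standing in for the real project's.
module slides.app {
    requires javafx.controls;
    requires javafx.fxml;

    // Open the main package to the FXML loader for reflection
    opens com.example.slides to javafx.fxml;

    // Export it so the JavaFX runtime can instantiate the Application class
    exports com.example.slides;
}
```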
Now, we move forward with the main package.
At this point, the JavaFX application is set up, so the development can take place.
Initial Master-View-Detail Layout with Drag-and-Drop ListView
I already developed the initial application layout (and logic), which is relatively heavy, so I'll add the FXML version here with a preview to see what's being developed.
This is what the initial layout app.fxml file looks like:
The layout tree (briefly) consists of:
- A split pane that defines the Master-View-Detail:
- A master view with a user input pane to manage the image files.
- A view pane that will show the slides rendered.
- A detail pane that will hold the properties for generating the slides.
It requires events for the file drag-and-drop, and some buttons.
It also has a menu bar that can be trivially implemented later.
The current layout tree consists of:
For integrating this layout, the root view has to be loaded from the app.fxml file in the root of the project's resources directory. AppController will handle the input events.
This gives an initial layout to the app with a controller to set some structure as far as it’s currently going. This includes a master-view-detail app layout, where the master pane will be loaded with images via drag-and-drop.
The application will handle data related to image files that make up the presentation, which will be stored in a local directory.
A basic image item needs to be loaded into the list of images.
Notice how the equals method had to be overridden for this model.
This item will model the images (screenshots) saved to the application data directory. Notice this will be a simple directory tree with a depth of 1 with no subdirectories.
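A minimal sketch of such an item could look like the following; the field names are assumptions, and the image field stands in for a JavaFX Image, which has no value-based equality, hence the override on the file name alone:

```java
// Hypothetical sketch of the image item model. The image field would be a
// javafx.scene.image.Image in the real app; identity comes from the file name.
public final class ImageItem {
    private final String filename;
    private final Object image; // stands in for javafx.scene.image.Image

    public ImageItem(String filename, Object image) {
        this.filename = filename;
        this.image = image;
    }

    public String getFilename() { return filename; }

    @Override
    public boolean equals(Object obj) {
        // Two items are the same iff they reference the same file
        return obj instanceof ImageItem that
            && filename.equals(that.filename);
    }

    @Override
    public int hashCode() { return filename.hashCode(); }
}
```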
The items need to be stored and loaded from our local storage.
For this, I defined the DataRepository interface. I also wrote a Data utility class to hold important functions.
That way, we'll know whether a given file list (that can be dropped into the ListView) has supported file extensions, which will avoid polluting the data directory with random files and ensure more correctness in our logic.
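A sketch of that validation could look like this; the utility names and the exact set of supported extensions are assumptions:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the Data utility's file validation.
public final class Data {
    // Assumed supported image extensions
    private static final Set<String> SUPPORTED = Set.of("png", "jpg", "jpeg");

    private Data() {}

    // True iff every file has a supported extension: one invalid file
    // is enough to reject the whole dropped list.
    public static boolean areValidImageFiles(List<String> fileNames) {
        return fileNames.stream().allMatch(Data::isSupported);
    }

    static boolean isSupported(String fileName) {
        var dot = fileName.lastIndexOf('.');
        return dot >= 0
            && SUPPORTED.contains(fileName.substring(dot + 1).toLowerCase());
    }
}
```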
These definitions will allow us to perform data operations in our application.
Local Storage Implementation
The implementation of the DataRepository is straightforward.
It uses the java.nio.file API to access the file system. This realization of DataRepository allows us to access our local storage.
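A possible realization, assuming hypothetical method names for the repository API, could be:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch of a local-storage repository over the flat
// (depth-1) app data directory, using the java.nio.file API.
public final class LocalDataRepository {
    private final Path dataDir;

    public LocalDataRepository(Path dataDir) { this.dataDir = dataDir; }

    public List<String> listImageFileNames() {
        try (Stream<Path> files = Files.list(dataDir)) {
            return files.map(p -> p.getFileName().toString()).sorted().toList();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies a dropped file into the data directory
    public void saveImage(Path source) {
        try {
            Files.createDirectories(dataDir);
            Files.copy(source, dataDir.resolve(source.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void deleteImage(String fileName) {
        try {
            Files.deleteIfExists(dataDir.resolve(fileName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```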
The first part of the app is a master pane that lists the images.
This pane will be able to list the images, add new image(s) via drag-and-drop and/or a FileChooser, delete an image, delete all images, and rearrange them in the order they will appear in the presentation.
Initializing the App Controller
AppController will need some fields.
A good initialization will be needed too.
Including other methods will be helpful as well.
Now the controller is set up to keep adding the other features.
Drag and Drop
One engaging feature of this app is its file drag-and-drop, where you can create or update one or many images with a simple gesture.
If the files are accepted by our app, then they will be added. They won't be added if rejected (e.g., a Photopea ".psd" file); per our rules, one invalid file is enough to reject all of them, in which case the DragEvent cancels further actions with the clipboard files.
Another way the DragEvent can finish is when you just cancel the drop action with your mouse by moving the files out.
The events are set from the app.fxml view already, so only the controller implementation is left.
The three events that will be required for this app consist of:
- Drag Over: Files are being dragged onto the ListView, so they will either be accepted or rejected.
- Drag Dropped: Files were deposited into the app.
- Drag Exited: Cancels the drag as files dragged with the mouse are out of scope.
These implementations will allow the drag-and-drop feature in our application.
The list in the master pane is key to completing this implementation.
First, a custom cell renderer needs to be created, obviously.
If you remember, or are nostalgic for, old-school Android development like me, this will come in handy a lot.
I also added a Tooltip so the full file name is shown when you hover over the item, and a maximum text length so long names are covered with an ellipsis.
Moreover, I set the rounded corners of the image via imageView.setClip(clip), and set the confirmation Alert in the onDeleteButtonAction method, which sends the delete-item event to the ImageItemCell.Listener (the controller).
So, that's the ListView item implementation you see in the screenshots.
Then, this is integrated into the controller.
First, the cell callback that was defined can be realized by the controller, so its class declaration adds implements ImageItemCell.Listener, followed by the listener implementation.
At this point, the list with drag-and-drop was built, which is a big part of this master pane.
For deleting an item, you click on the item's delete button, and a confirmation Alert will finish this action. To delete all the items, the "clear" Button will do the trick.
Therefore, here we go with our controller again.
This provides a safe delete mechanism for our application images.
Arranging Image Items via More Drag and Drop
Our presentation order will be the one shown in the image list. The ListView already supports a drag-and-drop event for adding or updating files in the app, and now it needs one more implementation for arranging its items via this fancy mechanism.
By arranging the images loaded into the master pane via another ergonomic drag-and-drop mechanism, we can set important information for the slides, such as their order in the final presentation.
Cell Drag and Drop Implementation
First, a new event needs to be defined in the cell listener: void onArrange(int draggedIdx, int destIdx);. So, our abstract list of images (the data structure) gets sorted when this happens (and then updates the GUI).
First, notice that if you're not careful, you'll introduce side effects to these kinds of events, as we can have many event implementations. In this case, the drag events fall into the ListView (the "bigger" target) and are also listened to from each of its cells, that is, each ImageItemCell. What controls this crazy state sharing, if you notice, are the calls to the consume method of the DragEvent, which helps with this.
So, I carefully tested the app so both drag-and-drop events for files and for arranging cells work properly.
The drag view is set to a snapshot of the cell being dragged. This is the graphic you see on the tip of your mouse when dragging something on the screen.
There are some CSS classes to apply a visual effect on the cells when dragging one cell onto another. These classes have to be added to the app from a CSS file.
That was with respect to the ImageItemCell. Now, these changes have to be implemented in the app controller.
I lately made the app less coupled, so we'll work with an observable list stored in the controller as the "source of truth" for the image data. Then, this list will behave reactively to update the GUI.
What's mostly new here is the onArrange event that was defined before in the Listener. This method updates the data, and this data is reactive, so it automatically updates the ListView, which was bound to the images list.
The dragged cell is also selected when dropping it into its new position.
Cells are swapped, and automatic scrolling is missing, which are limitations of this implementation. The desired behavior may be more advanced, but that's out of the scope of this EP.
Now both the cell and controller implementations are in sync.
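The arrangement logic itself can be sketched as a plain swap over the backing list; the method and type names here are illustrative:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the onArrange data update: per the limitation
// noted, the dragged and destination items are swapped, not shifted.
public final class Arrange {
    private Arrange() {}

    public static void onArrange(List<String> items, int draggedIdx, int destIdx) {
        if (draggedIdx == destIdx) {
            return; // dropped on itself, nothing to do
        }
        Collections.swap(items, draggedIdx, destIdx);
        // In the app, the reactive ObservableList then updates the bound ListView
    }
}
```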
App CSS Styles
It was time to introduce some CSS to add a class to a list cell when another item is being dragged onto it.
I also added a new method in Main.java to help load these resources. Now, the CSS classes defined above can be used to style the ListView cells.
Order of the Presentation Slides
With all this, we now have the arrangement of the images done, as well as the master pane complete.
This menu is trivial to write now.
First, we add the events to the FXML, and the controller.
I made minor modifications, like changing the name of a MenuItem from "New" to "Add" for more clarity of action.
They do the following:
- Add: Opens the FileChooser, just like pressing the existing "Add" Button, to add (or update) an image.
- OpenWD: Opens the app data directory (working directory) in the system file explorer so you can browse the original application images.
- Clear: Same as the "Clear" Button.
- Quit: Closes the app.
- About: Shows a basic Alert with info about the application.
So first I extracted a method to reuse it:
The Swing JavaFX module has to be added to the application modules via requires javafx.swing;, as it's required to call the Desktop API to open the system file explorer.
Finally, the implementations are left to finish this menu.
With this, the application now has a functioning menu bar.
View Detail and App Domain
The detail pane is the one to the right that allows the user to set up the information to apply to the slides, and the application domain has been defined via several types to power all these concepts.
The view pane is the middle one showing the result of rendering the slide, and this will require much more work before we start drawing something on this view.
Some refactoring is needed to keep the project maintainable.
Then the master pane developed before has to be related to the view pane via the pagination.
Once the initial application domain is defined, the detail pane can be worked out to enable the configurations that will be applied to the drawing in the view pane.
Refactorings on Data and UI
First, here are some refactorings I did at this stage.
Leading to a project module like:
Notice I moved the data package, as it makes more sense, as I said in Removing Cyclic Dependencies, Java vs Go (2023-05-28).
Now the project has more support to accept the next changes for the view and detail panes as well as the application logic.
This component is a page indicator that will keep in sync with the list of images loaded into the master pane.
After updating the app.fxml file, the logic is next.
The pagination has a page count that is set to the size of the images loaded into the app via pagination.setPageCount(images.size()), and set to 1 (the default) if there are no items, besides setting it invisible. This is because the minimum page count of a Pagination can only be 1, so we have to hide it when there are no pages.
Then, this count is updated when the list of images changes.
By default, the first list item is selected, so the app is not empty as long as there are images.
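The page-count rule described above boils down to two small functions, sketched here outside the controller for clarity:

```java
// Sketch of the pagination sync rule: Pagination's minimum page count
// is 1, so with zero images we keep 1 and hide the control instead.
public final class PaginationSync {
    private PaginationSync() {}

    public static int pageCount(int imageCount) {
        return Math.max(1, imageCount);
    }

    public static boolean isPaginationVisible(int imageCount) {
        return imageCount > 0;
    }
}
```

In the app, these values would feed pagination.setPageCount(...) and pagination.setVisible(...) whenever the images list changes.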
When an item of the ListView is selected, the property is listened to, so it updates the Pagination index. On the other hand, when the Pagination changes its index, the selected item in the ListView is updated accordingly.
Then, I put an ImageView in the view pane to show the selected image in the app as a PoC for this feature, but this view will be replaced with a Group to be able to perform the drawings.
The image list and pagination will be in sync, so the app becomes more usable with this new control.
It’s time to start creating the models for this application.
A Slide consists of:
- CodeSnippet: A slide containing source code (from a given PL) styled as a pretty image.
- CodeShot: A screenshot image of source code (e.g., a screenshot of the IDE editor).
- Screenshot: A random screenshot.
I also defined a basic enum to define the slides as simple iterable items.
The (programming) language is required to style the slides, so it can be associated with a slide's content.
Defining some colors will be useful for the possible drawings that can appear on the slides.
So, we can define some relations for these types. I took the language colors used by GitHub and set them to the languages that were added before.
This way the app can be extended by mapping color values to the domain types.
Regarding configurations, we can add some important settings like the target size for the slides.
Since screenshots will not likely be greater than FHD, these predefined resolutions will come in handy.
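These predefined sizes can be modeled with a small enum; the exact names are an assumption:

```java
// Sketch of the predefined target resolutions for the slides.
public enum Resolution {
    HD(1280, 720),
    FHD(1920, 1080);

    private final int width;
    private final int height;

    Resolution(int width, int height) {
        this.width = width;
        this.height = height;
    }

    public int getWidth() { return width; }
    public int getHeight() { return height; }
}
```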
In the long run, we’ll also need to compile and save the whole presentation, so here’s a basic configuration that will provide this insight.
So, it can be edited from the detail pane: a field for the size (HD, FHD) and the path to store the compilations.
That's all for now: the application domain that establishes the main logic to build the rest of the GUI details.
Reusing Enums by Converting Them into English Strings
Enums can straightforwardly be used as an iterable sum type to define, among others, the values for the ComboBoxes. What's missing is the ability to turn them into readable English values.
JavaFX's utilities provide a StringConverter abstract class to turn objects into Strings and vice versa.
I’ll leave my implementation here, and let the reader figure it out.
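The core of such a converter can be sketched as follows; the real class would extend JavaFX's StringConverter&lt;T&gt; and also implement fromString, which is omitted here:

```java
import java.util.regex.Pattern;

// Sketch of the English conversion core: it splits a PascalCase enum
// constant name into words ("CodeSnippet" -> "Code Snippet"). In the app,
// this logic would live inside a javafx.util.StringConverter<T> subclass.
public final class EnglishConverter {
    // Zero-width boundary between a lowercase and an uppercase letter
    private static final Pattern UPPER = Pattern.compile("(?<=[a-z])(?=[A-Z])");

    enum SlideItem { CodeSnippet, CodeShot, Screenshot }

    private EnglishConverter() {}

    public static String toEnglish(Enum<?> value) {
        return UPPER.matcher(value.name()).replaceAll(" ");
    }
}
```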
This EnglishConverter implementation will allow us to use enums more directly as English values in the views without losing the identity between a domain type and a primitive String (i.e., the enum is set as the view value but displayed as a String).
Eventually, we come to the detail pane on the right of the GUI, which allows us to program what information will be taken to render what's shown in the view pane in the middle.
I added a CSS class to style the title Labels, for example. These styles can be added as a class to the Label nodes via Scene Builder.
First, declare the new fields in the controller, and add logic for what should be done. Of course, I figured all this out before.
Here, some views have to be hidden if they're not applicable according to the setup of the other values. For example, codeSnippetBox is only shown when the Slide type is set to CodeSnippet. It's the slideComboBox that defines what kind of Slide the currently selected item is, so it'll be rendered according to that model.
As you can see, we use the EnglishConverter written before and set the default value to SlideItem.CodeSnippet, although I might change those settings later.
In the detail pane, many settings can be added and adapted to define the rendering of the slides.
View Pane and Drawing
The view pane displays the result of applying the information from the detail pane on the right side to the input images from the master pane on the left side. This view shows the drawings applied to build the presentation.
The drawing logic will go into the drawing package of the application.
First, I defined a SlideDrawing type to represent the editing applied to a Slide. Then, I added a realization of this interface via a GroupSlideDrawing class based on a Group. A Group in JavaFX is the normal way to draw the basic shapes available in the platform.
The performance of this implementation is not great yet. Since this is an example project, I won't go into premature optimizations a lot.
I’ve experienced a few side effects while working with drag events and imperative code for editing and composing images, and I’m not delving into more lower-level optimizations for now 😑.
Now, the first drawing implementation will build a Slide of type Screenshot, so here we draw a screenshot slide.
First, I take a Group as the parent of the drawing, which is a basic "layout" for manually adding the child positions. This basically resembles the FrameLayout from Android.
We can add nodes and shapes as children to the group, and they'll blend, allowing us to build up our slide. We can add the background straightforwardly with a Shape, and the image (which is more complex) via a good old ImageView.
I added a temporary scale to the group parent so the drawing fits the screen, as I'm not implementing zooming features any time soon.
I implemented the getImageCornerRadius method to set relative rounded-corner values to the images so they keep their proportion. I based my measure on a round icon, then played with the proportions to settle it down.
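The proportional radius can be sketched like this; the 1/16 ratio is an assumption for illustration, not the app's actual constant:

```java
// Sketch of a relative corner radius: the arc scales with the image's
// smaller dimension so the rounding keeps its proportion at any size.
public final class Corners {
    private Corners() {}

    public static double getImageCornerRadius(double width, double height) {
        var minDim = Math.min(width, height);
        return minDim / 16.0; // assumed ratio, tuned by eye
    }
}
```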
Since Group is a lower-level layout, we have the flexibility to position the children via its methods, like the image I put in the center of the parent.
I implemented the getRoundedImage method to build a rounded image from an Image and its ImageView with a given arc value (computed by getImageCornerRadius).
As you can see, the code here is much more imperative and temporally coupled.
To get the rounded image, you have to create a clip Shape consisting of a Rectangle resembling the dimensions of the image. Now you set the image and clip to the ImageView and take a snapshot of the view. Then, remove the clip from the image view to release any side effect.
The shadow effect can be achieved by setting an Effect to the ImageView.
As I said, these steps depend on the order in which they’re called due to the imperativeness of the code.
But that's not it 😂; more imperative code is waiting. The images have to be resized per the slide configuration we set.
If the given image is too big — either in width or height — it has to be resized to fit the parent.
Now we fit the size of the ImageView, draw the image, and set its position to the center.
To fit the ImageView, first, the size of the original Image is set. If these dimensions are out of bounds, the ImageView is resized, keeping the proportions.
Now, when we want to know the dimensions of the image, the "source of truth" is the ImageView instead of the original Image.
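The fitting rule can be sketched as a pure computation; the names here are illustrative:

```java
// Sketch of the proportional fit: if the original image exceeds the
// slide bounds, scale it down uniformly; otherwise keep it as-is.
public final class Fit {
    public record Size(double width, double height) {}

    private Fit() {}

    public static Size fit(double w, double h, double maxW, double maxH) {
        if (w <= maxW && h <= maxH) {
            return new Size(w, h); // already in bounds
        }
        // Uniform scale factor keeps the proportions
        var scale = Math.min(maxW / w, maxH / h);
        return new Size(w * scale, h * scale);
    }
}
```

In the app, the resulting size would be applied to the ImageView's fit width and height before centering it.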
This is the initial implementation of the drawing package that now enables us to draw a screenshot slide into the view pane of the application.
To separate responsibilities and avoid making the controller a bigger coupled object, other classes can take over the functions of some views. This will allow integrating the drawing package into the app GUI.
With the current development, I added a SlideDrawingView class to manage the GUI logic for the view pane.
This class could be extracted as an abstraction by using an interface, but it's unnecessary to over-engineer there.
I've defined a record so we can save the slide state.
Persistence is another feature that will be needed, at least at the in-memory level. The ChangeListener notifies when a slide changes and when the state has to be set for a given ImageItem. The state of an item (slide) includes all the params shown in the detail pane, for example.
Next, the GUI properties of this object are defined so we can communicate more efficiently between objects.
Most of the time, I add a mandatory init method to my custom views to avoid abusing the constructor. In init, we set the properties to call updateSlide whenever they change.
This is a faster-to-develop approach that is not optimized, which is not part of my scope now, as I said before.
As a special case, when the image changes, onImageChange is called to perform a state load through the ChangeListener, since the change of an image means our slide also changed, so we have to update the state of the newly visible slide. I just added an isStateLoading flag to avoid rendering while the state is being loaded, which prevents multiple updates in vain when all the properties are being set.
Finally, to update the slide, we build the Slide from its property values and use the SlideDrawing implementation (GroupSlideDrawing) we did before to render what has to be rendered.
GroupSlideDrawing has the reference to our Group view, so it will draw on that node.
The onSlideChange event is triggered through the ChangeListener, so the controller will be able to persist these changes.
This view will enable us to add changes to the GUI part of the drawing without overwhelming the controller, while integrating other logic from the drawing package.
Updating the Controller with the Drawing View
The controller will handle the recently created view that draws the slides.
This view has a listener that will be realized in the controller class; that is, we add the SlideDrawingView.ChangeListener interface to the class signature and implement it next.
As I explained before, we can now see the Map that will store the in-memory state. It maps an ImageItem to its slide state.
When a slide changes, the event is received by the SlideDrawingView.ChangeListener, and the change is trivially stored. From this same listener, we receive the setState event, which loads the state associated with that item, if any.
Then, the initialization of this view is straightforward, like the other initializations we've done. Notice that we must call the init method from our SlideDrawingView here, as I said. Then we only have to bind the properties and the change listener.
With all this, we now have a full minimal implementation that allows us to draw a slide of type screenshot on the view pane. From now on, we only need to extend this framework to add many other functionalities to the app, like the other kinds of slides remaining, or more visuals as well.
Code Snippet Slide
Rendering code snippets as a pretty image with styled code for any programming language is a gorgeous challenge with powerful results that resembles the process of developing an IDE.
I pulled the PL colors assigned on GitHub to style the background to give context to the slide. The frames have shadows and rounded corners, and the code is styled the same as my IntelliJ settings, except for specific language tokens.
The slides work for as many languages as needed.
I also added caption support to end up automating my job further.
Regarding automation, I’ll keep writing my DSLs and systems with my latest MathSwe standards.
I quickly wrote a parser with various regexes and data types to give semantics to the code. Recall that it has to work for any PL for this EP; I'm not adding a specific PL syntax style.
With this, I can automate my workflow, eliminate the need for many general-purpose bloated/monolithic/manual tools like Photoshop, build an API, and do plenty more work. Of course, among many others, LaTeX 🦍, for example, is an archaic tool I've always wanted to eradicate, so there we go…
Another insight I got about how to increase the productivity of these domain-specific systems is to pre-parse their input via AI.
Notice AI can be a bridge between the general-purpose input and the domain-specific one, but the underlying domain will always exist.
Regarding the graphic rendering of the slides, they're all about JavaFX Shapes and normal GUI views. For backgrounds, a Rectangle shape is good. The code chunks are Text nodes.
We could even build an IDE with these APIs. I didn't need to use a lower-level Canvas; JavaFX nodes were enough, which shows what a powerful platform JavaFX is for engineering applications.
I had to do various refactorings and specific implementations of details, of course, so I'll just show an overview of the development from now on and skip the details.
This way, the code snippet type of slide was implemented as a beautiful page of custom-styled code with captions, which is another big step for me toward more content automation.
Adding a New Empty Slide
Since code snippets make sense to go in a brand-new slide only, I created a “new” button to add empty slides.
Then, a new ImageItem is created with the logo of an EP app by default. Recall these are intended to build the slide from source code instead of images, and it doesn't have to be perfect, as this is an example project.
This will require some minor features like saving Images via the DataRepository (because, so far, it only supported copying the image files from your directory).
The GUI change is also straightforward in the controller after updating the FXML view.
Notice how we make use of the freshly implemented
DataRepository in the method
createNewSlide. As said, I pass my EP icon as a
default slide image, and then it’s stored in the data directory of the app.
By adding brand-new slides to the app, we’re able to work with code snippet slides more consistently, as these don’t require image files to be generated but source code instead.
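As a sketch of that repository idea, a local implementation can save raw image bytes (e.g., a default icon for a new slide) or copy existing files with java.nio. The class and method names below are my assumptions for illustration, not the EP’s exact API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch of a local data repository that stores slide images on disk.
// Class and method names are assumptions, not the EP's exact API.
public final class LocalDataRepository {
    private final Path dataDir;

    public LocalDataRepository(Path dataDir) {
        this.dataDir = dataDir;
    }

    // Copies an existing image file into the app data directory (the original behavior).
    public Path copyImage(Path source) throws IOException {
        Files.createDirectories(dataDir);
        Path target = dataDir.resolve(source.getFileName());
        return Files.copy(source, target);
    }

    // Saves raw image bytes directly, e.g., the default EP icon for a new slide.
    public Path saveImage(String fileName, byte[] image) throws IOException {
        Files.createDirectories(dataDir);
        Path target = dataDir.resolve(fileName);
        Files.write(target, image);
        return target;
    }
}
```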
Updating the Model with the Slide Language and Caption
To update the model, I added
Language as a
record component, so one slide
can be rendered as per its language if applicable.
The Caption product type is not part of the Slide sum type but a record nested in that interface.
Now, the slides have more complete data to work on further implementations.
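A minimal sketch of how such a model can look with Java sealed interfaces and records follows; the exact record components and variant names are assumptions, not the EP’s code.

```java
import java.util.Optional;

// Sketch of the slide model: a sealed sum type with a Language component and a
// Caption record nested in the interface. Component names are assumptions.
public sealed interface Slide {
    enum Language { JAVA, KOTLIN, HASKELL, OTHER }

    // Caption is a product type (record) nested in the Slide interface.
    record Caption(String title, String subtitle) {}

    Optional<Caption> caption();

    record Screenshot(String imageName, Optional<Caption> caption) implements Slide {}
    record CodeShot(String imageName, Language language, Optional<Caption> caption) implements Slide {}
    record CodeSnippet(String code, Language language, Optional<Caption> caption) implements Slide {}
}
```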
Code Shot Slide
This is another kind of slide supported by the application, and it’s trivially implemented by just drawing the frame background with the color of the slide language.
This is almost the same as the screenshot slide that was done in the beginning.
This kind of slide consists of a screenshot of code (not actual code), so, e.g., you can capture your IDE and explain what’s in the screenshot.
We get the color via Colors.color(lang), using the colors that were added previously.
Other elements, like captions, can be added the same way, as will be shown later.
Later, I added a class ScreenshotDrawing to the drawing package to reuse this code between both screenshot-based kinds of slides.
This concludes the simple implementation for this kind of slide, so we already have a minimum implementation for the “screenshot” and “code shot” slides. Finishing the “code snippet” slide is left to complete this EP.
To define the language elements, I added a package for this domain that provides the logic for definitions and parsing.
The Element sum type partitions the possibilities into a set of language semantics that can generally be found.
This implementation is general-purpose, so it works on any language, but it’s not specific to each language. That would be a ton more work for me 🤪.
The sum type consists of
Keyword | Symbol | Type | Number | StringLiteral | Comment | Other.
They all have a
String value to hold their token.
The TokenParsing type allows returning (wrapping) an Element value when parsing tokens.
The details about enums are just an implementation I made to transform 1:1 interface sum types into basic enum sum types.
So, I can add an ElementItem paired with the previous Element sum type.
This can be trivially done in Haskell thanks to its homogeneity, but Java, like any other mundane mixed-paradigm language, is heterogeneous, so an enum is a different structure than an interface. Not to mention, OOP brings the whole jungle: interfaces are used as an all-in-one for many other affairs, too.
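A compact sketch of this 1:1 pairing between the interface sum type and a basic enum could look like this in Java 21; the member names are assumptions.

```java
// Sketch of the Element sum type for language tokens and its enum counterpart.
// The 1:1 mapping from the sealed interface to a basic enum is the idea the
// article describes; exact member names are assumptions.
public sealed interface Element {
    String value();

    record Keyword(String value) implements Element {}
    record Symbol(String value) implements Element {}
    record Type(String value) implements Element {}
    record Number(String value) implements Element {}
    record StringLiteral(String value) implements Element {}
    record Comment(String value) implements Element {}
    record Other(String value) implements Element {}

    // Basic enum sum type paired 1:1 with the interface variants.
    enum ElementItem { KEYWORD, SYMBOL, TYPE, NUMBER, STRING_LITERAL, COMMENT, OTHER }

    // Exhaustive transformation: adding a variant without mapping it won't compile.
    static ElementItem item(Element element) {
        return switch (element) {
            case Keyword k -> ElementItem.KEYWORD;
            case Symbol s -> ElementItem.SYMBOL;
            case Type t -> ElementItem.TYPE;
            case Number n -> ElementItem.NUMBER;
            case StringLiteral s -> ElementItem.STRING_LITERAL;
            case Comment c -> ElementItem.COMMENT;
            case Other o -> ElementItem.OTHER;
        };
    }
}
```

The exhaustive switch over the sealed type gives the compile-time safety that a heterogeneous language like Java otherwise misses.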
I defined the keywords in a utility class6.
I obviously skipped the 500 LoC for the mechanical definition of each keyword in this class. So, here you can see how to get the keywords.
I also skipped markup or DSLs like CSS and HTML since they’re quite different 😐.
Now, another utility class (missing Kotlin and functional languages so much 🤔) to define the colors 🎨 for the language elements.
Finally, I implemented a parser to get the tokens by language from the raw strings. This has many details I won’t show; I’ll just show the declarative part of it: the regex.
The underlying details are a long story out of scope, but it was a good exercise to rehearse regular expressions and work out some abstract concepts.
This allowed me to build the general-purpose heavy logic behind the front-end that styles the code snippets to produce beautiful slides for pretty much any source language.
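As an illustration of the declarative approach, here is a minimal general-purpose tokenizer with named regex groups. The patterns and token kinds are simplified assumptions, nowhere near the EP’s full regex set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of a general-purpose tokenizer in the spirit of the article's
// parser: a few regexes classify tokens for "any" language. The patterns and
// token kinds here are simplified assumptions.
public final class Tokens {
    public record Token(String kind, String value) {}

    // Order matters: comments and strings must win over symbols and words.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<comment>//[^\n]*)"
        + "|(?<string>\"(?:[^\"\\\\]|\\\\.)*\")"
        + "|(?<number>\\b\\d+(?:\\.\\d+)?\\b)"
        + "|(?<word>\\b[A-Za-z_][A-Za-z_0-9]*\\b)"
        + "|(?<symbol>[{}()\\[\\];,.=+\\-*/<>])"
    );

    public static List<Token> tokenize(String code) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            if (m.group("comment") != null) tokens.add(new Token("comment", m.group()));
            else if (m.group("string") != null) tokens.add(new Token("string", m.group()));
            else if (m.group("number") != null) tokens.add(new Token("number", m.group()));
            else if (m.group("word") != null) tokens.add(new Token("word", m.group()));
            else tokens.add(new Token("symbol", m.group()));
        }
        return tokens;
    }
}
```

A real implementation would then refine "word" tokens into keywords or types per language, which is where the keyword definitions come in.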
Code Snippet Drawing
Finally, the front-end implementation in the drawing package will put the lang package together, so it can be added to the GUI app.
Recall that the drawing package contains the implementations to render the slides into JavaFX nodes.
I leave the full implementation next.
I defined some polite sizes in the constructor, so it looks good.
draw will return the
Group with the slide drawn on it, similar to
what’s been done before.
To render the code nicely, the method renderCodeSnippetFlow employs the Parser developed previously and parses the tokens from the raw String that comes with the underlying slide.
Then, for each token, a Node is created with the token value (it can be a keyword, symbol, etc.). The styles are gathered from what I defined as per my IDE custom settings. For example, the text color is set from the one dictated by the utility class via Colors.color.
Regarding captions, if they’re present, they’ll be added to the Group as Labels plus another Rectangle for the background. This implementation is given by the class CaptionRenderer I added to the package drawing as well, so it can be reused to draw captions on any kind of slide.
All measures and bindings are set up correctly to build up the whole slide.
I also moved some methods to a utility class.
This drawing provides the group node with the code snippet rendered so that we can add it to the app view pane.
This caption drawing implementation is reused by other kinds of slides as mentioned before.
It consists of the class
CaptionRenderer that was used in the
Code Snippet Drawing snippet.
By decoupling this responsibility, as shown above, I was able to draw same-style captions for the other kinds of slides too.
Putting the Drawing Together
It’s about time to complete the code snippet slide development.
By putting the lang and drawing packages together, as was done before, we get a Group that relies on the logic defined in lang to render the code.
Adding the CodeSnippetDrawing to the (drawCodeSnippet) concrete implementation of SlideDrawing is all we need to delegate this implementation. The CodeSnippetDrawing decouples the classes by taking that responsibility out of SlideDrawing.
This way, the code snippet slides are fully available in the application.
Rendering various kinds of shapes is a fitting exercise to practice in this JavaFX application. They can be lines, rectangles, or circles.
This can give you the ability to annotate certain parts of slides, and even better, these annotations can be automated.
First, we must begin from the domain, in this case, the drawing definitions and implementations for shapes that will support the app. After working out the domain, we figure out any automation that can be applied to the domain language, as said at the beginning of Domain Engineering.
First, I defined the shapes I wanted to draw.
To start making this feature take shape (pun intended), let’s go to the UI updates on the controller side.
After updating the app.fxml file with the ComboBox to select the shape to draw and the Button for undoing changes, the logic on the controller is next.
Then, another controller will be needed to separate the logic for the drawing on top of the slides.
This new controller will therefore work with the shape drawings, leading to the implementation of the UI inputs performed by user drawings.
As you can see, there’s a ShapeRenderer containing the stack of shapes drawn on the Slide. By leveraging the LIFO (Last In, First Out) property provided by the Deque data structure, the “undo” button can be trivially implemented.
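The undo behavior can be sketched with a plain Deque; the real app stacks JavaFX nodes, while this illustration uses strings for simplicity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

// Sketch of the LIFO undo behavior for shapes drawn on a slide, using a Deque
// as the article describes. String is a placeholder for the real shape nodes.
public final class ShapeStack {
    private final Deque<String> shapes = new ArrayDeque<>();

    public void push(String shape) { shapes.push(shape); }

    // "Undo" trivially pops the most recently drawn shape (Last In, First Out).
    public Optional<String> undo() {
        return shapes.isEmpty() ? Optional.empty() : Optional.of(shapes.pop());
    }
}
```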
When binding events, a shape is set to be rendered, and the scroll pane is no longer pannable, which disables the scroll on the slide view while drawing a shape. This continues until the process is complete, and then the scroll pane gets back to normal.
The SHIFT key is detected to read when the user wants to keep the proportions of the shape. This will be useful when implementing their rendering.
The color to paint the shapes comes from the Palette enum.
The rest are implementation details.
The ShapeRenderer responsible for drawing a shape is coming next.
The Group (coming from SlideDrawingView, implemented via draw) is used as a canvas to draw the shapes. Recall that Group nodes are good in JavaFX for drawing shapes, so this group is put on top of the slide to create the composition.
The shapes are trivially drawn via JavaFX API and integrated into the corresponding slide.
Notice the flag keepProportions is used to keep the aspect ratio of the shape when the key SHIFT is pressed. This is useful to draw straight lines, for example.
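The proportion constraint is plain math; a sketch with assumed method names follows, with one variant for square/circle proportions and one that snaps a line to the dominant axis.

```java
// Sketch of the SHIFT-to-keep-proportions math: given the drag start and the
// current mouse position, constrain the end point. Method names are assumptions.
public final class Proportions {
    public record Point(double x, double y) {}

    // Rectangle/circle variant: equal sides yield a square or a perfect circle.
    public static Point constrain(Point start, Point end, boolean keepProportions) {
        if (!keepProportions) {
            return end;
        }
        double dx = end.x() - start.x();
        double dy = end.y() - start.y();
        // Take the dominant axis length and apply it to both axes, keeping signs.
        double side = Math.max(Math.abs(dx), Math.abs(dy));
        return new Point(
            start.x() + Math.copySign(side, dx),
            start.y() + Math.copySign(side, dy)
        );
    }

    // Line variant: snap to the dominant axis so the line is perfectly straight.
    public static Point snapLine(Point start, Point end) {
        double dx = end.x() - start.x();
        double dy = end.y() - start.y();
        return Math.abs(dx) >= Math.abs(dy)
            ? new Point(end.x(), start.y())
            : new Point(start.x(), end.y());
    }
}
```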
Finally, I recorded a video to demonstrate the shapes feature in the app.
The drawing of lines, rectangles, and circles is a finished feature aimed at making (off-domain) annotations on top of the (domain) slides and can be extended properly thanks to computer science concepts applied to this application module.
Integrating AI via OCR
There’s a feasible way to implement an AI application here, taking into account the slides that were composed before. AI can operate on these slides to extract further information, like text from screenshot-based slides.
The feature consists of detecting words from images (screenshots), so we have valuable information for the user to accurately select or underline words without relying on manual mouse precision.
From the image above, you can see how words on the IntelliJ screenshot are
selected and how I trivially underlined the main package
engineer.mathsoftware.blog.slides in one go.
Notice how images or screenshots are out of the app domain since they’re binary or compiled data from the wild world, unlike code snippet slides, where it’d be relatively trivial to implement word selection since we have the source code in a TextFlow component, so it’s part of the app domain.
An OCR implementation matches a strategic use case for showing how AI can automate a system by working on external systems consisting of binary images containing information that can be extracted and transformed into our domain language.
Setting up Tesseract
Tesseract OCR is one of the main open-source projects I found, so I could detect text in Java via the Tess4J library that wraps it.
Since the project setup was kept simple, as said in Getting Started, there’s no build tool, to avoid complicating it more.
Thus, for adding the Tesseract library, you might want to add it via IntelliJ IDEA to your project if you’re not using a build system. That is, tess4j has to be added to the Java project manually as a library.
It’s important to follow the instructions I left in the
resources/readme.md file since you
must download and copy the
tessdata directory from its repository to run the
OCR model in the app 7. This directory contains the
file with the English data to load the model.
The tess4j library will allow us to call the Tesseract OCR API in Java and infer the text boxes on the slides.
Implementing the AI Package
Then, I created a package ai to separate the AI-only code and start writing the new AI feature logic.
I also created a similar package with a different domain. AI is an implementation layer on top of the app, so if we’re going to draw AI shapes (like bounding boxes, which are an AI field rather than slide responsibilities), then those AI shapes must be placed in a different package than the slide drawing, so I added a subpackage ai to it.
drawing.ai will be used later on the front-end side to consume the AI package.
To implement the AI back-end or AI package, we have to get bounding boxes inferring words provided by an image. Then, we have to define the AI features that will be supported by the app, that is, OCR.
The OCR output will be read as a list of BoundingBox values.
tessdata is loaded from the app resources directory, a new Tesseract model is created and set up with the training data, language (English), and other options that can work to tune the model.
Notice how a bufferedImage is created with the javafx.swing module to convert the input JavaFX Image into an (AWT) BufferedImage that is required by the tess4j API 8.
The BufferedImage is passed to the getWords method, and the result is mapped to domain types.
As a side note, try not to use low-quality or small images since these will make the model have a hard time detecting anything 😆.
The method textBoxes of the (utility) class Ocr will provide the word boxes given any image, which completes our work regarding the external AI integration.
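The mapping from the raw AWT output to a domain type can be sketched as follows; OcrWordDetection’s exact shape is an assumption, while tess4j does expose word boxes as java.awt.Rectangle values.

```java
import java.awt.Rectangle;
import java.util.List;

// Sketch of mapping the raw OCR output (AWT Rectangle word boxes) into a
// domain bounding box type, as the article describes. The record shape is an
// assumption for illustration.
public final class OcrMapping {
    public record BoundingBox(double x, double y, double width, double height) {}

    public static BoundingBox toDomain(Rectangle awtBox) {
        return new BoundingBox(awtBox.getX(), awtBox.getY(), awtBox.getWidth(), awtBox.getHeight());
    }

    public static List<BoundingBox> toDomain(List<Rectangle> awtBoxes) {
        return awtBoxes.stream().map(OcrMapping::toDomain).toList();
    }
}
```

This keeps the external AWT type at the boundary, so the rest of the app only sees domain values.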
Regarding the AI application model, I defined a SlideAI sum type to define the AI features supported by the app.
The framework developed in the AI package enables the app to consume the OCR model to add rich automation features to existing slides.
As said before, AI has its own shapes to implement. We won’t only infer the abstract boxes where words are detected by Tesseract but also draw them just like other drawings done for building the slides.
The development takes place in the
drawing.ai package to decouple the drawing
Slide from the drawing of AI elements, as mentioned before.
One shape will be defined, that is, the AIShape, which will represent the selection the user is performing with the mouse.
AIShape is a sum type consisting of the WordSelection type only, as we don’t need other shapes for AI to work on the app.
Think of WordSelection as the text you highlight with a marker, but it also has a hover/selected state (i.e., when you pass the mouse over a word).
The border and fill functions match the AIShape State to a border and a background color, respectively.
Now, there’s a peculiarity in the static construction method of WordSelection: it introduces the ai package to the drawing.ai package by transforming a BoundingBox (i.e., an OcrWordDetection value from the package ai) into a WordSelection (of the package drawing.ai).
Notice how an external type like an AWT Rectangle from the AI output has been transformed into domain types: first into the OcrWordDetection of the ai package, and now into the WordSelection of the drawing.ai package. Each part of a DSL code must have its definitions to make sense of the code or be semantic. Recall that data types save information. This app is not a DSL, but it gets close.
Regarding the Stateful value required to create a WordSelection, it’s more of an implementation detail to handle the states among all the words selected.
The Stateful interface receives two generic types: the object to hold Focus (or attention in the GUI) and the type for the state it can have, like hover, for instance.
Then, I added a utility class to provide a Stateful object for the WordSelection AI shape. We know now that it’ll be implemented as implements Stateful&lt;BoundingBox, AIShape.State&gt;, since the element to focus is a BoundingBox matching a selection of word(s), and this can have a state like hover.
Now, the drawing (or fun) part is left to complete the feature. GroupAIDrawing implements the AIDrawing interface defined first, so the OCR can be represented in the app when the method draw is called.
Notice how this design can scale since we have an
AIShape sum type being
matched in the method
draw. It’d be trivial to add more AI shapes, and if you
forget to implement them, it won’t compile.
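A sketch of that scalable matching follows, assuming simplified members and plain hex strings instead of JavaFX colors.

```java
// Sketch of the drawing.ai sum type and its stateful colors. The real app maps
// AIShape.State to JavaFX colors; plain hex strings stand in for them here,
// and the color values are assumptions.
public sealed interface AIShape {
    enum State { NORMAL, HOVERED, SELECTED }

    record WordSelection(String word, State state) implements AIShape {}

    // Exhaustive matching: adding a new AI shape without drawing it won't compile.
    static String fill(AIShape shape) {
        return switch (shape) {
            case WordSelection w -> switch (w.state()) {
                case NORMAL -> "transparent";
                case HOVERED -> "#42A5F533"; // assumed translucent accent
                case SELECTED -> "#42A5F566";
            };
        };
    }
}
```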
The implementation of the AI-drawing package allows us to represent the OCR integration from the ai package via stateful bounding boxes.
JavaFX Word Detection
Once the OCR API is ready, the new AI features have to be implemented on the JavaFX side of the app. One part of this was already done in the AI drawing package, so the GUI logic is left to finish.
For handling the user events, I created another class for the AI controller.
It has a wordDetectionProperty for the OcrWordDetection (product) type defined before. Recall this type was defined to establish the AI features that the app will support. Now, we’re implementing those features (i.e., OCR word detection).
That property will update the bounding boxes of detected words. So, if you press the OCR button (F1), those boxes will load, and the AI controller will reflect them on the screen. This can be read from the property.
When initializing this controller, the
Group where the
Slide is rendered is
taken as a base to infer the text. The
AIController works with its
SlideAIView, which will be added later.
The AI view can be shown or hidden since it’s a layer on top of the original slide. If AI has already been computed, there’s no reason to keep evaluating or running the model again. You might only want to hide or show the AI results (boxes), given the slide didn’t change, of course.
Notice how I take a snapshot of the slide Group, loaded in loadSlideDrawingSnapshot only once. This takes the appropriate image sizes by reverting any scale or zoom in the Group view (i.e., the main slide view in the center of the app).
In loadOcr, a virtual thread is used to infer the bounding boxes from the OCR.
The status messages are the labels I implemented in the bottom-right of the app. I added a status message to the bottom-left when starting the development. Now, I saw the opportunity to add a secondary label to the bottom-right to notify about other tasks.
After the initial results on the controller side, I added an FSM to handle another state required by this problem: I needed AI invalidation to infer the OCR only once, provided the slide hadn’t changed. This solves performance issues if you press the OCR key several times without modifying the slide.
I don’t usually pass references in the constructor like
new AIInvalidation(this::loadAI); to avoid cycles, but it’s fine for now,
since I’m using plain JavaFX with MVC.
It’s readable that the method validate makes the model up to date, whether it’s invalid or already valid, and slideChanged invalidates the model, so the next time, validate will make sure to load AI again.
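The FSM can be sketched in a few lines; the method names mirror the article’s, while the body is my assumption of the behavior described.

```java
// Sketch of the AI invalidation FSM: the model only runs when the slide changed
// since the last inference, avoiding redundant (expensive) OCR calls.
public final class AIInvalidation {
    private final Runnable loadAI;
    private boolean valid;

    public AIInvalidation(Runnable loadAI) {
        this.loadAI = loadAI;
        this.valid = false;
    }

    // Any slide edit invalidates the cached inference.
    public void slideChanged() { valid = false; }

    // Runs the model only if the state is invalid; otherwise it's a no-op.
    public void validate() {
        if (!valid) {
            loadAI.run();
            valid = true;
        }
    }
}
```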
When the F1 key is pressed, the OCR is activated, which validates the AI system, resulting in up-to-date and optimized inference, so we can now show the text boxes loaded into the view.
The implementation of OCR is about to be completed in the application.
Now it’s about time for the mouse hover detection.
I skipped further details for the sake of brevity.
Some adjustments had to be made, for example, for filtering out the slide box
(the whole slide) that is detected by OCR (because slides contain text, but we
only want to select simple text, not containers) via
filter(box -> box.getHeight() < 100.0).
Now, the mouse events are available, along with all the overwhelming OCR back-end, which leads us to the final feature for automated underlining.
The JavaFX implementations for the AI controller are mostly complete with the logic to integrate the packages AI and AI-drawing, with all the required tooling like filters and mouse events for handling bounding box states. Among the required tooling, there was also the AI invalidation FSM to fix performance issues by disabling redundant OCR invocations.
Said cross-domain implementations make the app able to detect text from images when pressing the F1 key.
Clever Word Underlining
Now that text is detected in the app via OCR, many advanced features can branch from here. The feature established for this stage was automatic word underlining.
Word underlining takes place when you click a word on the slide.
I took the existing event onMouseClicked from the AIController and returned a Line if there was an underline action from the user.
The focus comes from sel.wordFocus();, which is the Stateful&lt;BoundingBox, AIShape.State&gt; object implemented in the package drawing.ai. If there’s a Focus&lt;BoundingBox, State&gt;, it’s mapped to a JavaFX Line.
The event when there’s underlining can be gathered from the
SlideDrawingController, which is an appropriate abstraction to handle it.
The Line that lies below a word in an image is inferred after passing through a complex OCR and cross-application-domain process, which enables us to draw it just like any other shape the app can already draw.
This is the beauty of simplicity. The process is overwhelming but simple. In the end, it’s just about drawing a trivial shape, but you need to know where.
As a side note, I also did other work on the app to make it usable, like saving slide changes (efficiently, using a Map, of course), zoom (which changes the Group scaling), default values, and any other kind of under-the-hood work a responsible SW engineer tackles in the everyday duty. Out of all this, performance fixes can be vastly applied to this app, but that’s not what an EP is for.
Words can be underlined so far, but the system can be more clever.
I defined an operation to sum lines underlined in the same row so you can keep underlining all the way left or right in the same row after an initial selection.
Now, some MSWE comes in handy to evaluate these operations. First, recall that we get the word boxes from OCR, so we can still work with them here, before transforming them into plain lines.
Once the boxes belong to the same row, they’re reduced to one clever BoundingBox by taking polite space among all the word boxes.
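The row reduction is a simple min/max fold over the word boxes; the sameRow tolerance below is an assumed value, and the record shape is for illustration.

```java
import java.util.List;

// Sketch of the row-merge operation: word boxes that sit in the same row are
// reduced into one bounding box covering them all (the "sum" of underlines).
public final class RowBoxes {
    public record Box(double x, double y, double width, double height) {
        double maxX() { return x + width; }
        double maxY() { return y + height; }
    }

    // Two boxes are in the same row when their vertical centers are close enough.
    public static boolean sameRow(Box a, Box b, double tolerance) {
        double centerA = a.y() + a.height() / 2.0;
        double centerB = b.y() + b.height() / 2.0;
        return Math.abs(centerA - centerB) <= tolerance;
    }

    // Min/max fold: the merged box spans from the leftmost to the rightmost box.
    public static Box merge(List<Box> row) {
        double x = row.stream().mapToDouble(Box::x).min().orElseThrow();
        double y = row.stream().mapToDouble(Box::y).min().orElseThrow();
        double maxX = row.stream().mapToDouble(Box::maxX).max().orElseThrow();
        double maxY = row.stream().mapToDouble(Box::maxY).max().orElseThrow();
        return new Box(x, y, maxX - x, maxY - y);
    }
}
```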
Nothing can be that easy. There’s something to fix, according to me.
So, we have to remove redundant drawings of AI lines.
I created a method getFocusLinesInRow in the AIController returning the lines in the same row as the current word focus.
Notice this is a temporally coupled functionality.
Then, a method clearAiLineRow is called when drawing the AI underlining. This takes the current row lines, filters them, and removes the redundant line shapes.
I recorded a video to show what this feature looks like in action.
The single-word and multi-word underlining is finally implemented, which proved to be an appropriate AI application for this JavaFX app.
OCR Side Effects
The OCR integration has to be further engineered to make the model predictions useful. This includes removing bloat like the background from the slide to sanitize the data as much as possible so the segmented image passed to the model provides much certainty of its results.
Notice how a white background introduces a side effect to the model, which seems to infer that (white) text is part of the background. Therefore, the text is not detected.
This magnifies when the background is white, but the effect exists regardless of the background color.
AI is not “magic,” as you have to clean your data and input so it matches the ones the model was originally designed for. This includes responsibility for both the model designers and consumers.
Needless to say, for engineering-grade models, one must leverage strongly typed (functional) languages to ensure expected correctness. That’s a minimum requirement I’d expect from engineers since it bounds the overwhelming black box of these deep learning models.
Notice how I use the term “expected correctness” referring to mathematical expectation. Black box models are stochastic, so we expect them to be correct in the long run as a pattern.
Recall that in mathematical sciences, we study the order of randomness. Even something stochastic has to converge to some expected value, that is, the mathematical pattern we ought to study.
If we know the pattern, we can turn a system into engineering-grade ✔.
For an OCR app, we expect well-defined metrics or patterns to a certain degree. I emphasize well-defined because I’ve worked with subjective ML metrics, which are generic/popular and therefore meaningless, so again, you must be proficient in your domain. AI is not magic. You must build the same good OL’ engineering and math under the hood.
It can be daunting to try to make ML models work. From my job experiences, we’ve had to use several OCR providers as fallbacks. They all suck, but some suck less for the underlying problem, so we might as well make that one the “primary provider 💸.”
One of the stones in your shoe you’ll find is when the image quality is low or bad, and you can’t make your users buy an iPhone 15 Pro MAX ULTRA and employ basic skills to shoot a photo 😣.
Solutions you find out there are generic, and if you try to make them work for you by fine-tuning, they will just become the same over-engineered OOP inheritance garbage based on product type classes that were popular in the Java times, the same way “AI” is hyped today.
If you know me, you know the answer: engineer for the domain as much as possible first. The simplest designs are the best. You don’t need over-bloated AI or solutions most of the time. Of course, many businesses won’t care about this, but the marketing buzzwords instead, unfortunately.
For this app, leveraging OCR is a valid approach to enhance the user workflow when it comes to screenshots, but the goal must be domain-based to optimize for source code slides instead of binary images. Remember when I said that word detection was trivial for code snippet slides? Now compare trivial to ML models that had to be trained with supercomputers of 5 years ago 💸 and worldwide data.
For this app, the major and first fix to clean the model’s input is to classify or segment the screenshot out of the background, which simplifies the OCR input.
Segmenting the image is trivial in theory since the slide is a composition of both the image and background. That is, the screenshot is part of the app domain.
However, in practice, this is harder than it looks since it requires more design. What I mean is that systems are expensive to engineer (properly). Particularly, MSWE is specialized, so you will hardly find competent SW engineers with a mathematician’s background.
Stochasticity is inherent in black-box models, and terrible side effects arise when they’re poorly engineered. We first must eradicate the side effects by leveraging domain facts as much as possible so that the stochastic behavior is consistent, making our model engineering-grade.
Automating the User Workflow via AI
AI is a set of complex and complicated emerging technologies that can be wisely leveraged to automate many works for our domain as a pattern.
Screenshot slides have text information, so whenever there’s information, we shall extract it to enrich our systems with informed actions and decisions, which was the reason I devised implementing OCR for the Slides EP.
The OCR implementation consisted of finding a respectable open-source library that can be consumed from Java to avoid cloud provider costs and keep documentation in the project via an installable dependency.
It consisted of a cross-domain design, from the selected Tesseract OCR library through the ai and drawing.ai packages to the ui application packages.
I’ve shown how these models are a valid technique for empowering users with automation for external systems like binary files generated out of the application while leaving clear how crucial it is to have a domain-first approach when engineering systems, so we rely on simple solutions that create elegant complexity.
Designing an Auto Save Mechanism
One mandatory feature for a modern application is to provide a safe and intelligent saving mechanism to preserve the state of the app and the user’s work.
I’ve worked on the application UI states in memory. Now, I implemented the save slide feature with an automatic system to make the app even more usable.
First, I extracted the data paths used by the app to a utility class, so this can be trivially replaced by real environment settings in a real case.
The auto-save feature takes place in the package ui.
AutoSave has a DataRepository from the package data, implemented as a local repository to save to the disk. It has a SaveInvalidation FSM similar to the AIInvalidation that had the state when AI is loaded and up to date to avoid redundant model invocations. This way, the save FSM will perform the side effects only when necessary.
The auto-saving can be activated via the enable method. It has the drawing Group, which is what we want to save. The BackgroundStatus is the bottom-right label shown to notify updates.
When the drawing changes, onDrawingChanged is called, which employs the save FSM to invalidate the save state, and then it invokes validateLater to run the saving action in the background.
The saveSlide reference is passed to the save FSM via new SaveInvalidation(this::saveSlide). It applies the proper scaling to the Group and runs a virtual thread to use the repository to store the slide snapshot in the background. Then, it updates the UI safely, informing that the slide was saved.
The SaveInvalidation implementation takes place as a static nested class.
slideChanged invalidates the save state.
validateNow runs the saving effect, then validates the state that slideChanged had invalidated, and saves the current time to avoid bloating the CPU if there are continuous changes.
When validateLater is called, it starts a virtual thread that runs within a minimum WAIT_TIME_MS span, and after waiting for any necessary cool-down time, it calls validateNow. Hence, “later,” because of the cool-down delay.
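A sketch of this debounced validation with a Java 21 virtual thread follows; the wait constant and the await helper are assumptions for illustration.

```java
// Sketch of the debounced save: validateLater waits a minimum cool-down span on
// a virtual thread before running the saving effect, so continuous slide edits
// don't bloat the CPU with disk writes. WAIT_TIME_MS here is an assumed value.
public final class SaveInvalidation {
    private static final long WAIT_TIME_MS = 100;
    private final Runnable saveSlide;
    private volatile boolean valid;

    public SaveInvalidation(Runnable saveSlide) {
        this.saveSlide = saveSlide;
        this.valid = true;
    }

    public void slideChanged() { valid = false; }

    public void validateNow() {
        if (!valid) {
            saveSlide.run();
            valid = true;
        }
    }

    // "Later": run in the background after the cool-down time.
    public Thread validateLater() {
        return Thread.ofVirtual().start(() -> {
            try {
                Thread.sleep(WAIT_TIME_MS);
                validateNow();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Demo helper: wait for the background save without checked exceptions.
    public static void await(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```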
The AutoSave class was integrated into the SlideDrawingController to set the drawing and change events.
The repository specified and implemented in the package
data, the saving
logic, and a state machine made it possible to implement a robust auto-saving
mechanism for the slides app, which adds further automation under the hood,
besides the AI implemented in the previous section.
Automation of Screenshots and Code Snippets Content
I foresaw an opportunity to develop a gorgeous example project to start my automation journey for content like screenshot stories and code snippet slides.
This is my first blog with EP, and it was extremely extensive. I blogged the development process with high granularity and left insights, as I usually do.
The implementation was feasible with JavaFX, as expected, which shows how JavaFX is a powerful platform for well-defined engineering applications. Although the app is an example project, the development was significantly complex. Also, remember that we can use Kotlin with Arrow to further enrich the platform’s robustness via functional approaches for the domain languages.
There were good exercises for putting into practice tools such as JavaFX, new Java 21+ features, and regular expressions. The exercises scaled up to the level of an advantageous AI OCR integration that left plenty of insight into domain automation.
The development of this EP was the next step to take —from theory to practice— many design concepts and standards I had devised before.
The main package slides defined the domain of the application, while several others were required to design the desktop app, namely, the ui, data, drawing, lang, and ai packages containing the development of features such as drag-and-drop, item arranging, menus with shortcuts, pagination, advanced drawing of both slide shapes and AI shapes, advanced parsing of code snippet language tokens, captions, cross-domain OCR word manipulation for automated underlining, and essentials like auto-saving and under-the-hood requirements to enhance the app design and usability.
The final development results turned into a desktop app with a master-view-detail layout able to create three kinds of slides: code snippet, code shot, and screenshot. The code snippet slides are mainly encouraged since they belong to the domain by consisting of source code, while the other two consist of binary images as inputs.
The slides app is a system that allows us to optimize the building of domain-specific presentations. As a math software engineer, such an achievement boosts my automation tools one step further in these graphical domains, speeding up future undertakings.
- JavaFX | openjfx.io.
- JavaFX Fundamentals | dev.java.
- Java Installation Guide | foojay.io.
- Beginning JavaFX Applications with IntelliJ IDE | foojay.io.
- IntelliJ IDEA.
- Scene Builder.
- Drag-and-Drop Feature in JavaFX Applications | Docs | Oracle.
- Using JavaFX UI Controls: Pagination Control | Docs | Oracle.
- A categorized list of all Java and JVM features since JDK 8 to 21 | Advanced Web Machinery.
- JavaFX - 2D Shapes | tutorialspoint.com.
- What is OCR? | Northern Essex Community College.
- Tess4J | SourceForge.
- Photopea | GitHub.
Photopea is a free web app that resembles Photoshop and has been my choice for many years ↩
In this case, two
ImageItems are equal if their names are equal ↩
The Image field made it impossible to update the same item from a List with different object instances but the same name ↩
It doesn’t have to be a GP PL, like HTML or CSS, which are niche languages ↩
I mention this because many idiots believe the marketing idea that “AI is everything” (same for other marketing-hyped concepts like capitalism or OOP), while in fact, everything is rooted in complex (usually “boring”) math, domain, and engineering facts, so AI (capitalism, OOP, etc.) is just one small part of a system, it’s just one more tool ↩
I used ChatGPT to generate the keywords, and GitHub language colors, i.e., the mechanical job ↩
I didn’t track those files in Git because they’re nasty binary files and third-party on top of that, so you have to copy them manually ↩
This conversion from JavaFX to AWT is similar to when saving snapshots to the disk ↩
Make sure to implement a thread-safe implementation; see how I use the Platform.runLater call to send updates to the JavaFX UI thread ↩