Building Slides from Screenshots App in JavaFX
Efficient, high-quality engineering communication is a daily challenge: creating value such as specs, documentation, content, or visualizations is essential to professional and educational standards. I started materializing this in JavaFX via graphical slides for building domain-specific presentations based on screenshots, code snippets, and basic shapes. The result is this blog with an example project, providing in-depth development and documentation with advanced JavaFX and AI features for automation to conquer the set goals.
Sharing a Story from Screenshots
I wanted to build a little presentation from my daily work, so I took screenshots to work them out in Photopea¹.
The idea was to build a carousel presentation like this one, which I was editing in this PSD file. It still didn’t include captions, so besides the editing, that meant more off-topic work for me.
Needless to say, apps like Photoshop, Google Docs, Office, etc., are general-purpose (i.e., mediocre) and manual. Plus, you even have to pay a subscription or watch ads for a monolith you will barely use.
I need composition, so I just engineer my domain-specific systems.
I recently took this experience as the final motivation to start materializing the automation of these systems without requiring the monolithic products mentioned. I’m working this materialization out in this article via my new EP: Slides, which is also my first blog with an accompanying project.
Slides App
The application consists of many advanced features in a JavaFX master-view-detail desktop implementation for creating slide images and presentations based on images like screenshots, as well as code snippets. The types of artifacts generated are depicted here, so it’s clear what the final app can build.
The three kinds of slides are:
- Code Snippet: Turns code into a slide. Many languages are supported.
- CodeShot: Turns a code screenshot into a language-centric slide.
- Screenshot: Uses a screenshot (or any image) to build a slide.
The code snippet slides are based on source code, and everything is rendered via JavaFX `Node` and `Shape` views, including the syntax highlighting.
Many languages are supported, as reflected by the language syntax and background color of each snippet slide.
The CodeShot slide takes a screenshot of code (probably from your IDE) so you can tell a story of what was going on. You leverage a screenshot this time instead of a code snippet to capture other screenshot elements, like a Git diff or special IDE features.
Code shots are also language-centric, like code snippets, but they’re built upon code screenshots instead of source code.
Slides based on (general) screenshots should be about no-code, like IDE or tool screenshots showing special information. A common example is a Git message explaining what happened.
The slide above has underlined text (multiple words) thanks to both the JavaFX shapes and the AI automation tools integrated into the app. I say “automated” because the underlining of words in images is performed by AI (you just click the text to underline it).
Drawable shapes on a slide can be lines, rectangles, and circles. Green indicates good results and red indicates errors.
The slide above was also underlined (single word) automatically by AI.
Finally, a powerful artifact out of all these features is effectively a presentation.
Presentation: Video Rendering Side Effect Fix
Domain-specific feature automation within the app slide implementations results in engineering-grade, high-quality artifacts like images and presentations to convey a technical story efficiently for both the author and consumer.
Getting Started
First, make sure to have Java 21+ installed on your development machine (see SDKMAN and the Java Installation Guide in the bibliography).
I highly suggest using the Zulu (FX) distribution to get the FX mods out of the box!
For creating a new JavaFX app, follow Beginning JavaFX Applications with IntelliJ IDE | foojay.io with one of the approaches given by the author. I suggest using the “Plain” approach for this project.
Our app package name is `engineer.mathsoftware.blog.slides`.
The following is the initial app project.
As given above, the JavaFX `controls` and `fxml` mods are required since I decided to use FXML with the Scene Builder. Then, the main app package has to be opened to the `fxml` mod so it can use reflection on our app, as well as exported so JavaFX in general can see our app via reflection.
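Here’s a minimal sketch of what that module descriptor could look like, assuming the module name matches the package above:

```java
// module-info.java — a sketch; the module name is assumed to match the package
module engineer.mathsoftware.blog.slides {
    requires javafx.controls;
    requires javafx.fxml;

    // Let the fxml mod reflect on our controllers
    opens engineer.mathsoftware.blog.slides to javafx.fxml;

    // Export the package so JavaFX in general can see our app
    exports engineer.mathsoftware.blog.slides;
}
```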
Now, we move forward to the main package.
At this point, the JavaFX application is set up, so the development can take place.
Initial Master-View-Detail Layout with Drag-and-Drop ListView
I already developed the initial application layout (and logic), which is relatively heavy, so I’ll add the FXML version here with a preview of what’s being developed.
This is the initial app.fxml layout file:
The layout tree (briefly) consists of:
- A split pane that defines the master-view-detail layout:
  - A master pane with a user input pane to manage the image files.
  - A view pane that will show the rendered slides.
  - A detail pane that will hold the properties for generating the slides.
It requires events for the file drag-and-drop, and some buttons.
It also has a menu bar that can be trivially implemented later.
For integrating this layout, the root view has to be loaded from the FXML resource app.fxml in the root of the project’s resources directory. Then, `AppController` will handle the input events.
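A minimal sketch of that loading step, assuming the class and resource names used so far:

```java
import javafx.application.Application;
import javafx.fxml.FXMLLoader;
import javafx.scene.Parent;
import javafx.scene.Scene;
import javafx.stage.Stage;

import java.io.IOException;
import java.util.Objects;

public class Main extends Application {
    public static void main(String[] args) {
        launch(args);
    }

    @Override
    public void start(Stage stage) throws IOException {
        // app.fxml lives in the root of the resources directory
        var loader = new FXMLLoader(
            Objects.requireNonNull(Main.class.getResource("/app.fxml"))
        );
        Parent root = loader.load(); // instantiates AppController reflectively
        stage.setScene(new Scene(root));
        stage.setTitle("Slides");
        stage.show();
    }
}
```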
This gives an initial layout to the app with a controller to set some structure as far as it’s currently going. This includes a master-view-detail app layout, where the master pane will be loaded with images via drag-and-drop.
Application Data
The application will handle data related to image files that make up the presentation, which will be stored in a local directory.
A basic image item needs to be loaded into the list of images.
Notice how the `hashCode` and `equals` methods had to be overridden because of the `Image` object² ³.
This item will model the images (screenshots) saved to the application data directory. Notice this will be a simple directory tree of depth 1, with no subdirectories.
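A sketch of such an item, assuming a record with name-based equality:

```java
import javafx.scene.image.Image;

// Equality is by name only: the binary Image field would otherwise make
// two instances of the same saved file unequal
public record ImageItem(String filename, Image image) {
    @Override
    public boolean equals(Object obj) {
        return obj instanceof ImageItem that
            && filename.equals(that.filename);
    }

    @Override
    public int hashCode() {
        return filename.hashCode();
    }
}
```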
The items need to be stored and loaded from our local storage.
For this, I defined the `DataRepository` API.
I also wrote a `Data` utility class to hold important functions.
That way, we’ll know whether a given file list (which can be dropped into the `ListView`) has supported file extensions, which avoids polluting the data directory with random files and ensures more correctness in our logic.
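A sketch of that validation, assuming a fixed extension whitelist:

```java
import java.io.File;
import java.util.List;
import java.util.Set;

public final class Data {
    private static final Set<String> SUPPORTED_EXTENSIONS =
        Set.of("png", "jpg", "jpeg");

    public static boolean areValidImageFiles(List<? extends File> files) {
        // One invalid file is enough to reject the whole list
        return files.stream().allMatch(Data::isSupported);
    }

    private static boolean isSupported(File file) {
        var name = file.getName().toLowerCase();
        var dot = name.lastIndexOf('.');
        return dot >= 0 && SUPPORTED_EXTENSIONS.contains(name.substring(dot + 1));
    }

    private Data() {}
}
```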
These definitions will allow us to perform data operations in our application.
Local Storage Implementation
The implementation of the `DataRepository` is straightforward. It uses the `java.nio.file` API to access the file system, and the code written before.
This realization of `DataRepository` allows us to access the application images in our data directory.
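A sketch of how this realization might look; only `createOrUpdateImage` is a method name the article confirms later, and the rest are assumptions:

```java
import javafx.scene.image.Image;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public final class LocalDataRepository implements DataRepository {
    private final Path root;

    public LocalDataRepository(String root) {
        this.root = Path.of(root);
    }

    @Override
    public void createOrUpdateImage(Path source) throws IOException {
        Files.createDirectories(root);
        Files.copy(
            source,
            root.resolve(source.getFileName()),
            StandardCopyOption.REPLACE_EXISTING
        );
    }

    @Override
    public List<ImageItem> readAllImages() throws IOException {
        // The data directory is a depth-1 tree, so no recursive walk is needed
        try (var files = Files.list(root)) {
            return files
                .filter(Files::isRegularFile)
                .map(path -> new ImageItem(
                    path.getFileName().toString(),
                    new Image(path.toUri().toString())
                ))
                .toList();
        }
    }

    @Override
    public void deleteImage(String filename) throws IOException {
        Files.deleteIfExists(root.resolve(filename));
    }
}
```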
Master Pane
The first part of the app is a master pane that lists the images.
This pane will be able to list the images, add new image(s) via drag-and-drop and/or a `Button` with a `FileChooser`, delete an image, delete all images, and rearrange them in the order they will appear in the presentation.
Initializing the App Controller
The `AppController` will need some fields.
A good initialization will be needed too.
Including other methods will be helpful as well.
Now the controller is set up so we can keep adding the other features.
Drag and Drop
One engaging feature of this app is its file drag-and-drop, where you can create or update one or many images just as simply.
If the files are accepted by our app, they will be added. They won’t be added if rejected (e.g., a Photopea “.psd” file). Per our rules, one invalid file is enough to reject all of them.
So our `DragEvent` cancels further actions with the clipboard files.
Another way the `DragEvent` can finish is when you cancel the drop action with your mouse by dragging the files back out.
The events are already set from the `app.fxml` view, so only the controller implementation is left.
The three events required for this app are as follows, with a sketch of their handlers after the list:
- Drag Over: Files are being dragged onto the `ListView`, so they will either be accepted or rejected.
- Drag Dropped: Files were deposited into the app.
- Drag Exited: Cancels the drag as files dragged with the mouse are out of scope.
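A sketch of the three handlers, assuming the `Data` validation above and a hypothetical `addImageFile` helper:

```java
import javafx.fxml.FXML;
import javafx.scene.input.DragEvent;
import javafx.scene.input.TransferMode;

// Inside AppController
@FXML
private void onDragOver(DragEvent event) {
    var board = event.getDragboard();
    if (board.hasFiles() && Data.areValidImageFiles(board.getFiles())) {
        event.acceptTransferModes(TransferMode.COPY);
    }
    event.consume();
}

@FXML
private void onDragDropped(DragEvent event) {
    var board = event.getDragboard();
    var accepted = board.hasFiles() && Data.areValidImageFiles(board.getFiles());
    if (accepted) {
        board.getFiles().forEach(this::addImageFile); // hypothetical helper
    }
    event.setDropCompleted(accepted);
    event.consume();
}

@FXML
private void onDragExited(DragEvent event) {
    event.consume(); // the files left the ListView bounds, so cancel the drag
}
```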
These implementations will allow the drag-and-drop feature in our application.
List View
The list in the master pane is key to completing this implementation.
First, a custom cell renderer needs to be created, obviously.
If you remember old-school Android (or are homesick for it like me), this will come in handy a lot.
I also added a `Tooltip` so the full file name is shown when you hover over the item, and a maximum text length so long names are cut with an ellipsis.
Moreover, I set a rounded corner on the image in the method `updateItem` via `imageView.setClip(clip);`, and set the confirmation `Alert` in the method `onDeleteButtonAction`, which sends the event to delete the item to the `ImageItemCell.Listener` (the controller).
So, that’s the `ListView` item implementation you see in the screenshots.
Then, this is integrated into the controller.
First, the cell callback that was defined is realized by the controller, i.e., it implements `ImageItemCell.Listener`.
At this point, the list with drag-and-drop was built, which is a big part of this master pane.
Deleting Items
To delete an item, you click on the item’s delete button, and a confirmation `Alert` will finish this action.
To delete all the items, the “Clear” `Button` will “do the trick”.
Therefore, here we go with our controller again.
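A sketch of both deletion paths, assuming the reactive `images` list described later:

```java
import javafx.fxml.FXML;
import javafx.scene.control.Alert;
import javafx.scene.control.ButtonType;

// Inside AppController
@Override
public void onDeleteButtonAction(ImageItem item) {
    var alert = new Alert(
        Alert.AlertType.CONFIRMATION,
        "Delete " + item.filename() + "?",
        ButtonType.YES, ButtonType.NO
    );
    alert.showAndWait()
         .filter(ButtonType.YES::equals)
         .ifPresent(button -> images.remove(item));
}

@FXML
private void onClearButtonAction() {
    var alert = new Alert(
        Alert.AlertType.CONFIRMATION,
        "Delete all the images?",
        ButtonType.YES, ButtonType.NO
    );
    alert.showAndWait()
         .filter(ButtonType.YES::equals)
         .ifPresent(button -> images.clear());
}
```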
This provides a safe delete mechanism for our application images.
Arranging Image Items via More Drag and Drop
Our presentation order will be the one shown in the images `ListView`, which already supports a drag-and-drop event for adding or updating files in the app, and now it needs one more implementation for arranging its items via this fancy mechanism.
By arranging the images loaded into the master pane via another ergonomic drag-and-drop mechanism, we can set important information for the slides, such as their order in the final presentation.
Cell Drag and Drop Implementation
First, a new event needs to be defined in the `ImageItemCell` `Listener`, namely, `void onArrange(int draggedIdx, int destIdx);`. So, our abstract list of images (the data structure) gets sorted when this happens (and then updates the view).
Notice that if you’re not careful, you’ll introduce side effects to these kinds of events, as we can have many event implementations. In this case, the drag events fall into the `ListView` (the “bigger” one) and are also listened to from each of its cells, that is, each `ImageItemCell`.
What controls this crazy state sharing, if you notice, are the calls to the `consume` method of the `DragEvent`s. Setting `setDropCompleted` to `true` also helps with this.
So, I carefully tested the app so both drag-and-drop events for files and for arranging cells work properly.
The drag view is set to a snapshot of the cell being dragged. This is the graphic you see on the tip of your mouse when dragging something on the screen.
There are some CSS classes to apply a visual effect on the cells when dragging one cell onto another. These classes have to be added to the app from a CSS file.
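A sketch of the cell-side handlers, assuming the dragged index travels in the dragboard as a string:

```java
import javafx.scene.input.ClipboardContent;
import javafx.scene.input.DragEvent;
import javafx.scene.input.MouseEvent;
import javafx.scene.input.TransferMode;

// Inside ImageItemCell
private void onDragDetected(MouseEvent event) {
    if (getItem() == null) { return; }
    var dragboard = startDragAndDrop(TransferMode.MOVE);

    // The drag view is a snapshot of this cell, shown under the cursor
    dragboard.setDragView(snapshot(null, null));

    var content = new ClipboardContent();
    content.putString(String.valueOf(getIndex()));
    dragboard.setContent(content);
    event.consume();
}

private void onDragDropped(DragEvent event) {
    var dragboard = event.getDragboard();
    if (dragboard.hasString() && getItem() != null) {
        var draggedIdx = Integer.parseInt(dragboard.getString());
        listener.onArrange(draggedIdx, getIndex());
        event.setDropCompleted(true);
    }
    event.consume(); // stop the event from reaching the ListView file handler
}
```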
That was with respect to the `ImageItemCell`. Now this feature has to be added on the `AppController` side.
Controller Update
The previous changes have to be implemented in the app controller.
I lately made the app less coupled, so we’ll work with an `images` `List` stored in the controller as the “source of truth” for the image data. Then, this list will behave reactively to update the GUI.
What’s mostly new here is the `onArrange` event that was defined before in the `ImageItemCell` `Listener`. This method updates the data, and this data is reactive, so it automatically updates the `ListView`, which was bound to the `images` list via `imageList.setItems(images)`.
The dragged cell is also selected when dropping it into its new position via `imageList.getSelectionModel().clearAndSelect(destIdx)`.
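A sketch of the controller-side event, where swapping the reactive list updates the view:

```java
// Inside AppController
@Override
public void onArrange(int draggedIdx, int destIdx) {
    var dragged = images.get(draggedIdx);
    var dest = images.get(destIdx);

    // Swapping the reactive list updates the bound ListView automatically
    images.set(draggedIdx, dest);
    images.set(destIdx, dragged);
    imageList.getSelectionModel().clearAndSelect(destIdx);
}
```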
Cells are swapped rather than shifted, and automatic scrolling is missing, which are limitations of this implementation. The desired behavior would probably be more advanced, but that’s out of the scope of this EP.
Now both the cell and controller implementations are in sync.
App CSS Styles
It was time to introduce some CSS here to add a class to the list cell when another item is being dragged to it.
I also added a new method in `Main.java` to help load these resources.
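A sketch of such a helper, assuming the stylesheet is a root resource:

```java
import java.util.Objects;

// In Main.java
static String loadResource(String name) {
    return Objects.requireNonNull(
        Main.class.getResource("/" + name)
    ).toExternalForm();
}

// Usage when building the Scene:
// scene.getStylesheets().add(loadResource("app.css"));
```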
Now, the CSS classes defined above can be used to style the `ListView` cells.
Order of the Presentation Slides
With all this, we now have the arrangement of the images done as well as the master pane complete.
Menu Bar
This menu is trivial to write now.
First, we add the events to the FXML, and the controller.
I made minor modifications, like changing the name of a `MenuItem` from “New” to “Add” for more clarity of action.
They do the following:
- Add: Opens the `FileChooser`, just like pressing the existing “Add” `Button`, to add (or update) an image.
- OpenWD: Opens the app data directory (working directory) in the system file explorer so you can browse the original application images.
- Clear: Same as the “Clear” `Button`.
- Quit: Closes the app.
- About: Shows a basic `Alert` with info about the application.
So first, I extracted a method to reuse it.
The Swing JavaFX module has to be added to the application modules (`module-info.java`) via `requires javafx.swing;`, as it’s required to call the AWT `Desktop` API to open the system file explorer.
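A sketch of the OpenWD action; the data path constant is an assumption:

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;

// Inside AppController
private void openWorkingDirectory() {
    // Desktop.open blocks, so avoid calling it on the JavaFX thread
    new Thread(() -> {
        try {
            Desktop.getDesktop().open(new File(DATA_ROOT)); // assumed constant
        } catch (IOException e) {
            e.printStackTrace();
        }
    }).start();
}
```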
Finally, the implementations are left to finish this menu.
With this, the application now has a functioning menu bar.
View Detail and App Domain
The detail pane is the one to the right that allows the user to set up the information to apply to the slides, and the application domain has been defined via several types to power all these concepts.
The view pane is the middle one showing the result of rendering the slide, and this will require much more work before we start drawing something on this view.
Some refactorizations are needed to keep the project maintainable.
Then the master pane developed before has to be related to the view pane via a `Pagination`.
Once the initial application domain is defined, the detail pane can be worked out to enable the configurations that will be applied to the drawing in the view pane.
Refactorizations on Data and UI
First, some refactorizations I did at this stage.
Leading to a project module like:
Notice I moved `ImageItem` from `ui` to `data` as it makes more sense, as I said in Removing Cyclic Dependencies, Java vs Go (2023-05-28).
Now the project has more support to accept the next changes for the view and detail panes as well as the application logic.
Pagination
This component is a page indicator that will keep in sync with the list of images loaded into the master pane.
After updating the `app.fxml` file, the logic is next.
The pagination has a page count that is set to the size of the images loaded into the app via `pagination.setPageCount(images.size())`, and set to 1 (the default), besides being made invisible, when there are no items. This is because the minimum page count of a `Pagination` can only be 1, so we have to hide it when there are no pages.
Then, this count is updated when the list of images changes.
By default, the first list item is selected, so the app is not empty as long as there are images.
When an item of the `ListView` is selected, the property is listened to, so it updates the `Pagination` index via `pagination.setCurrentPageIndex(newValue)`. On the other hand, when the `Pagination` changes its index, the selected item in the `ListView` is updated via `imageList.getSelectionModel().select(newValue)`.
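A sketch of those bindings:

```java
import javafx.beans.InvalidationListener;

// ListView selection -> Pagination index
imageList.getSelectionModel()
         .selectedIndexProperty()
         .addListener((observable, oldValue, newValue) -> {
             if (newValue.intValue() >= 0) {
                 pagination.setCurrentPageIndex(newValue.intValue());
             }
         });

// Pagination index -> ListView selection
pagination.currentPageIndexProperty()
          .addListener((observable, oldValue, newValue) ->
              imageList.getSelectionModel().select(newValue.intValue())
          );

// Page count follows the images list; the minimum count is 1, so hide it when empty
images.addListener((InvalidationListener) observable -> {
    pagination.setPageCount(Math.max(images.size(), 1));
    pagination.setVisible(!images.isEmpty());
});
```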
Then, I put an `ImageView` in the view pane to show the selected image in the app as a PoC for this feature, but this view will be replaced with a `Group` later to be able to perform the drawings.
The image list and pagination will be in sync, so the app becomes more usable with this new control.
Slides
It’s time to start creating the models for this application.
A `Slide` consists of the following (a model sketch follows the list):
- `CodeSnippet`: A slide containing source code (from a given PL) styled as a pretty image.
- `CodeShot`: A screenshot image of source code (e.g., a screenshot of the IDE editor).
- `Screenshot`: A random screenshot.
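A sketch of this sum type with Java’s sealed interfaces; the record components are assumptions at this point:

```java
import javafx.scene.image.Image;

public sealed interface Slide {
    record CodeSnippet(String code) implements Slide {}

    record CodeShot(Image image) implements Slide {}

    record Screenshot(Image image) implements Slide {}
}
```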
I also defined a basic enum to define the slides as simple iterable items.
The (programming⁴) language is required to style the slides, so it can be associated with the slide content.
Defining some colors will be useful for the possible drawings that can appear on the slides.
So, we can define some relations for these types. I took the language colors used by GitHub and set them for the languages that were added before.
This way the app can be extended by mapping color values to the domain types.
Regarding configurations, we can add some important settings like the target size for the slides.
Since screenshots will not likely be greater than FHD, these predefined resolutions will come in handy.
In the long run, we’ll also need to compile and save the whole presentation, so here’s a basic configuration that will provide this insight.
So, it can be edited from the detail pane: a field for the size (HD, FHD), and the path to store the compilations.
All this is, for now, the application domain that establishes the main logic to build the rest of the GUI details.
Reusing Enums by Converting Them into English Strings
Enums can straightforwardly be used as an iterable sum type to define, among others, the values for the `ComboBox`es. What stops us is the ability to turn them into readable English values.
JavaFX’s utilities provide a `StringConverter` abstract class to turn `Object`s into `String`s and vice versa.
I’ll leave my implementation here, and let the reader figure it out.
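One possible implementation, splitting PascalCase enum names into English words:

```java
import javafx.util.StringConverter;

public final class EnglishConverter<T extends Enum<T>> extends StringConverter<T> {
    private final Class<T> type;

    public EnglishConverter(Class<T> type) {
        this.type = type;
    }

    @Override
    public String toString(T value) {
        // e.g., CodeSnippet -> "Code Snippet"
        return value == null
            ? ""
            : value.name().replaceAll("(?<=[a-z])(?=[A-Z])", " ");
    }

    @Override
    public T fromString(String string) {
        return Enum.valueOf(type, string.replace(" ", ""));
    }
}
```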
This `EnglishConverter` implementation will allow us to use `enum`s more directly as English values in the views without losing the identity between an `enum` domain type and a primitive `String` (i.e., the `enum` is set as the view value but displayed as a `String`).
Detail Pane
Eventually, we come across the detail pane on the right of the GUI, which allows the user to program what information will be taken to render what’s shown in the view pane in the middle.
I added a CSS class to style the title `Label`s, for example. These styles can be added as a class to the (`Label`) nodes via Scene Builder.
First, declare the new fields in the controller, and add logic for what should be done. Of course, I figured all this out before.
Here, some views have to be hidden if they’re not applicable according to the setup of the other values. For example, `codeSnippetBox` is only shown when the `Slide` type is set to `CodeSnippet`.
The main `ComboBox` is `slideComboBox`, which defines what kind of `Slide` the currently selected item is, so it’ll be rendered according to that model.
As you can see, we use the `EnglishConverter` written before and set the default value of `slideComboBox` to `SlideItem.CodeSnippet`, although I might change those settings later.
In the detail pane, many settings can be added and adapted to define the rendering of the slides.
View Pane and Drawing
The view pane displays the result of applying the information from the detail pane on the right side to the input images from the master pane on the left side. This view shows the drawings applied to build the presentation.
Drawing Package
The drawing logic will go into the `drawing` package of the application.
First, I defined a `SlideDrawing` type to represent the editing applied to a `Slide`.
Then, I added a realization of this interface via a class based on a `Group` node.
Using a `Group` in JavaFX is normal for drawing the basic shapes available in the platform.
The performance of this implementation is not great yet. Since this is an example project I won’t go into pre-optimizations a lot.
I’ve experienced a few side effects while working with drag events and imperative code for editing and composing images, and I’m not delving into more lower-level optimizations for now 😑.
Now, the first drawing implementation will build a `Slide` of type `Screenshot`.
Then, here we draw a `Screenshot` `Slide`.
First, I take a `Group` as the parent of the drawing, which is a basic “layout” where the child positions are added manually. This basically resembles the `FrameLayout` from Android.
We can add nodes and shapes as children to the group, and they’ll blend, allowing us to build up our `Scene`.
We can add the background straightforwardly with a `Rectangle` `Shape`, and the image — which is more complex — via a good old `ImageView` `Node`.
I added a temporal scale to the (`group`) parent so the drawing fits the screen, as I’m not implementing zooming features any time soon.
I implemented the method `getImageCornerRadius` to set relative rounded-corner values to the images so they keep their proportion. I based the measure on how a round icon would look, then played with the proportions to settle it down.
Since `Group` is a lower-level layout, we have the flexibility to position the children, like the image I put in the center of the parent, via its methods `setX` and `setY`.
I implemented the method `getRoundedImage` to build a rounded image from an `Image` and its `ImageView` with a given `arc` value (computed by `getImageCornerRadius`).
As you can see, the code here is much more imperative and temporally coupled.
To get the rounded image, you have to create a clip `Shape` consisting of a rounded `Rectangle` resembling the dimensions of the image. Now you set the image and the clip to the `ImageView` and take a snapshot of the `ImageView`. Then, remove the clip from the image view to release any side effect.
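A sketch of those steps; note how the order matters:

```java
import javafx.scene.SnapshotParameters;
import javafx.scene.image.Image;
import javafx.scene.image.ImageView;
import javafx.scene.paint.Color;
import javafx.scene.shape.Rectangle;

static Image getRoundedImage(Image image, ImageView imageView, double arc) {
    var clip = new Rectangle(image.getWidth(), image.getHeight());
    clip.setArcWidth(arc);
    clip.setArcHeight(arc);

    imageView.setImage(image);
    imageView.setClip(clip);

    var params = new SnapshotParameters();
    params.setFill(Color.TRANSPARENT);
    var rounded = imageView.snapshot(params, null);

    imageView.setClip(null); // release the clip to undo the side effect
    return rounded;
}
```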
The shadow effect can be achieved by setting a `DropShadow` `Effect` on the `ImageView`.
As I said, these steps depend on the order in which they’re called due to the imperativeness of the code.
But that’s not it 😂, more imperative code is waiting. The images have to be resized as per the slide configuration we set.
If the given image is too big — either in width or height — it has to be resized to fit the parent.
Now we fit the size of the `ImageView`, draw the image, and set its position to center it.
To fit the `ImageView`, first, the size of the original `Image` is set. If these dimensions are out of bounds, the `ImageView` is resized, keeping the proportions.
Now, when we want to know the dimensions of the image, the “source of truth” becomes the `ImageView` instead of the original `Image`.
This is the initial implementation of the drawing package that now enables us to draw a screenshot slide into the view pane of the application.
Drawing View
To separate the responsibilities and avoid making the controller a bigger coupled object, other classes can take over the functions of some views. This will allow integrating the drawing package into the app GUI.
With the current development, I added a class in the `ui` package to manage the GUI logic for the view pane.
This class could be extracted as an abstraction by using an `interface`, but it’s unnecessary to over-engineer there.
I’ve defined the `SlideState` `record` so we can save the slide state. Persistence is another feature that will be needed, at least at the memory level.
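A sketch of the record; its components are assumptions based on the detail pane params:

```java
// The components mirror the detail pane params (assumed here)
public record SlideState(
    SlideItem slideItem,
    Language language,
    String code
) {}
```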
The `ChangeListener` notifies when a slide changes and when the state has to be set for a given `ImageItem`. The state of an item (slide) includes all the params shown in the detail pane, for example.
Next, the GUI properties of this object are defined so we can communicate more efficiently between objects.
Most of the time, I add a mandatory `init` method to my custom views to avoid abusing the constructor.
In `init`, we set the properties to call `updateSlide` whenever they change.
This is a faster-to-develop approach that is not optimized, which is not part of my scope now, as I said before.
As a special case, when the image changes, `onImageChange` is called to perform a state load through the `ChangeListener`, since a change of image means our slide also changed, so we have to update the state of the new slide visible on screen.
I also added a flag, `isStateLoading`, to skip rendering while the state is being loaded, avoiding multiple updates in vain when all the properties are set at once.
Finally, to update the slide, we build the `Slide` from its property value and use the `SlideDrawing` implementation (`GroupSlideDrawing`) we did before to render what has to be rendered.
Recall that `GroupSlideDrawing` has the reference to our `Group` view, so it will draw on that node.
Then, `onSlideChange` is triggered through the `ChangeListener`, so the controller will be able to persist these changes.
This view will enable us to add changes to the GUI part of the drawing without overwhelming the controller, while integrating other logic from the drawing package.
Updating the Controller with the Drawing View
The controller will handle the recently created view that draws the slides.
This view has a listener that will be realized by the controller class; that is, we add the interface `SlideDrawingView.ChangeListener` to the class signature and implement it next.
As I explained before, we can now see the `Map` that will store the in-memory state. This maps an `ImageItem` to a `SlideState`.
When a slide changes, the event is received by `onSlideChange` from `SlideDrawingView.ChangeListener`, and the change is trivially stored. From this same listener, we receive the `setState` event, which loads the state associated with that item, if any.
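A sketch of both listener methods in the controller; the method signatures are assumptions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Inside AppController
private final Map<ImageItem, SlideState> slideStates = new HashMap<>();

@Override
public void onSlideChange(ImageItem item, SlideState state) {
    slideStates.put(item, state); // trivially store the in-memory change
}

@Override
public void setState(ImageItem item) {
    Optional.ofNullable(slideStates.get(item))
            .ifPresent(slideDrawingView::loadState); // hypothetical loader
}
```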
Then, the initialization of this view is straightforward, like the other initializations we’ve done. Notice that we must call the `init` method of our custom `SlideDrawingView` here, as I said. Then we only have to bind the properties and the change listener.
With all this, we now have a full minimal implementation that allows us to draw a slide of type screenshot on the view pane. From now on, we only need to extend this framework to add many other functionalities to the app, like the other kinds of slides remaining, or more visuals as well.
Code Snippet Slide
Rendering code snippets as a pretty image with styled code for any programming language is a gorgeous challenge with powerful results, resembling part of the process of developing an IDE.
I pulled the PL colors assigned on GitHub to style the background to give context to the slide. The frames have shadows and rounded corners, and the code is styled the same as my IntelliJ settings, except for specific language tokens.
The slides work for as many languages as needed.
I also added caption support to end up automating my job further.
Regarding automation, I’ll keep writing my DSLs and systems with my latest MathSwe standards.
I quickly wrote a parser with various regexes and data types to give semantics to the code. Recall that it has to work for any PL for this EP; I’m not adding a specific PL syntax style to this EP.
With this, I can automate my workflow, eliminate the need for many general-purpose bloated/monolithic/manual tools like Photoshop, build an API, and do plenty more work. Of course, among many others, for example, LaTeX 🦍 is an archaic tool I’ve always wanted to eradicate, so there we go…
Another insight I got about how to increase the productivity of these domain-specific systems is to pre-parse their input via AI.
Notice AI can be a bridge between the general-purpose input and the domain-specific one, but the underlying domain will always exist.
Regarding the graphic rendering of the slide, it’s all about JavaFX `Node`s like `Shape`s and normal GUI views. For backgrounds, a `Rectangle` shape is good. The code chunks are `Text` nodes in a `TextFlow` parent.
We could even build an IDE with these APIs. I didn’t need to use a lower-level `Canvas`; JavaFX nodes were enough, which shows how JavaFX is a powerful platform for engineering applications.
I had to do various refactorizations, and specific implementations of details, of course, so I’ll just show an overview of the development from now on and skip the details.
This way, the code snippet type of slide was implemented as a beautiful page of custom-styled code with captions, which is another big step for me toward more content automation.
Adding a New Empty Slide
Since code snippets make sense to go in a brand-new slide only, I created a “new” button to add empty slides.
Then, a new `ImageItem` is created with the logo of an EP app by default. Recall these are intended to build the slide from source code instead of images, and it doesn’t have to be perfect as this is an example project.
This required some minor features like saving `Image`s via the `DataRepository` (which, so far, had only supported copying the image files from your directory).
The GUI change is also straightforward in the controller after updating the FXML view.
Notice how we make use of the freshly implemented `createOrUpdateImage` of `DataRepository` in the method `createNewSlide`. As said, I pass my EP icon as the default slide image, and then it’s stored in the data directory of the app.
By adding brand-new slides to the app, we’re able to work with code snippet slides more consistently, as these don’t require image files to be generated but source code instead.
Updating the Model with the Slide Language and Caption
To update the model, I added `Language` as a `record` component, so a slide can be rendered as per its language, if applicable.
Notice the `Caption` product type is not part of the `Slide` sum type, but a standalone `record` in that interface.
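A sketch of the updated model; the exact component shapes are assumptions:

```java
import javafx.scene.image.Image;
import java.util.Optional;

public sealed interface Slide {
    // Standalone product type: it lives in the interface but is not a Slide
    record Caption(String title, String subtitle) {}

    record CodeSnippet(
        String code,
        Language language,
        Optional<Caption> caption
    ) implements Slide {}

    // CodeShot and Screenshot take the Language and Caption analogously
}
```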
Now, the slides have more complete data to work on further implementations.
Code Shot Slide
This is another kind of slide supported by the application, and it’s trivially implemented by just drawing the frame background with the color of the slide language.
This is almost the same as the screenshot slide that was done in the beginning.
This kind of slide consists of a screenshot of code (not actual code), so, e.g., you can capture your IDE and explain what’s in the screenshot.
We get the color by `Language` via `Colors.color(lang)` — colors that were added above.
Other elements like captions can be added the same way; they will be shown later.
Later, I added a class `ScreenshotDrawing` to the `drawing` package to reuse this code between the `Screenshot` and `CodeShot` `Slide` implementations.
This concludes the simple implementation for this kind of slide, so we already have a minimum implementation for “screenshot”, and “code shot” slides. Finishing the “code snippet” slide is left to complete this EP.
Package Lang
To define the language elements, I added a package for this domain that provides the logic for definitions and parsings.
I defined `Element`s to partition the possibilities into a set of language semantics that can generally be found.
This implementation is general-purpose, so it works on any language, but it’s not specific to each language. That would be a ton of more work for me 🤪.
The sum type consists of `Keyword | Symbol | Type | Number | StringLiteral | Comment | Other`. They all have a `String` value to hold their token.
The `TokenParsing` type allows returning (wrapping) an `Element` value when parsing the raw `String`.
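A sketch of the sum type and the wrapper:

```java
public sealed interface Element {
    String value();

    record Keyword(String value) implements Element {}
    record Symbol(String value) implements Element {}
    record Type(String value) implements Element {}
    record Number(String value) implements Element {}
    record StringLiteral(String value) implements Element {}
    record Comment(String value) implements Element {}
    record Other(String value) implements Element {}
}

record TokenParsing(Element element) {}
```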
The detail about enums is just an implementation I made to transform `interface` sum types into basic `enum` sum types 1:1.
So, I can add `ElementItem` paired with the previous `Element` `interface` type.
This can be trivially done in Haskell thanks to homogeneity, but Java, like any other mundane mixed-paradigm language, is heterogeneous, so an `enum` is a different structure than an `interface`. Not to say, OOP brings the whole jungle: interfaces are used as an all-in-one for many other affairs, too.
I defined the keywords in a utility class⁵.
I obviously skipped the 500 LoC for the mechanical definition of each keyword in this class. So, here you can see how to get the keywords.
I also skipped markup and DSLs like CSS and HTML since they’re quite different 😐.
Now, another utility class (missing Kotlin and functional languages so much 🤔) to define the colors 🎨 for the language elements.
Finally, I implemented a parser to get the tokens by language from the raw strings. This has many details I won’t show. I’ll just show the declarative part of it: the regex.
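A sketch of what those declarative patterns can look like; the exact expressions in the project may differ:

```java
import java.util.regex.Pattern;

// Token patterns, one per Element kind (a sketch)
private static final Pattern STRING_PATTERN =
    Pattern.compile("\"([^\"\\\\]|\\\\.)*\"");
private static final Pattern COMMENT_PATTERN =
    Pattern.compile("//.*|/\\*(.|\\R)*?\\*/");
private static final Pattern NUMBER_PATTERN =
    Pattern.compile("\\b\\d+(\\.\\d+)?\\b");
private static final Pattern TYPE_PATTERN =
    Pattern.compile("\\b[A-Z][A-Za-z0-9]*\\b"); // PascalCase identifiers
private static final Pattern SYMBOL_PATTERN =
    Pattern.compile("[{}()\\[\\];:,.<>=+\\-*/%!&|?]");
```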
The underlying details are a long story I won’t mention as it’s out of scope, but it was a good exercise to rehearse regular expressions and work out some abstractions.
This allowed me to build the general-purpose heavy logic behind the front-end that styles the code snippets to produce beautiful slides for pretty much any source language.
Code Snippet Drawing
Finally, the front-end implementation in the drawing package will put the lang package together, so it can be added to the GUI app.
Recall that the package `drawing` contains the implementations to render the slides into JavaFX nodes.
I leave the full implementation next.
I defined some polite sizes in the constructor, so it looks good.
The method `draw` will return the `Group` with the slide drawn on it, similar to what’s been done before.
To render the code nicely, the method `renderCodeSnippetFlow` employs the `Parser` developed previously and parses the tokens from the raw code `String` that comes with the underlying `Slide`.
Then, for each token, a `Text` `Node` is created with the token value (it can be a keyword, symbol, etc.). The styles are gathered from what I defined as per my IDE custom settings. For example, the text color is set to the one dictated by the `SchemeColors` utility class via `text.setFill(SchemeColors.color(el))`.
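A sketch of that rendering loop; the `Parser` entry point and the font constant are assumptions:

```java
import javafx.scene.text.Text;
import javafx.scene.text.TextFlow;

private TextFlow renderCodeSnippetFlow(String code, Language language) {
    var flow = new TextFlow();

    for (var parsing : Parser.parse(code, language)) { // assumed entry point
        var el = parsing.element();
        var text = new Text(el.value());
        text.setFont(CODE_FONT); // assumed constant per the IDE settings
        text.setFill(SchemeColors.color(el));
        flow.getChildren().add(text);
    }
    return flow;
}
```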
Regarding captions, if they’re present, they’ll be added to the `Group` as a normal `VBox` with `Label`s and another `Rectangle` for its background. This implementation is given by the class `CaptionRenderer`, which I added to the package `drawing` as well, so it can be reused to draw captions on any kind of slide.
All measures and bindings are set up correctly to build up the whole slide.
I also moved some methods to a utility class `Drawings`, like `clearRect`, `getImageCornerRadius`, and `getRoundedImage`.
This drawing provides the group node with the code snippet rendered so that we can add it to the app view pane.
Rendering Captions
This caption drawing implementation is reused by other kinds of slides as mentioned before.
It consists of the class `CaptionRenderer` that was used in the Code Snippet Drawing section.
By decoupling this responsibility, as shown above, I was able to draw same-style captions for the other kinds of slides too.
Putting the Drawing Together
It’s about time to complete the code snippet slide development.
By putting the `lang` and `drawing` packages together as was done before, we get the drawing `Group` that relies on the logic defined in `lang` to render the composition.
Adding the `CodeSnippetDrawing` to the `GroupSlideDrawing` concrete implementation of `SlideDrawing` (method `drawCodeSnippet`) is all we need to delegate this implementation. The `CodeSnippetDrawing` decouples the classes by taking that responsibility out of `GroupSlideDrawing`.
This way, the code snippet slides are fully available in the application.
Drawing Shapes
Rendering various kinds of shapes is a fit exercise to practice in this JavaFX application. They can be lines, rectangles, or circles.
This can give you the ability to annotate certain parts of slides, and even better, these annotations can be automated.
First, we must begin from the domain, in this case, the drawing definitions and implementations for shapes that will support the app. After working out the domain, we figure out any automation that can be applied to the domain language, as said at the beginning of Domain Engineering.
First, I defined the shapes I wanted to draw.
To start making this feature take shape (pun intended), let’s go to the UI updates on the controller side.
After updating the `app.fxml` file with the `ComboBox` to select the shape to draw and a `Button` for undoing changes, the logic on the controller is next.
Then, another controller will be needed to separate the logic for the drawing on top of the slides.
This new controller will therefore work with the SlideDrawingView
.
Leading to the implementation of the UI inputs performed by user drawings.
As you can see, there’s a `Deque` of `ShapeRenderer` containing the stack of shapes drawn on the `Slide`. By leveraging the stack data structure via the LIFO (Last In, First Out) property provided by `Deque` (implemented via `LinkedList`), the “undo” button can be trivially implemented.
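A sketch of the undo mechanics over that stack; the `ShapeRenderer` accessor is an assumption:

```java
import javafx.fxml.FXML;
import java.util.Deque;
import java.util.LinkedList;

private final Deque<ShapeRenderer> shapes = new LinkedList<>();

void pushShape(ShapeRenderer renderer) {
    shapes.push(renderer); // the last drawn shape stays on top (LIFO)
}

@FXML
private void onUndoButtonAction() {
    if (!shapes.isEmpty()) {
        var last = shapes.pop();
        group.getChildren().remove(last.getShape()); // hypothetical accessor
    }
}
```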
When binding events, a shape is set to be rendered, and the scroll pane is no longer pannable, which disables the scroll on the slide view while drawing a shape. This continues until the process is complete, and then the scroll pane gets back to normal.
The `SHIFT` key is detected to read when the user wants to keep the proportions of the shape. This will be useful when implementing their rendering.
The color to paint the shapes comes from the `Palette` enum, with `Good` and `Error` values.
The rest are implementation details.
Finally, the `ShapeRenderer` responsible for drawing a shape comes next.
A `Group` (coming from `SlideDrawingView`, implemented via `GroupSlideDrawing` in the method `draw`) is used as a canvas to draw the shapes. Recall that `Group` nodes are good in JavaFX for drawing shapes. So, this group is put on top of the slide to create the composition.
The shapes are trivially drawn via JavaFX API and integrated into the corresponding slide.
Notice the flag `keepProportions` is used to keep the aspect ratio of the shape when the `SHIFT` key is pressed. This is useful to draw straight lines, for instance.
Finally, I recorded a video to demonstrate the shapes feature in the app.
The drawing of lines, rectangles, and circles is a finished feature aimed at making (off-domain) annotations on top of the (domain) slides and can be extended properly thanks to computer science concepts applied to this application module.
Integrating AI via OCR
There’s a feasible way to implement an AI application here, taking into account the slides that were composed before. AI can operate on these slides to extract further information, like text from screenshot-based slides.
The feature consists of detecting words from images (screenshots), so we have valuable information for the user to accurately select or underline words without relying on manual mouse precision.
From the image above, you can see how words in the IntelliJ screenshot are selected and how I trivially underlined the main package `engineer.mathsoftware.blog.slides` in one go.
Notice how images or screenshots are out of the app domain since they’re binary or compiled data from the wild world, unlike code snippet slides, where it’d be relatively trivial to implement word selection since we have the source code in the JavaFX `TextFlow` component, so it’s part of the app domain.
An OCR implementation matches a strategic use case for showing how AI can automate a system by working on external systems consisting of binary images containing information that can be extracted and transformed into our domain language.
Setting up Tesseract
Tesseract OCR is one of the main open-source projects I found, so I could detect text in Java via the Tess4J library that wraps it.
Since the project setup was kept simple, as said in Getting Started, there’s no build tool, to avoid complicating it more.
Thus, for adding the Tesseract library, you might want to add it via IntelliJ IDEA to your project if you’re not using a build system.
The module `tess4j` has to be added to the Java project source code, that is, by updating the `module-info` file.
It’s important to follow the instructions I left in the resources/readme.md file since you must download and copy the `tessdata` directory from its repository to run the OCR model in the app⁶. This directory contains the `eng.traineddata` file with the English data to load the model.
The tess4j library will allow us to call the Tesseract OCR API in Java and infer the text boxes on the slides.
Implementing the AI Package
Then, I created a package `ai` to separate the AI-only logic and start writing the new AI feature code.
I also created a similar package with a different domain. AI is an implementation layer on top of the app, so if we’re going to draw AI shapes (like bounding boxes, which are an AI concern rather than a slide responsibility), then those AI shapes must be placed in a different package than `drawing`. I added a subpackage `ai` to it.
The package `drawing.ai` will be used later on the front-end side to consume the AI logic.
To implement the AI back-end or AI package, we have to get bounding boxes inferring words provided by an image. Then, we have to define the AI features that will be supported by the app, that is, OCR.
The OCR output will be read as a list of `BoundingBox` from the `javafx.geometry` package.
Then, the `tessdata` is loaded from the app resources directory, and a new `Tesseract` model is created and set up with the training data, the language (English), and other options that can be used to tune the model.
Notice how a `bufferedImage` is created with the `javafx.swing` package to convert the input JavaFX `Image` into an (AWT) `BufferedImage`, as required by the `tess4j` API⁷.
This `BufferedImage` is passed to the `getWords` method, and the result is mapped to domain `BoundingBox` objects.
As a side note, try not to use low-quality or small images since these will make the model have a hard time detecting anything 😆.
The method `textBoxes` of the (utility) class `Ocr` will provide the word boxes given any image, which completes our work regarding the external AI integration.
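A sketch of the `textBoxes` method with the real Tess4J calls; the data path is an assumption:

```java
import javafx.embed.swing.SwingFXUtils;
import javafx.geometry.BoundingBox;
import javafx.scene.image.Image;
import net.sourceforge.tess4j.ITessAPI;
import net.sourceforge.tess4j.Tesseract;

import java.util.List;

public final class Ocr {
    public static List<BoundingBox> textBoxes(Image image) {
        var tesseract = new Tesseract();
        tesseract.setDatapath("src/main/resources/tessdata"); // assumed path
        tesseract.setLanguage("eng");

        // Convert the JavaFX Image into the AWT BufferedImage tess4j needs
        var bufferedImage = SwingFXUtils.fromFXImage(image, null);

        return tesseract
            .getWords(bufferedImage, ITessAPI.TessPageIteratorLevel.RIL_WORD)
            .stream()
            .map(word -> {
                var rect = word.getBoundingBox(); // java.awt.Rectangle
                return new BoundingBox(
                    rect.getX(), rect.getY(), rect.getWidth(), rect.getHeight()
                );
            })
            .toList();
    }

    private Ocr() {}
}
```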
Regarding the AI application model, I defined a `SlideAI` sum type to define the AI features.
The framework developed in the AI package enables the app to consume the OCR model to add rich automation features to existing slides.
AI Drawing
As said before, AI has its own shapes to implement. We won’t only infer the abstract boxes where words are detected by Tesseract but also draw them just like other drawings done for building the slides.
The development takes place in the `drawing.ai` package to decouple the drawing of `Slide`s from the drawing of AI elements, as mentioned before.
One shape will be defined, that is, the `WordSelection` `AIShape`, which will represent the selection the user is performing with the mouse.
Notice how `AIShape` is a sum type consisting of the `WordSelection` product type only, as we don’t need other shapes for AI to work in the app.
Imagine a `WordSelection` as the text you highlight with a marker, but it also has a hover/selected state (i.e., when you pass the mouse over a word).
The `color` and `fill` functions match the AI `Shape` `State` to a border and background color, respectively.
Now, there’s a peculiarity in the static construction method of `WordSelection`. It introduces the `ai` package to the `drawing.ai` package by transforming a list of `BoundingBox` (i.e., an `OcrWordDetection` value from the package `ai`) into a `WordSelection` (of the package `drawing.ai`).
Notice how an external type like an AWT `Rectangle` from the AI output has been transformed into domain types: first the JavaFX `BoundingBox`, then `OcrWordDetection` of the `ai` package, and now `WordSelection` of the `drawing.ai` package. Each part of a DSL code must have its definitions to make sense of the code or be semantic. Recall that data types save information. This app is not a DSL, but it gets close.
Regarding the `Stateful` value required to create a `WordSelection`, it’s more of an implementation detail to handle the states among all the words selected around.
The `Stateful` interface receives two generic types: the object to hold the `Focus` (or attention in the GUI), and the type of state it can have, like hover, for instance.
Then, I added a utility class to provide a `Stateful` object for the `WordSelection` AI shape. We know now that it’ll be implemented as `implements Stateful<BoundingBox, AIShape.State>`, since the element to focus on is a `BoundingBox` matching a selection of word(s), and this can have an `AIShape.State` like `Normal`, `Hover`, or `Selected`.
Now, the drawing (or fun) part is left to complete the `drawing.ai` package.
The class `GroupAIDrawing` implements the `AIDrawing` interface defined first, so the OCR can be represented in the app via the method `drawWordSelection` after `draw` is called.
Notice how this design can scale since we have an `AIShape` sum type being matched in the method `draw`. It’d be trivial to add more AI shapes, and if you forget to implement them, the code won’t compile.
The implementation of the AI-drawing package allows us to represent the OCR integration from the `ai` package via stateful bounding boxes.
JavaFX Word Detection
Once the OCR API is ready, the new AI features have to be implemented on the JavaFX side of the app. One part of this was already done in the AI drawing package, so the GUI logic is left to finish.
For handling the user events, I created another class for the AI controller.
It has a `wordDetectionProperty` for the `OcrWordDetection` (product) type defined before. Recall this type was defined to establish the AI features that the app will support. Now, we’re implementing those features (i.e., OCR word detection).
That property will update the bounding boxes of detected words. So, if you press the OCR button (F1), those boxes will load, and the AI controller will reflect them on the screen. This can be read from the `loadOcr` and `clearOcr` methods.
When initializing this controller, the `Group` where the `Slide` is rendered is taken as the base to infer the text. The `AIController` works with its `SlideAIView`, which will be added later.
The AI view can be shown or hidden since it’s a layer on top of the original slide. If AI has already been computed, there’s no reason to keep evaluating or running the model again. You might only want to hide or show the AI results (boxes), given the slide didn’t change, of course.
Notice how I load a snapshot of the slide `Group` in `loadSlideDrawingSnapshot` only once. This takes the appropriate image sizes by reverting any scale or zoom in the `Group` view (i.e., the main slide view in the center of the app).
In `loadOcr`, a virtual thread is used to infer the bounding boxes from the OCR model⁸.
The status messages are the labels I implemented in the bottom-right of the app. I added a status message to the bottom-left when starting the development. Now, I saw the opportunity to add a secondary label to the bottom-right to notify about other tasks.
After the initial results on the controller side, I added an FSM to handle another state required by this problem: I needed AI invalidation to infer the OCR only once, provided the slide hadn’t changed. This solves performance issues if you press the OCR key several times without modifying the slide.
I don’t usually pass references in the constructor like `new AIInvalidation(this::loadAI);` to avoid cycles, but it’s fine for now, since I’m using plain JavaFX with MVC.
It’s readable that the method `validate` makes the model up to date, whether it’s invalid or already valid, and `slideChanged` invalidates the model, so the next time, `validate` will make sure to load the AI again.
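A sketch of the FSM; the internal state handling is an assumption:

```java
// AIInvalidation — invalidates OCR results when the slide changes
final class AIInvalidation {
    private final Runnable loadAI;
    private boolean valid;

    AIInvalidation(Runnable loadAI) {
        this.loadAI = loadAI;
        this.valid = false;
    }

    // The slide was edited, so the current OCR results are stale
    void slideChanged() {
        valid = false;
    }

    // Run the model only if the results are out of date
    void validate() {
        if (!valid) {
            loadAI.run();
            valid = true;
        }
    }
}
```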
When the `F1` key is pressed, the OCR is activated, which validates the AI system, resulting in up-to-date and optimized inference, so we can now show the text boxes loaded into the `AIController`.
The implementation of OCR is about to be completed in the application.
Now it’s about time for the mouse hover detection.
I skipped further details for the sake of brevity.
Some adjustments had to be made, for example, filtering out the slide box (the whole slide) that is detected by OCR (because slides contain text, but we only want to select simple text, not containers) via `filter(box -> box.getHeight() < 100.0)`.
Now, the mouse events are available, along with all the overwhelming OCR back-end, which leads us to the final feature for automated underlining.
The JavaFX implementations for the AI controller are mostly complete with the logic to integrate the packages AI and AI-drawing, with all the required tooling like filters and mouse events for handling bounding box states. Among the required tooling, there was also the AI invalidation FSM to fix performance issues by disabling redundant OCR invocations.
Said cross-domain implementations make the app able to detect text from images when pressing the F1 key.
Clever Word Underlining
Now that text is detected in the app via OCR, many advanced features can branch from here. The feature established for this stage was automatic word underlining.
Word underlining takes place when you click a word on the slide.
I took the existing `onMouseClicked` event from the `AIController` and returned an optional `Line` if there was an underline action from the user.
The `focus` comes from `sel.wordFocus();`, which is the `Stateful<BoundingBox, AIShape.State>` object implemented in the package `drawing.ai`. If there’s a `Focus<BoundingBox, State>`, it’s mapped to a JavaFX `Line`.
The underlining event can be gathered from the `SlideDrawingController`, which is an appropriate abstraction to handle it.
Finally, the `Line` that lies below a word in an image is inferred after passing through a complex OCR and cross-application-domain process, which enables us to draw it just like any other shape the app can already draw.
This is the beauty of simplicity. The process is overwhelming but simple. In the end, it’s just about drawing a trivial shape, but you need to know where.
As a side note, I also did other work on the app to make it usable, like saving slide changes (efficiently using `Map`, of course), zoom (which changes the slide `Group` scaling), default values, and any other kind of under-the-hood work a responsible SW engineer tackles in the everyday duty. On top of all this, performance fixes could be vastly applied to this app, but that’s not what an EP consists of.
Words can be underlined so far, but the system can be more clever.
I defined an operation to sum lines underlined in the same row so you can keep underlining all the way left or right in the same row after an initial selection.
Now, some MSWE comes in handy to evaluate these operations. First, recall that we get the word boxes from OCR, so we can still work with them here, before transforming them into plain lines.
Once the boxes belong to the same row, they’re reduced to the clever `BoundingBox` by taking polite space among all the word boxes.
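A sketch of that reduction, taking the bounding hull of the row boxes:

```java
import javafx.geometry.BoundingBox;
import java.util.List;

static BoundingBox sumRowBoxes(List<BoundingBox> rowBoxes) {
    var minX = rowBoxes.stream().mapToDouble(BoundingBox::getMinX).min().orElse(0.0);
    var minY = rowBoxes.stream().mapToDouble(BoundingBox::getMinY).min().orElse(0.0);
    var maxX = rowBoxes.stream().mapToDouble(BoundingBox::getMaxX).max().orElse(0.0);
    var maxY = rowBoxes.stream().mapToDouble(BoundingBox::getMaxY).max().orElse(0.0);
    return new BoundingBox(minX, minY, maxX - minX, maxY - minY);
}
```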
Nothing can be that easy. There’s something to fix, according to me.
So, we have to remove redundant drawings of AI lines.
I created a method `getFocusLinesInRow` returning a `List<Line>` in the `AIController` with the lines in the same row as the current word `Focus`.
Notice this is a temporally coupled functionality.
Then, a method `clearAiLineRow` is called in `SlideDrawingController` before drawing the AI underlining. This takes the current row lines, filters them, and removes the redundant line shape.
I recorded a video to show what this feature looks like in action.
The single-word and multi-word underlining is finally implemented, which proved an appropriate AI application for this JavaFX app.
OCR Side Effects
The OCR integration has to be further engineered to make the model predictions useful. This includes removing bloat like the background from the slide to sanitize the data as much as possible so the segmented image passed to the model provides much certainty of its results.
Notice how a white background introduces a side effect to the model, which seems to infer that (white) text is part of the background; therefore, the text is not detected. This is magnified when the background is white, but the effect exists regardless of the background color.
AI is not “magic,” as you have to clean your data and input so it matches the ones the model was originally designed for. This includes responsibility for both the model designers and consumers.
Needless to say, for engineering-grade models, one must leverage strongly typed (functional) languages to ensure expected correctness. That’s a minimum requirement I’d expect from engineers since it bounds the overwhelming black box of these deep learning models.
Notice how I use the term “expected correctness” referring to mathematical expectation. Black box models are stochastic, so we expect them to be correct in the long run as a pattern.
Recall that in mathematical sciences, we study the order of randomness. Even something stochastic has to converge to some expected value, that is, the mathematical pattern we ought to study.
If we know the pattern, we can turn a system into engineering-grade ✔.
For an OCR app, we expect well-defined metrics or patterns up to a certain degree. I emphasize well-defined because I’ve worked with subjective ML metrics, which are generic/popular and therefore meaningless, so again, you must be proficient in your domain. AI is not magic. You must build the same good ol’ engineering and math under the hood.
If you know me, you know the answer: engineer for the domain as much as possible first. The simplest designs are the best. You don’t need over-bloated AI or solutions most of the time. Of course, many businesses won’t care about this, but the marketing buzzwords instead, unfortunately.
For this app, leveraging OCR is a valid approach to enhance the user workflow when it comes to screenshots, but the goal must be domain-based to optimize for source code slides instead of binary images. Remember when I said that word detection was trivial for code snippet slides? Now compare trivial to ML models that had to be trained with supercomputers of 5 years ago 💸 and worldwide data.
For this app, the major and first fix to clean the model’s input is to classify or segment the screenshot out of the background, which simplifies the OCR input.
Segmenting the image is trivial in theory since the slide is a composition of both the image and background. That is, the screenshot is part of the app domain.
However, in practice, this is harder than it looks since it requires more design. What I mean is that systems are expensive to engineer (properly). Particularly, MSWE is specialized, so you will hardly find competent SW engineers with a mathematician’s background.
Stochasticity is inherent in black-box models, and terrible side effects arise when they’re poorly engineered. We first must eradicate the side effects by leveraging domain facts as much as possible so the stochastic behavior is consistent, making our model engineering-grade.
Automating the User Workflow via AI
AI is a set of complex and complicated emerging technologies that can be wisely leveraged to automate many works for our domain as a pattern.
Screenshot slides have text information, so whenever there’s information, we shall extract it to enrich our systems with informed actions and decisions, which was the reason I devised implementing OCR for the Slides EP.
The OCR implementation consisted of finding a respectable open-source library that can be consumed from Java to avoid cloud provider costs and keep documentation in the project via an installable dependency.
It consisted of a cross-domain design, from the selected Tesseract OCR library to the `ai`, `drawing.ai`, and `ui` application packages.
I’ve shown how these models are a valid technique for empowering users with automation for external systems like binary files generated out of the application while leaving clear how crucial it is to have a domain-first approach when engineering systems, so we rely on simple solutions that create elegant complexity.
Designing an Auto Save Mechanism
One mandatory feature for a modern application is to provide a safe and intelligent saving mechanism to preserve the state of the app and the user’s work.
I’ve worked on the application UI states in memory. Now, I implemented the save slide feature with an automatic system to make the app even more usable.
First, I extracted the data paths used by the app to a utility class, so this can be trivially replaced by real environment settings in a real case.
The auto-save feature takes place in the package `ui`, introducing the package `data` again.
The `AutoSave` class has a `DataRepository` from the package `data`, implemented as a local repository that saves to disk. It has a `SaveInvalidation` FSM similar to the `AIInvalidation` that tracked when the AI was loaded and up to date to avoid redundant model invocations. This way, the save FSM will perform the side effects only when necessary.
The auto-saving can be activated via the `enable` method. It has the drawing `Group`, which is what we want to save. The `BackgroundStatus` is the bottom-right label shown to notify updates.
When the drawing changes, `onDrawingChanged` is called, which employs the save FSM to invalidate the save state, and then it invokes `validateLater` to run the saving action in the background.
The `saveSlide` method reference is passed to the save FSM via `new SaveInvalidation(this::saveSlide)`. It applies the proper scaling to the slide `Group` and runs a virtual thread that uses the `LocalDataRepository` to store the slide snapshot in the background. Then, it updates the UI safely, informing that the slide was saved.
The `SaveInvalidation` implementation takes place as a static nested class.
The method `slideChanged` invalidates the save state.
The method `validateNow` runs the saving effect, then validates the state and saves the current time to avoid bloating the CPU if there are continuous changes.
Lastly, when `validateLater` is called, it starts a virtual thread that runs within a minimum `WAIT_TIME_MS` span, and after waiting for any necessary cool-down time, it calls `validateNow`. Hence “later”, because of the multithreaded behavior.
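A sketch of this nested FSM; the wait time and flag handling are assumptions:

```java
// SaveInvalidation — runs the save effect with a cool-down window
static final class SaveInvalidation {
    private static final long WAIT_TIME_MS = 1_000L;
    private final Runnable save;
    private volatile boolean valid;
    private volatile long lastSaveMs;

    SaveInvalidation(Runnable save) {
        this.save = save;
        this.valid = true;
    }

    void slideChanged() {
        valid = false;
    }

    void validateNow() {
        save.run();
        valid = true;
        lastSaveMs = System.currentTimeMillis();
    }

    void validateLater() {
        Thread.startVirtualThread(() -> {
            try {
                // Cool down so continuous changes don't bloat the CPU
                var elapsed = System.currentTimeMillis() - lastSaveMs;
                if (elapsed < WAIT_TIME_MS) {
                    Thread.sleep(WAIT_TIME_MS - elapsed);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (!valid) {
                validateNow();
            }
        });
    }
}
```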
The `AutoSave` class was integrated into the `SlideDrawingController` to set the drawing and change events.
The repository specified and implemented in the package `data`, the saving logic, and a state machine made it possible to implement a robust auto-saving mechanism for the slides app, which adds further automation under the hood, besides the AI implemented in the previous section.
Automation of Screenshots and Code Snippets Content
I foresaw an opportunity to develop a gorgeous example project to start my automation journey for content like screenshot stories and code snippet slides.
This is my first blog with EP, and it was extremely extensive. I blogged the development process with high granularity and left insights, as I usually do.
The implementation was feasible with JavaFX, as expected, which shows how JavaFX is a powerful platform for well-defined engineering applications. Although the app is an example project, the development was significantly complex. Also, remember that we can use Kotlin with Arrow to further enrich the platform’s robustness via functional approaches for the domain languages.
There were good exercises for putting into practice tools such as JavaFX, new Java 21+ features, and regular expressions. The exercises scaled up to the level of an advantageous AI OCR integration that left plenty of insight into domain automation.
The development of this EP was the next step to take —from theory to practice— many design concepts and standards I had devised before.
The main package `slides` defined the domain of the application, while several others were required to design the desktop app, namely, the `ui`, `data`, `drawing`, `drawing.ai`, and `ai` packages, containing the development of features such as drag-and-drop, item arranging, menus with shortcuts, pagination, advanced drawing of both slide shapes and AI shapes, advanced parsing of code snippet language tokens, captions, cross-domain OCR word manipulation for automated underlining, and essentials like auto-saving and other under-the-hood requirements to enhance the app design and usability.
The final development results turned into a desktop app with a master-view-detail layout able to create three kinds of slides: code snippet, code shot, and screenshot. The code snippet slides are mainly encouraged since they belong to the domain by consisting of source code, while the other two consist of binary images as inputs.
The slides app is a system that allows us to optimize the building of domain-specific presentations. As a math software engineer, such an achievement boosts my automation tools one step further in these graphical domains, leading to faster future undertakings.
Bibliography
- JavaFX | openjfx.io.
- JavaFX Fundamentals | dev.java.
- SDKMAN.
- Java Installation Guide | foojay.io.
- Beginning JavaFX Applications with IntelliJ IDE | foojay.io.
- IntelliJ IDEA.
- Scene Builder.
- Drag-and-Drop Feature in JavaFX Applications | Docs | Oracle.
- Using JavaFX UI Controls: Pagination Control | Docs | Oracle.
- A categorized list of all Java and JVM features since JDK 8 to 21 | Advanced Web Machinery.
- JavaFX - 2D Shapes | tutorialspoint.com.
- What is OCR? | Northern Essex Community College.
- Tess4J | SourceForge.
- Photopea | GitHub.
1. Photopea is a free web app that resembles Photoshop and has been my choice for many years. ↩
2. In this case, two `ImageItem`s are equal if their names are equal. ↩
3. The binary `Image` field made it impossible to update the same item from a `List` with different object instances but the same name. ↩
4. It doesn’t have to be a GP PL; it can also be a niche language like HTML or CSS. ↩
5. I used ChatGPT to generate the keywords and GitHub language colors, i.e., the mechanical job. ↩
6. I didn’t track those files in Git because they’re nasty binary files, and third-party on top of that, so you have to copy them manually. ↩
7. This conversion from JavaFX to AWT is similar to the one used when saving snapshots to the disk. ↩
8. Make sure to provide a thread-safe implementation; see how I use the `Platform.runLater` call to send updates to the JavaFX UI thread. ↩