As recently as five years ago, the selection of extended reality (XR) devices on the market was extremely limited. This is no longer the case: today there is a plethora of devices available, each with different capabilities, input methods, strengths, and weaknesses. Clearly, not every device is suitable for every use case, and that is exactly where the first question arises: which device is the right one for a particular use case? Should we use a VR device like the Meta Quest for a fully immersive experience, or does the user need to stay aware of their surroundings? Should we use the HoloLens 2 for intuitive hand interaction, or perhaps the RealWear Navigator 500 for a hands-free experience?
At Augment IT, we often need to prototype or implement the same, or a similar, solution for multiple different devices. Proofs of concept, user testing, and products spanning multiple platforms are just a few such cases. So, what exactly is the challenge we face when this need arises?
Different Operating Systems (OS) and Application Programming Interfaces (API)
Most devices run on a different OS. Some run on customized versions of an existing OS, like the Meta Quest device family, which runs Android, or the HoloLens 2, which runs a version of Windows. Other devices run their own custom OS; such is the case of the Apple Vision Pro, which runs visionOS, an operating system Apple claims is built from the ground up.
This means that when developing an application, a separate build pipeline is required for each supported device, using different build tools and generating a final artifact of the appropriate type, such as an APK file for Android or an APPX package for the HoloLens 2.
Since most of the devices run different OSs, it is natural that each provides its own interfaces for applications to communicate with the OS. One glaring example is how the user interacts with the device and the applications built for it. Some devices work with controllers, some with hand tracking and gestures, some with eye tracking, some with voice commands, and most with a mix of these input types. Beyond this foundational difference, there are capabilities available on some XR devices but not on others. One such example is taking photos: the HoloLens 2 provides an in-app API, Android-based devices rely on either the in-app camera stream or other apps installed on the device, whereas the Apple Vision Pro restricts this use case to protect its users' privacy, making it impossible to solve by conventional means.
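To make this concrete, here is a minimal sketch of how differently the same "take a photo" use case can look on two platforms, assuming the UWP build uses Unity's PhotoCapture API and the Android build simply hands the request off to the system camera app. The PhotoExample class is invented for illustration, and error handling is omitted.
#if UNITY_WSA
using UnityEngine.Windows.WebCam;
#endif
using UnityEngine;

public class PhotoExample : MonoBehaviour
{
    public void TakePhoto()
    {
#if UNITY_WSA
        // HoloLens 2: Unity exposes an in-app photo capture API on UWP.
        PhotoCapture.CreateAsync(false, capture =>
        {
            // Configure capture parameters and take the photo with the capture object here.
        });
#elif UNITY_ANDROID
        // Android: hand the request off to another app via a camera intent.
        using (var player = new AndroidJavaClass("com.unity3d.player.UnityPlayer"))
        using (var activity = player.GetStatic<AndroidJavaObject>("currentActivity"))
        using (var intent = new AndroidJavaObject("android.content.Intent", "android.media.action.IMAGE_CAPTURE"))
        {
            activity.Call("startActivity", intent);
        }
#endif
    }
}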
A satisfactory solution to the cross-platform problem is one that is efficient and can be extended and maintained in the long run. Over the years, the engineers at Augment IT have experimented with many approaches to solving this issue pragmatically.
The naive solution
When solving a given problem, especially a new or unique one, a frequent practice is to start with a simple and obvious solution, identify the issues, and iterate, improving the solution one step at a time. This is how we began our journey towards the solution.
The initial train of thought was: “Let us keep everything and customize only the parts that are different.”
Unity is usually the primary tool for building XR experiences. It can export one code base to many different platforms, which already covers one part of the problem: packaging the application. Luckily, Unity also offers a way to conditionally compile code, meaning that a part of the code may be ignored for one platform or device but included for another.
This mechanism, called preprocessor directives, is built into the C# programming language and leveraged by Unity for this purpose. Let us say we are building an application that needs to run on an Android phone and the HoloLens 2. If one part of the code needs to be executed on Android and another on the Universal Windows Platform (UWP), which is the platform for HoloLens 2 development, it would look something like this:
#if UNITY_ANDROID
// Code that would be executed only when the platform is set to Android
#elif UNITY_WSA
// Code that would be executed only when the platform is set to UWP
#endif
This is also useful when there is platform-specific code that should not be executed in the Unity editor while developing the app, but should be executed when the application is running on the device. In that case, the UNITY_EDITOR symbol is used. These symbols can also be negated, so to ignore a part of the code in the editor, the condition could check for !UNITY_EDITOR instead. We can define our own symbols, but Unity already offers plenty of useful built-in ones out of the box.
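As a minimal sketch, assuming a HoloLens 2 (UWP) target, combining these symbols might look like this (the EditorSafeExample class is invented for illustration):
using UnityEngine;

public class EditorSafeExample : MonoBehaviour
{
    private void Start()
    {
#if UNITY_EDITOR
        // Runs only while playing in the Unity editor, e.g. to simulate device input.
        Debug.Log("Editor: simulating device behavior.");
#endif
#if !UNITY_EDITOR && UNITY_WSA
        // Runs only in a player build on the device, never inside the editor.
        Debug.Log("Running on the HoloLens 2 device.");
#endif
    }
}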
This solves another part of the problem but is not a complete solution. Different input types are still an issue. For example, on an Android device, the user interacts mostly with touch input, whereas on the HoloLens 2, the user interacts primarily with their hands, using hand tracking and gestures. Sharing the same user interface (UI) implementation across devices is possible, but strongly discouraged, for multiple reasons. In the current example, the Android phone application requires a 2D UI, while the HoloLens 2 application requires a 3D UI. It is extremely impractical to morph Unity UI at runtime from 2D to 3D and vice versa, and the situation only gets worse with each additional supported platform. A single UI component that has to handle two or more different devices is extremely hard to maintain: fixing an issue on one platform may cause issues on another, workarounds pile up, and the quality of the code base quickly deteriorates, as the sketch below illustrates.
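Here is a hypothetical sketch of a single UI component trying to serve both platforms (MenuController and its fields are invented for illustration):
using UnityEngine;

// A shared UI component where every new platform adds more branches.
public class MenuController : MonoBehaviour
{
#if UNITY_ANDROID
    [SerializeField] private Canvas screenCanvas;     // 2D overlay canvas for touch input
#elif UNITY_WSA
    [SerializeField] private Transform handMenuRoot;  // 3D menu placed in world space
#endif

    public void ShowMenu()
    {
#if UNITY_ANDROID
        screenCanvas.enabled = true;
#elif UNITY_WSA
        handMenuRoot.gameObject.SetActive(true);
#endif
    }
}
Every field and method sprouts a new branch per platform, and each fix has to be re-verified everywhere, which is exactly the maintenance trap described above.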
So, another question arises that leads us closer to the solution: "How can we handle the different UIs more practically?"
A better, but still naive solution
At this point, it is obvious that the biggest issue is that the UI logic has too many responsibilities and needs to be split up and simplified. Unity, for the second time, offers a solution to this problem, this time in the form of additive scene loading.
Everything that is displayed in a Unity application is organized into scenes. This includes the UI, 3D objects, and all scripts necessary for the application to function. Having multiple scenes and switching between them was traditionally avoided, as switching can be resource-intensive and interrupts the application flow, something that is rarely desirable, especially in enterprise applications.
Instead of switching between scenes, additive scene loading enables two or more scenes to be active at the same time. With this, it is possible to extract all the common objects and components into one scene, and the platform-specific elements into one or more other scenes.
So, with a bit of restructuring and scripting, this setup is easy to achieve. The Core scene contains the common elements and a script that additively loads the other scenes, depending on the platform we are exporting to.
The scenes here are all added to the hierarchy for visualization purposes; in practice, only the Core scene should be added, and it will then load the required platform scene at runtime. All of these scenes must be added to the build settings for this approach to work.
using UnityEngine;
using UnityEngine.SceneManagement;

public class SceneLoader : MonoBehaviour
{
    private void Awake()
    {
#if UNITY_ANDROID
        // Load the Android-specific scene on top of the Core scene.
        SceneManager.LoadSceneAsync("Android", LoadSceneMode.Additive);
#elif UNITY_WSA
        // Load the HoloLens 2-specific scene on top of the Core scene.
        SceneManager.LoadSceneAsync("HoloLens2", LoadSceneMode.Additive);
#endif
    }
}
This is another step in the right direction. The UI is cleanly separated, and each platform handles its own components. However, the whole code base still lives in a single project, and that invites other issues: selective compilation, library compatibility, and testing across the supported platforms, to name a few. In the next part, we will describe how this can be improved even further, getting closer to the solution we most frequently use today and explaining its benefits and drawbacks.