How to Integrate Azure OpenAI Batch Processing into Your Java Application

This blog post shows you how to integrate Azure OpenAI batch processing into your Java application. The Spring Framework has come a long way in the last 18 months or so. The addition of Spring AI has made it relatively straightforward for Java Spring developers to integrate large language models into their applications, opening the door to RAG (Retrieval Augmented Generation) applications and a great deal more besides. The AI industry as a whole has relentless forward momentum right now, and the infrastructure around it keeps bringing more to the table.

One such offering is the batch processing provided by Azure AI Services. Given the cost of AI generation, it seems only logical that model providers offer ways to make things more affordable, and processing data asynchronously and in bulk is one such option available in the Microsoft Azure cloud. Spring AI does not yet have an implementation for batch processing, but it is still possible to integrate it into your Java apps using the REST endpoints provided by Azure.

Note: In this blog post I will assume that you have already deployed an AI model in Azure AI Services. I will also assume you already have experience connecting it to your Java application. If not, then take a look at my previous blog post, which describes exactly that, here.

1. A brief overview

Azure AI Services will accept a .jsonl file for batch processing and provide another for download when processing is complete. They aim to process files within 24 hours; however, files which take longer do not expire. They provide a number of REST endpoints for this, but first, let’s take a look at the process…

2. Preparing your data for processing

The first step will be creating the .jsonl file with the data you want to process. Azure accepts the following JSON format (shown pretty-printed here; in the actual .jsonl file each request object sits on a single line):

{
  "custom_id": "1",
  "method": "POST",
  "url": "/chat/completions",
  "body": {
    "model": "gpt-4o-batch",
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant that helps people find information."
      },
      {
        "role": "user",
        "content": "What is Fortran?"
      }
    ]
  }
}
{
  "custom_id": "2",
  "method": "POST",
  "url": "/chat/completions",
  "body": {
    "model": "gpt-4o-batch",
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant that helps people find information."
      },
      {
        "role": "user",
        "content": "Name a hairy dog."
      }
    ]
  }
}


The "custom_id" property is an ID that we define so that we are able to uniquely identify each generated response. Given that we are likely processing data from a database in our application, it makes sense to use the primary key of the data being processed so that we can easily associate the processed data with the original data.

The "url" property shown above is the actual value that should be used here. It may seem odd that we do not use the full URL of our Azure resource, but this relative path is simply appended to the URL of our resource internally by Azure.

In the "body" object we can see the name of the model ("model": "gpt-4o-batch"); this is the name of our deployed model. It is important to note that we need to deploy a model specifically for the purpose of batch processing; we cannot reuse the models we use for regular chat completions.

There are two main types of deployment available, Global and DataZone. The difference is the location in which the data is processed. The uploaded file is always stored in the region of the deployed model; however, Global deployments may process your data anywhere in the world where there is capacity. This could be a problem if you need to stay compliant with GDPR regulations. In that case it is better to deploy a DataZone model, which processes the data in the location it was uploaded to. For developers working with European businesses and organisations this will be the better way to go and will help establish trust in the services being used.

We then see an array of "messages". These are our system and user messages. The system message primes the behaviour of the AI model, and the user message contains the data we want to process. To prepare our data for processing we first need a few records to work with the methods we will create, as well as a DTO to hold the String data to be processed and the "custom_id" of our data.

record JsonlTask(String custom_id, String method, String url, JsonlTaskBody body) {
}

record JsonlTaskBody(String model, List<Message> messages) {
}

record Message(String role, String content) {
}

@Data
@NoArgsConstructor
@AllArgsConstructor
public class DataToProcess {
    private Long caseId;
    private String data;
}


The Message record used in JsonlTaskBody above is a small record of our own which mirrors the role/content structure of Spring AI’s chat messages. Next we will create a method to write the .jsonl file and a helper method to prepare our data. The method writeJsonlFile creates the .jsonl file, stores it as a temp file, and returns the file path so that we can find it later when uploading the file to Azure AI Services. It is good practice to delete this file once it has been successfully submitted to prevent data piling up in the temp folder. This method calls the helper method, which creates a JSON object for each of our DataToProcess DTOs.

public Path writeJsonlFile(List<DataToProcess> dataToProcess) throws IOException {
    ObjectMapper mapper = new ObjectMapper();
    mapper.setSerializationInclusion(JsonInclude.Include.NON_NULL);

    Path tempFilePath = Files.createTempFile("data_", ".jsonl");

    try (BufferedWriter writer = new BufferedWriter(new FileWriter(tempFilePath.toFile()))) {
        for (DataToProcess data : dataToProcess) {
            JsonlTask task = createJsonlTask(data);
            writer.write(mapper.writeValueAsString(task));
            writer.newLine();
        }
    }

    return tempFilePath;
}


private JsonlTask createJsonlTask(DataToProcess data) throws IOException {
    String userMessage = data.getData();
    JsonlTaskBody body = new JsonlTaskBody(
            batchDeploymentName, // the name of your batch model deployment
            List.of(
                    // prompt is a Spring Resource pointing at the .st system prompt file
                    new Message("system", prompt.getContentAsString(StandardCharsets.UTF_8)),
                    new Message("user", userMessage)
            )
    );
    return new JsonlTask(data.getCaseId().toString(), "POST", "/chat/completions", body);
}


The system prompt is a .st text file stored in the resources folder of the application.
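For completeness, here is a minimal sketch of how the prompt and the batchDeploymentName used in createJsonlTask could be provided as fields of the same class. The file name and the property key are my own illustrative choices, so adjust them to your project:

// Illustrative field declarations for the class containing the methods above.
// The prompt file name and the deployment-name property key are assumptions.
@Value("classpath:prompts/batch-system-prompt.st")
private Resource prompt;

@Value("${your-app.ai.batch.deployment-name}")
private String batchDeploymentName;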

3. Submitting your data for processing

Now that we have prepared our data, we can submit it to Azure AI Services for processing. Let’s create a method which sends an HTTP request with our data and returns the file_id which Azure AI Services creates for our uploaded file.

Our method will have parameters for the API URL of the file upload endpoint, https://YOUR_RESOURCE_NAME.openai.azure.com/openai/files?api-version=2024-10-21, as well as the API key of your Azure AI Services resource and the file path of the .jsonl file to be processed.
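These values are best externalised in application.properties and injected with @Value, just as we will do later for the download endpoint. Here is a sketch of what the relevant entries could look like; the your-app.ai.batch.* keys are example names of my own, and only spring.ai.azure.openai.api-key is a standard Spring AI property:

# Illustrative configuration; replace YOUR_RESOURCE_NAME and adapt the key names to your app
your-app.ai.batch.upload-file=https://YOUR_RESOURCE_NAME.openai.azure.com/openai/files?api-version=2024-10-21
your-app.ai.batch.file-status=https://YOUR_RESOURCE_NAME.openai.azure.com/openai/files/{file-id}?api-version=2024-10-21
your-app.ai.batch.start-processing=https://YOUR_RESOURCE_NAME.openai.azure.com/openai/batches?api-version=2024-10-21
spring.ai.azure.openai.api-key=YOUR_API_KEY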

public String uploadFile(String apiUrl, String apiKey, Path filePath) {
    RestTemplate restTemplate = new RestTemplate();
    ObjectMapper objectMapper = new ObjectMapper();

    try {
        // Ensure the file exists
        if (!Files.exists(filePath)) {
            throw new RuntimeException("File not found at path: " + filePath);
        }

        // Read file bytes and attach them
        byte[] fileBytes = Files.readAllBytes(filePath);

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
        headers.set("api-key", apiKey);

        // Create multipart body
        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
        body.add("purpose", "batch");
        body.add("file", new ByteArrayResource(fileBytes) {
            @Override
            public String getFilename() {
                return filePath.getFileName().toString();
            }
        });

        HttpEntity<MultiValueMap<String, Object>> requestEntity = new HttpEntity<>(body, headers);

        // Send request
        ResponseEntity<String> responseEntity = restTemplate.exchange(apiUrl, HttpMethod.POST, requestEntity, String.class);

        // Parse response to extract "id"
        JsonNode jsonNode = objectMapper.readTree(responseEntity.getBody());
        return jsonNode.get("id").asText();

    } catch (Exception e) {
        throw new RuntimeException("Failed to upload file: " + e.getMessage(), e);
    }
}


Once we have uploaded our file for processing, Azure AI Services will validate it to make sure it conforms to the required format. If anything is not right, an error message is returned detailing what went wrong. Below is an example. The "errors" property is a small part of a larger JSON object which is returned and shows what went wrong:

"errors": {
   "object": "list",
   "data": [
      {
         "code": "empty_file",
         "message": "The input file is empty. Please ensure that the batch contains at least one request."
      }
   ]
}


The file_id is returned immediately, along with one of the following statuses:

pending,
processed,
in_progress,
validating,
finalizing,
completed,
failed,
expired,
cancelling,
cancelled

4. Checking the status of the uploaded file

Once the file_id has been returned we can check the status of the uploaded file. The method we create will call the status endpoint. If the returned status is not "processed", the method will wait a minute before checking again. We also include a 30-minute timeout, so that if "processed" is not returned within 30 minutes the method gives up and returns false.

public boolean checkFileStatus(String apiUrlTemplate, String apiKey, String fileId) {
    String apiUrl = apiUrlTemplate.replace("{file-id}", fileId);

    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    RestTemplate restTemplate = new RestTemplate();
    ObjectMapper objectMapper = new ObjectMapper();

    CompletableFuture<Boolean> result = new CompletableFuture<>();

    Runnable task = () -> {
        try {
            HttpHeaders headers = new HttpHeaders();
            headers.set("api-key", apiKey);

            HttpEntity<Void> requestEntity = new HttpEntity<>(headers);
            String response = restTemplate.exchange(apiUrl, HttpMethod.GET, requestEntity, String.class).getBody();

            JsonNode responseJson = objectMapper.readTree(response);
            String status = responseJson.get("status").asText();

            log.info("File Status: " + status);

            if ("processed".equalsIgnoreCase(status)) {
                log.info("File processing completed.");
                result.complete(true);
                executor.shutdown();
            }
        } catch (Exception e) {
            log.error("Error checking file status: {}", e.getMessage());
            result.complete(false);
            executor.shutdown();
        }
    };

    executor.scheduleAtFixedRate(task, 0, 1, TimeUnit.MINUTES);

    try {
        return result.get(30, TimeUnit.MINUTES);
    } catch (Exception e) {
        log.error("Timeout or error while waiting for file status: {}", e.getMessage());
        executor.shutdownNow();
        return false;
    }
}


The method simply returns true when the status is "processed" so that the following code knows it is time to start the batch process. At first glance, the use of ScheduledExecutorService seems to indicate that this method is asynchronous; however, it is ultimately blocking due to the result.get(…). This is in fact what we want, since once we have all of the methods we need we will call them sequentially from a scheduled task.

Let’s assume our file upload was successfully processed. Our Java application should now associate the returned file_id with the data which has been sent for processing. This could mean a column on the database table which holds the data being processed, so that each data point has the file_id, or perhaps a new table which holds the file_id and which we reference with a 1-N relationship from the original table. How you handle this will depend on your specific application architecture, so I won’t go into more detail here.

5. Starting the batch process

We now know that our .jsonl file has been accepted by Azure AI Services, so we can make the API call to start the process. The following method takes the API URL and API key as parameters, as well as the file_id we have just saved. It returns the id that Azure AI Services assigns to this specific batch process, which we can use to check on the status later and eventually download the processed data. You will need to save this in the same way you did with the file_id earlier.

The request body we create for the API call contains the property completion_window. This must be set to 24h or the process will fail. Perhaps in future Azure may offer additional completion windows, but for now we can only set a value of 24h here. The method waits for a successful (2xx) HTTP response and then returns the id of the batch process. If the call is not successful we log an error which gives us some feedback as to what went wrong and return a sentinel value.

public String startBatchJob(String apiUrl, String apiKey, String inputFileId) {
    RestTemplate restTemplate = new RestTemplate();
    ObjectMapper objectMapper = new ObjectMapper(); // For parsing JSON

    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.APPLICATION_JSON);
    headers.set("api-key", apiKey);

    String requestBody = String.format(
            "{ \"input_file_id\": \"%s\", \"endpoint\": \"/chat/completions\", \"completion_window\": \"24h\" }",
            inputFileId
    );

    HttpEntity<String> requestEntity = new HttpEntity<>(requestBody, headers);

    try {
        ResponseEntity<String> responseEntity = restTemplate.exchange(apiUrl, HttpMethod.POST, requestEntity, String.class);

        if (responseEntity.getStatusCode().is2xxSuccessful()) {
            String responseBody = responseEntity.getBody();
            JsonNode jsonNode = objectMapper.readTree(responseBody);

            // Extract and return the "id" property
            if (jsonNode.has("id")) {
                String id = jsonNode.get("id").asText();
                log.info("Batch Job Started Successfully. ID: {}", id);
                return id;
            } else {
                log.error("Response does not contain 'id': {}", responseBody);
                return "-1";
            }
        } else {
            log.error("Batch Job Start Failed with Status: {}", responseEntity.getStatusCode());
            return "-1";
        }
    } catch (Exception e) {
        log.error("Failed to start batch job: {}", e.getMessage());
        return "-1";
    }
}


6. Calling the methods to start processing batches

Now that we have put together the methods and objects that we need, let’s create a batch job which will run the process. The execute method below calls the methods we have created so far, but there are two additional method calls, saveFileId and saveBatchId. You will need to create your own implementations of these to work with your existing codebase.

@Transactional
public void execute() throws IOException {
    // Replace the method below with your own method which fetches your data and
    // puts it into the DTO
    List<DataToProcess> dataToProcess = dataService.getDataForProcessing();
  
    Path tempFilePath = writeJsonlFile(dataToProcess);
    String returnedFileId = uploadFile(uploadFileApiUrl, apiKey, tempFilePath);
    if (checkFileStatus(fileStatusApiUrl, apiKey, returnedFileId)) {
        // create your own saveFileId implementation
        saveFileId(returnedFileId, dataToProcess);
        String batchId = startBatchJob(startFileProcessingApiUrl, apiKey, returnedFileId);
        // create your own saveBatchId implementation
        saveBatchId(batchId, dataToProcess);
    }
    Files.deleteIfExists(tempFilePath);
}


Assuming the batch process was started successfully, we now play a waiting game until our data is ready to be downloaded. That means making an API call to check for completed batches and then downloading the file. Although Azure AI Services offer a completion window of 24 hours, in my experience the process finishes significantly faster than that, often within 10 to 15 minutes of being started. For this reason it is a good idea to run a scheduled task which checks for completed batch processes every 5 minutes; a sketch of such a task follows below. It may also be the case that you are uploading data to be processed multiple times a day, so by continually checking for finished batches you get the results pretty much as soon as they are ready.
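Here is a minimal sketch of such a polling component, assuming scheduling is enabled with @EnableScheduling and that BatchResultFetchJob is a hypothetical bean of your own containing the execute() method from section 9:

// Illustrative polling component; BatchResultFetchJob stands in for your own bean
// holding the execute() method shown in section 9.
@Slf4j
@Component
public class BatchResultPollingScheduler {

    private final BatchResultFetchJob batchResultFetchJob;

    public BatchResultPollingScheduler(BatchResultFetchJob batchResultFetchJob) {
        this.batchResultFetchJob = batchResultFetchJob;
    }

    // Runs every 5 minutes; requires @EnableScheduling on a configuration class
    @Scheduled(fixedDelay = 5, timeUnit = TimeUnit.MINUTES)
    public void pollForCompletedBatches() {
        try {
            batchResultFetchJob.execute();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            log.error("Interrupted while fetching completed batches", e);
        } catch (Exception e) {
            log.error("Error while fetching completed batches: {}", e.getMessage());
        }
    }
}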

7. Checking for completed batches

With the following method we make an API call which returns a list of the output file_ids of the completed batches. Our request has query parameters for start time and end time, which we pass into our method as parameters. This is because we only want the file_ids of batches completed since we last checked, and we are checking every five minutes.

public List<String> getCompletedBatchOutputFileIds(long startTime, long endTime) throws IOException, InterruptedException {
    String query = "?api-version=2024-10-01-preview" +
            "&$filter=created_at%20gt%20" + startTime +
            "%20and%20created_at%20lt%20" + endTime +
            "%20and%20status%20eq%20'Completed'" +
            "&$orderby=created_at%20asc";

    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(fetchListOfBatchesApiUrl + query))
            .header("api-key", apiKey)
            .GET()
            .build();

    HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());

    ObjectMapper objectMapper = new ObjectMapper();
    JsonNode root = objectMapper.readTree(response.body());

    List<String> outputFileIds = new ArrayList<>();
    JsonNode dataArray = root.path("data");

    for (JsonNode batch : dataArray) {
        String status = batch.path("status").asText();
        if ("completed".equalsIgnoreCase(status)) {
            String outputFileId = batch.path("output_file_id").asText();
            outputFileIds.add(outputFileId);
        }
    }

    return outputFileIds;
}


8. Fetching the processed data

Once we have one or more file_ids of completed batches we can fetch the data. When the file is downloaded we can parse the JSON data to extract the AI-generated response for use in our application. Included in the JSON object for each generation is the "custom_id" property, which we set to the primary key of the data being processed. With this we can very easily associate the processed data with the original data and persist it or work with it further in our application. I have added an example of doing something with the data in this method; however, the method also returns the JSON data if you prefer to work with it in a separate method.
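For orientation, each line of the downloaded file contains (among other fields) the values the method below reads. A trimmed, illustrative response line, shown pretty-printed and with fields we do not use omitted, looks roughly like this:

{
  "custom_id": "1",
  "response": {
    "body": {
      "choices": [
        {
          "message": {
            "role": "assistant",
            "content": "Fortran is a programming language dating back to the 1950s…"
          }
        }
      ]
    }
  }
}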

public List<JsonNode> fetchProcessedFile(String apiUrlTemplate, String apiKey, String outputFileId) {

    String apiUrl = apiUrlTemplate.replace("{output_file_id}", outputFileId);

    RestTemplate restTemplate = new RestTemplate();
    ObjectMapper objectMapper = new ObjectMapper();

    try {
        HttpHeaders headers = new HttpHeaders();
        headers.set("api-key", apiKey);

        HttpEntity<Void> requestEntity = new HttpEntity<>(headers);
        ResponseEntity<byte[]> responseEntity = restTemplate.exchange(apiUrl, HttpMethod.GET, requestEntity, byte[].class);

        if (responseEntity.getStatusCode().is2xxSuccessful()) {
            byte[] fileContent = responseEntity.getBody();

            Path tempFilePath = Files.createTempFile("batch_output", ".jsonl");
            Files.write(tempFilePath, fileContent);

            log.info("File successfully fetched and saved to temp folder: " + tempFilePath.toAbsolutePath());

            List<JsonNode> extractedData = new ArrayList<>();
            try (BufferedReader reader = Files.newBufferedReader(tempFilePath)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    JsonNode jsonNode = objectMapper.readTree(line);
                    extractedData.add(jsonNode);
                }
            }

            extractedData.forEach(jsonNode -> {
                Long id = jsonNode.get("custom_id").asLong();
                JsonNode responseBody = jsonNode.path("response").path("body");
                String processedData = responseBody
                        .path("choices")
                        .get(0)
                        .path("message")
                        .path("content")
                        .asText();
		
                // here is where you can implement your own logic
                // for the processed data
                YourObject yourObject = yourObjectService.findById(id);
                yourObject.doSomething(processedData);
                yourObjectService.updateYourObject(yourObject);
            });

            log.info("Extracted data size: " + extractedData.size());
            return extractedData;
        } else {
            throw new RuntimeException("Failed to fetch file. HTTP Status: " + responseEntity.getStatusCode());
        }
    } catch (Exception e) {
        log.error("Error fetching processed file or extracting data: {}", e.getMessage());
        throw new RuntimeException("Error fetching processed file or extracting data", e);
    }
}


9. A batch job to fetch the data

If we set up a batch job scheduled to run every 5 minutes (as sketched at the end of section 6), then the following execute() method can be used to fetch the processed data.

@Transactional
public void execute() throws IOException, InterruptedException {

    Instant start = Instant.now().minus(5, ChronoUnit.MINUTES);
    Instant end = Instant.now();
    long startTime = start.getEpochSecond();
    long endTime = end.getEpochSecond();
    List<String> outputFileIds = getCompletedBatchOutputFileIds(startTime, endTime);
    if (!outputFileIds.isEmpty()) {
        outputFileIds.forEach(id -> {
            fetchProcessedFile(fetchProcessedFileApiUrl, apiKey, id);
        });
    }
}


We will need to add the following annotated variables for the URL and API key.

@Value("${your-app.ai.batch.fetch-processed-file}")
private String fetchProcessedFileApiUrl;

@Value("${spring.ai.azure.openai.api-key}")
private String apiKey;


10. What if something goes wrong?

Our implementation for fetching the file_ids of completed batches does not take into consideration that a batch may have failed; it simply looks for batches which have been successfully processed so that we can fetch the data. Azure AI Services provide a number of additional endpoints, among others for checking the status of a particular batch and for cancelling batch processing. Passing the batch_id as a parameter to the following method, we are able to call an API that returns the status of a specific batch.

public static String fetchBatchDetails(String resourceName, String batchId, String apiKey) throws Exception {
    String url = String.format("https://%s.openai.azure.com/openai/batches/%s?api-version=2024-10-21",
            resourceName, batchId);

    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("api-key", apiKey)
            .GET()
            .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

    if (response.statusCode() == 200) {
        return response.body();
    } else {
        throw new RuntimeException("Failed to fetch batch details: " + response.statusCode() + " " + response.body());
    }
}


The returned JSON is in the following format:

{
  "cancelled_at": null,
  "cancelling_at": null,
  "completed_at": null,
  "completion_window": "24h",
  "created_at": "2024-07-19T17:33:29.1619286+00:00",
  "error_file_id": null,
  "expired_at": null,
  "expires_at": "2024-07-20T17:33:29.1578141+00:00",
  "failed_at": null,
  "finalizing_at": null,
  "id": "batch_e0a7ee28-82c4-46a2-a3a0-c13b3c4e390b",
  "in_progress_at": null,
  "input_file_id": "file-c55ec4e859d54738a313d767718a2ac5",
  "errors": null,
  "metadata": null,
  "object": "batch",
  "output_file_id": null,
  "request_counts": {
    "total": null,
    "completed": null,
    "failed": null
  },
  "status": "Validating"
}


In this example the status is "Validating".

11. Cancelling a batch

In the event you decide to cancel a batch you can use the following method. Its structure is very similar to the previous method in terms of the parameters it takes.

public static String cancelBatch(String resourceName, String batchId, String apiKey) throws Exception {
    String url = String.format("https://%s.openai.azure.com/openai/batches/%s/cancel?api-version=2024-10-21",
            resourceName, batchId);

    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("api-key", apiKey)
            .POST(HttpRequest.BodyPublishers.noBody()) // Empty body for the POST request
            .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

    if (response.statusCode() == 200) {
        return response.body();
    } else {
        throw new RuntimeException("Failed to cancel batch: " + response.statusCode() + " " + response.body());
    }
}


The status of the cancelled batch will be cancelling for approximately 10 minutes, after which the status becomes cancelled. The response to this request will contain the batch file id, and the file is still downloadable, as it may contain partial results.

12. A problem solved

This workflow solved a specific problem for me while developing a legal case summarisation feature for a project here at CIIT Software. Cost was a major concern when considering an AI solution, but summarising a case could potentially balance that by speeding up the workflow of the end users. By finding a way to use the batch processing offered by Azure AI Services without relying on Spring, I have been able to approximately halve the cost of AI generation. Batch processing is great if you don’t need an immediate response as you would with AI chat.
