Setup
I have a multithreaded Java application which will receive 200-300 requests per second to perform a task 'A'(which take approximately 30 milliseconds) on an input received in a request.
The application has a cache(max size = 1MB) which is read by each thread to perform task 'A' on input received:
public class DataProvider() {
private HashMap<KeyObject, ValueObject> cache;
private Database database;
// Scheduled to run in interval of 15 seconds by a background thread
public synchronized void updateData() {
this.cache = database.getData();
}
public HashMap<KeyObject, ValueObject> getCache() {
return this.cache;
}
}
KeyObject and ValueObject are POJO. ValueObject contains List of another POJO.
For every request received task is done in following way:
public class TaskExecutor() {
private DataProvider dataProvider;
public boolean doTask(final InputObject input) {
final HashMap<KeyObject, ValueObject> data = dataProvider.getCache(); // shallow copy I think
// Do Task 'A' using data
}
}
Problem
One of the thread starts executing task 'A' at timestamp 't' using data 'd1' from cache. At time 't + t1' cache data gets updated to 'd2'. Thread now starts using data 'd2' to finish rest of the task. Task gets completed at 't+t1+t2'. Half of the task was completed with different data. This will lead to invalid outcome of task.
Current Approach
Each thread will create a deep copy of the cache and then use the deep copy to perform the task using one of the following approach(best in performance) to perform deep copy:
How do you make a deep copy of an object in Java?
Deep clone utility recommendation
Limitation
Cloning using deep copy will create thousand of objects which may crash JVM.
All the cloning approaches don't look good in terms of performance.