Data Management 2017-09-24T17:03:34+00:00

We provide a wide range of data management services, specifically:

  • Data Architecture & Design
  • Database Management
  • Data Quality Management
  • Master Data Management
  • BI & ETL
  • Metadata Management

Batch Data Performance

Several techniques can be applied, individually or in combination, to improve the performance of batch data processes. The most common are:

Multi-threaded Processing

Some batch programs process only a single record at a time, leaving most of the capacity of modern multi-core CPUs unused. Multi-threading processes records concurrently so that all available processing power is utilised. This can be the cheapest way to improve older processes, as it is a pure software solution.
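As a minimal sketch of the idea, the Python example below hands records to a pool of worker threads instead of looping over them one at a time. The names process_record and run_threaded_batch, and the record fields, are hypothetical and chosen only for illustration. In CPython, threads mainly help when the per-record work waits on I/O (database or network calls); for CPU-bound work a process pool is usually the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def process_record(record: dict) -> dict:
    # Hypothetical per-record work; a real batch job would typically
    # call a database or a remote service here.
    return {**record, "total": record["quantity"] * record["unit_price"]}


def run_threaded_batch(records: list, max_workers: int = 8) -> list:
    # pool.map spreads the records across the worker threads and
    # returns the results in the same order as the input.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_record, records))
```

For CPU-bound work, swapping ThreadPoolExecutor for ProcessPoolExecutor from the same module keeps the structure unchanged.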

Batch Distribution

This involves splitting a single linear process (for example, processing each customer in a database) into multiple processes. Each record is allocated a batch ID, which the batch query uses as a filter so that each process handles only its own share of the data. Each batch is processed on a different machine, distributing the processing load across them.
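The batch ID approach can be sketched as follows, using an in-memory SQLite database so the example is self-contained. The customers table, the batch_id column and the modulo allocation rule are assumptions made for illustration; a real system would allocate IDs in whatever way balances its own data.

```python
import sqlite3

NUM_BATCHES = 4  # assumed number of worker machines


def make_demo_db(num_customers: int = 10) -> sqlite3.Connection:
    # Hypothetical customers table used only for this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, batch_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, num_customers + 1)],
    )
    conn.commit()
    return conn


def assign_batch_ids(conn: sqlite3.Connection, num_batches: int = NUM_BATCHES) -> None:
    # Allocate every record to a batch; modulo on the key gives an even spread.
    conn.execute("UPDATE customers SET batch_id = id % ?", (num_batches,))
    conn.commit()


def fetch_batch(conn: sqlite3.Connection, batch_id: int) -> list:
    # Each machine runs the same query with its own batch_id,
    # so it only reads its share of the rows.
    return conn.execute(
        "SELECT id, name FROM customers WHERE batch_id = ? ORDER BY id",
        (batch_id,),
    ).fetchall()
```

Each machine would call fetch_batch with its own batch ID, so no two machines process the same customer.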

Message Queue Distribution

Where a batch makes decisions about how to process records, or loads records into other systems, distributing the work through a message queue can improve performance. Time-consuming logic is moved out of the main batch loop: instead of doing the work itself, the loop adds a message to a queue. Separate server instances then monitor the queue to distribute the processing, or the queue can be used to split the work across CPU threads.
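The single-machine variant of this pattern can be sketched with Python's standard queue module, with worker threads standing in for the separate server instances. The function slow_load is a hypothetical placeholder for the time-consuming step; a distributed setup would replace the in-process queue with a message broker.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to shut down


def slow_load(record: int) -> int:
    # Hypothetical stand-in for the time-consuming step,
    # such as loading the record into another system.
    return record * 2


def worker(tasks: queue.Queue, results: list) -> None:
    while True:
        record = tasks.get()
        if record is _STOP:
            break
        results.append(slow_load(record))  # list.append is thread-safe in CPython


def run_queued_batch(records: list, num_workers: int = 4) -> list:
    tasks = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    # The main batch loop only enqueues; the slow work happens in the workers.
    for record in records:
        tasks.put(record)
    for _ in threads:
        tasks.put(_STOP)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

Results arrive in completion order rather than input order, so any ordering requirement has to be handled separately.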

Other techniques include database read instancing, re-batching (such as grouping SQL upsert commands so that many are sent together) and refactoring the original code to improve performance.
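Re-batching can be illustrated with grouped upserts. The sketch below sends rows to the database in groups, one transaction per group, rather than issuing and committing one statement per row. It assumes a hypothetical customers table and an SQLite version that supports the ON CONFLICT upsert clause (3.24 or later); the group size of 500 is an arbitrary illustrative default.

```python
import sqlite3

# Upsert: insert the row, or update the name if the id already exists.
UPSERT_SQL = """
INSERT INTO customers (id, name) VALUES (?, ?)
ON CONFLICT(id) DO UPDATE SET name = excluded.name
"""


def upsert_in_groups(conn: sqlite3.Connection, rows: list, group_size: int = 500) -> int:
    # Returns the number of groups (transactions) that were sent.
    groups = 0
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        with conn:  # commits once per group instead of once per row
            conn.executemany(UPSERT_SQL, group)
        groups += 1
    return groups
```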

Each project is individually analysed and a custom optimisation solution is designed.

Streaming Data

Improving the performance of real-time streaming data requires custom profiling to determine where performance gains can be achieved.

Techniques such as compression, distribution across servers using message queues, and custom client/server networking code can improve performance. They can also reduce costs where data tariffs are incurred for transmission, since less data is sent.
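As a small illustration of the compression technique, the sketch below compresses a JSON message with zlib before it is transmitted and restores it on the receiving side. The message layout and the function names are hypothetical; the saving achieved in practice depends on how repetitive the data is.

```python
import json
import zlib


def encode_message(payload: dict) -> bytes:
    # Serialise compactly, then compress (6 is zlib's usual speed/size trade-off).
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 6)


def decode_message(data: bytes) -> dict:
    # Reverse of encode_message: decompress, then parse the JSON.
    return json.loads(zlib.decompress(data).decode("utf-8"))
```

Comparing len(encode_message(payload)) with the size of the uncompressed JSON gives a quick estimate of the transmission saving for a given feed.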

Contact us to discuss your streaming performance requirements.