Omni Loader documentation

Performance

Dry run

Choose this option when you want to see the exact SQL scripts we will use. We do not write to the target, but we generate and log the SQL scripts for the whole migration.

Exact record count

Our migration engine does not need the exact number of records or exact table sizes to choose good defaults. Retrieving an exact record count can take a very long time for very large tables, so by default we query database statistics only. If your database is small and you don't mind waiting, turn this option on and we will query the exact record count of each table, so that progress bars always run exactly from 0 to 100%.
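To illustrate why an estimated count affects the progress bar, here is a minimal sketch. The function name and the capping at 100% are assumptions made for illustration, not a description of how Omni Loader computes progress.

```python
def progress_percent(rows_done, row_total):
    """Return progress as a percentage, capped at 100 (hypothetical helper)."""
    if row_total <= 0:
        return 0.0
    return min(100.0, 100.0 * rows_done / row_total)


# A table that really holds 1,200 rows:
exact = progress_percent(1200, 1200)          # exact count: ends at 100%
overestimated = progress_percent(1200, 1500)  # statistics too high: ends short of 100%
underestimated = progress_percent(1200, 1000) # statistics too low: reaches 100% early
```

With an exact count the bar ends exactly at 100%. With statistics that are off, it either stops short or reaches 100% before the load finishes.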

Adjust pressure on agents

The Omni Loader migration engine is a massively parallel distributed cluster. You can have hundreds of workers, each doing its assigned job without blocking any of the others. By default, an agent runs as many jobs in parallel as it has CPU cores. You can adjust the pressure on each agent by specifying more or fewer workers.
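The default of one job per CPU core, with an optional override, can be sketched as follows. This is an illustration only: the function names are hypothetical, and a thread pool stands in for whatever scheduling the agents actually use.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_worker_count(override=None):
    """Return the number of parallel jobs for one agent (hypothetical helper)."""
    if override is not None:
        if override < 1:
            raise ValueError("worker count must be at least 1")
        return override
    # Default: one job per CPU core; fall back to 1 if the count is unknown.
    return os.cpu_count() or 1


def run_jobs(jobs, workers=None):
    """Run zero-argument callables in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=default_worker_count(workers)) as pool:
        return list(pool.map(lambda job: job(), jobs))
```

Passing a smaller `workers` value lowers the pressure on the agent. A larger value raises it.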

Ingestion workers

The number of jobs ingesting data from the data lake storage into the data warehouse target. This is separate from the data extraction workers: you can usually run dozens of extraction jobs without overloading the source database, but fewer ingestion workers, because ingestion puts more pressure on the target.

Slice tables and partitions

If you have very large tables, it is important to read their many parts in parallel. A single reader has a hard performance ceiling, so you can multiply throughput by using more than one connection. Omni Loader dynamically generates the SQL condition for each slice and loads as many slices in parallel as there are available workers.
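As a rough illustration of per-slice SQL conditions, the sketch below splits an integer key range into contiguous ranges. It assumes the table has an indexed integer key with known minimum and maximum values. The function name and the form of the conditions are hypothetical, not the SQL Omni Loader actually generates.

```python
def slice_conditions(column, min_key, max_key, slices):
    """Split [min_key, max_key] into contiguous ranges, one WHERE condition each."""
    if slices < 1:
        raise ValueError("slices must be at least 1")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")
    total = max_key - min_key + 1
    slices = min(slices, total)           # never create empty slices
    base, extra = divmod(total, slices)   # the first `extra` slices get one more key
    conditions = []
    start = min_key
    for i in range(slices):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        conditions.append(f"{column} >= {start} AND {column} <= {end}")
        start = end + 1
    return conditions
```

Each condition can then be run on its own connection, so the slices are read independently of one another.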

Minimal slice, in MB

Especially for a data warehouse target, you don't want to add unnecessary pressure by loading many tiny files from the data lake. This setting lowers the number of slices if necessary, so that slices do not fall below the size given here.
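One way to picture the effect of this setting is as a cap on the slice count. The sketch assumes the table size is known or estimated in MB. The function and parameter names are hypothetical.

```python
def effective_slice_count(table_size_mb, requested_slices, min_slice_mb):
    """Lower the slice count so that no slice is smaller than min_slice_mb."""
    if requested_slices < 1:
        raise ValueError("requested_slices must be at least 1")
    if min_slice_mb <= 0:
        raise ValueError("min_slice_mb must be positive")
    # The most slices the table can be cut into without going below the minimum.
    max_by_size = max(1, int(table_size_mb // min_slice_mb))
    return min(requested_slices, max_by_size)
```

For example, a 1,000 MB table with 16 requested slices and a 100 MB minimum would be read in 10 slices, while a 50 MB table would be read as a single slice.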

Slice file by rows

When reading large non-indexed tables, there is no efficient way to subdivide the work and load many parts of the table in parallel. To avoid multiple huge table scans, we use a single read. To avoid very slow ingestion, however, Omni Loader writes several files and ingests them into the target in parallel. You can specify the number of rows for each staging file here.
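The idea of a single read feeding several staging files can be sketched with a generator that cuts one row stream into fixed-size chunks. This is an illustration under assumed names. How Omni Loader actually writes its staging files is not described here.

```python
from itertools import islice


def chunk_rows(rows, rows_per_file):
    """Yield lists of at most rows_per_file rows from a single pass over rows."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, rows_per_file))
        if not chunk:
            return
        yield chunk
```

Each chunk would become one staging file, and the files can then be ingested in parallel even though the source table was scanned only once.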

Distribution

For Azure Synapse Analytics, we copy the distribution if the source table has one. Alternatively, you can specify the table distribution model here. Please note that for tables smaller than 4 GB, we use REPLICATE distribution.
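The rules above can be read as a small decision function. The sketch below makes several assumptions that the text does not confirm: the size check comes first, a source distribution takes precedence over the configured one, 4 GB is treated as 4 GiB, and ROUND_ROBIN (the default in Synapse dedicated SQL pools) is the fallback. All names are hypothetical.

```python
REPLICATE_THRESHOLD_BYTES = 4 * 1024**3  # assumption: "4 GB" means 4 GiB


def choose_distribution(table_size_bytes, source_distribution=None, configured=None):
    """Pick a Synapse table distribution (illustrative precedence, not confirmed)."""
    if table_size_bytes < REPLICATE_THRESHOLD_BYTES:
        return "REPLICATE"           # small tables are replicated
    if source_distribution is not None:
        return source_distribution   # copy the source table's distribution
    if configured is not None:
        return configured            # otherwise use the model chosen in the settings
    return "ROUND_ROBIN"             # assumed fallback
```

Under these assumptions, `choose_distribution(10 * 1024**3, "HASH(id)")` keeps the source table's `HASH(id)` distribution, while a 1 GiB table is replicated regardless of its source distribution.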