Vertica Storagebundlereader: Error Reading Metadata of File

Share this article:
✉ Send Feedback for this Article.

ROS Bundling

Prior to Vertica 7.2, there were two files per column – a data file and an index file. Every bit of Vertica 7.2, this format no longer exists. Instead, both parts are now stored in one file. Additionally, multiple data files and index files can also be stored in one file.

Vertica file compages

The post-obit graphic illustrates the Vertica file architecture, and how tables and projections are boiled down to files:

New storage format

Vertica is designed to handle big information sets. File sizes on disks range from hundreds of megabytes to hundreds of gigabytes. The Tuple Mover helps to manage information storage. In-retentiveness buffers blot trickle loads, and the Tuple Mover moveout operation writes data into ROS containers as the memory buffer fills up. To manage the number of files on disk, the Tuple Mover mergeout functioning merges ROS containers of a similar size to create larger containers. For more information, encounter Tuple Mover All-time Practices.

While this method works well with most tables, it tin can nowadays issues on tables that have a large number of small files. This tin can happen if any of the following is truthful of your data fix:

Wide cavalcade tables contain NULL value columns
Partition ranges are pocket-size, such as division by minute. This is non a recommended practice.
Local segmentation is enabled and the gene is set to a higher number. This is not a recommended practice.

These factors create tiny files on-disk, causing a lot of fragmentation that can impact node recovery and backup operations. To resolve this issue, Vertica engineers developed the new on-disk storage format and layout, which reduces the number of files stored on disk.

How is the new format different?

The data file and its index file are now stored in one file. For example, before Vertica 7.2, a storage container with a ii-column projection stored six files. In Vertica seven.2 and later, the container stores iii files. If those files are smaller than 1MB, Vertica bundles them together, and stores them every bit 1 file.

Does the new functionality bundle all project files?

No, bundling occurs but within a storage container and inside a storage location. Column files that are smaller than MaxBundleableROSSizeKB are bundled together; other column files are kept as independent column files in a storage container.

Why is ROS bundling useful?

Bundling reduces the number of files in your storage containers, thus reducing the pressure on the underlying file organization, and facilitates faster fill-in and restore operations, equally well as recovery. Bundling does not reduce the size of the catalog, and it does non eliminate the ROS pushback error. Bundling does non change the number of file handles used when reading the column files.

What configuration parameters are associated with the bundling functionality?

Parameter	Data Type/Default	Description
EnableStorageBundling	Boolean/True	Enables or disables storage bundling.
MaxBundleableROSSizeKB	Integer/1024	Can exist set up to 1024 KB. If more than than 1 data file is under this size limit, in the same container and storage location, the files are arranged together.
CompactStorageJobSizeMB	Integer/2048	Controls a task size of the function COMPACT_STORAGE().

What is the upgrade touch from Vertica 7.1 to Vertica 7.2?

Upgrading does not automatically convert existing files from the quondam format to the new format. Subsequently you lot upgrade, you lot tin can use the function COMPACT_STORAGE() to catechumen the file format of existing files. If EnableStorageBundling is fix to true, the following behavior applies:

All new loads and inserts are written in the new storage format.
Existing storage that is merged out is written in the new storage format.

Are the old and new formats compatible?

Aye. If you upgrade your database and a table contains both small-scale and large files, you can selectively move small files to the new format, and leave large files in the erstwhile format.

How tin I motility data from the old storage format to the new format?

Use COMPACT_STORAGE() , as shown in the following example:

=> SELECT COMPACT_STORAGE(object-proper noun,

object-name,

min-ros-filesize-kb,

small-or-all-files,

simulate);

The sample output of this part looks like the following:

compact_storage

---------------------------------------------------------------------------------------------------------------

Task: compact_storage

On node node01:

Projection Proper name :public.foo_super | selected_storage_containers :ii | selected_files_to_compact :12 | files_after_compact : 2 | modified_storage_KB :0

On node node02:

Project Name :public.foo_super | selected_storage_containers :2 | selected_files_to_compact :12 | files_after_compact : 2 | modified_storage_KB :0

On node node03:

Project Name :public.foo_super | selected_storage_containers :2 | selected_files_to_compact :12 | files_after_compact : 2 | modified_storage_KB :0

Success

What if I specify a unlike value for min_ros_filesize_kb than the current MaxBundleableROSSizeKB?

The COMPACT_STORAGE() role parameter min_ros_filesize_kb is contained of the MaxBundleableROSSizeKB configuration parameter. The function uses the value y'all specify for min_ros_filesize_kb. This has no impact on the value of MaxBundleableROSSizeKB. If you lot desire to package files of a larger size in new mergeouts and loads, yous must change the value of MaxBundleableROSSizeKB.

When is the best time to convert to the new storage format?

For best results, catechumen existing files after you upgrade your Vertica database to seven.two, merely before you perform the get-go backup. This helps with backup and restore for 2 reasons:

Reduces the number of files, resulting in a faster fill-in.
Transfers new storage, then subsequent backups can run faster.

You tin use the database while running compact_storage.

Does COMPACT_STORAGE() affect query operation?

COMPACT_STORAGE() reads and writes files at the storage container level. Information technology uses less retentivity when compared with the mergeout performance, but competes with disk I/O.

How long does COMPACT_STORAGE() take to run?

Because this role rewrites files, execution time varies depending on how much data changes. Run a simulation by setting the COMPACT_STORAGE() function parameter simulate to True, to determine how much storage will modify.

You can rewrite storage formats incrementally past running COMPACT_STORAGE() iteratively at the table or projection level.

How can I know if my database will do good from this functionality?

If backup and restore operations are irksome, yous probably have many large files that are not getting merged. You lot can run the following query to find the median file size of a projection on a node:

=> SELECT MEDIAN(size) OVER() AS median_fsize

FROM vs_ros AS ros, storage_containers Every bit cont

WHERE ros.delid=cont.storage_oid

AND cont.node_name='node'

AND cont.projection_name='proj_name' limit one;

Yous can as well run COMPACT_STORAGE() in simulation style to see how many files information technology can reduce:

=> SELECT COMPACT_STORAGE('table_or_proj_name', 1024, 'pocket-size', 'true');

What system table can I use to see what storage containers are arranged?

Query the system tabular array VS_BUNDLED_ROS to see which column files are bundled:

node_name|   projection_id   |                  sal_storage_id                  |      ros_id       | size_bytes |    storage_id

---------+-------------------+--------------------------------------------------+-------------------+------------+-------------------

initiator| 45035996273721386 | 0262c017f1fb9eb26b8d8e6266a7005e00a0000000004041 | 45035996273721409 |       v    | 45035996273721409

initiator| 45035996273721386 | 0262c017f1fb9eb26b8d8e6266a7005e00a0000000004041 | 45035996273721413 |       5    | 45035996273721409

initiator| 45035996273721386 | 0262c017f1fb9eb26b8d8e6266a7005e00a0000000004041 | 45035996273721417 |       48   | 45035996273721409

(3 rows)

If I have large storage containers, can I utilize 100MB as the value for min_ros_filesize_kb?

To determine the value of min_ros_filesize_kb, run COMPACT_STORAGE() in simulation style to determine resource usage. For example, if you accept a storage container with 100 information files, each less than 1MB, COMPACT_STORAGE() changes these from 200 files to i file. In this example, specifying 100MB or 1MB for the value of min_ros_filesize_kb makes no difference.

If yous have another container with 100 data files, each about 40MB, COMPACT_STORAGE() changes these from 200 files to 1 file. In this case, specifying 100MB makes a positive difference. If we specified 1MB, the container goes from 200 files to 100 files.

To determine what value to set for min_ros_filesize_kb, follow this procedure:

Choice a project.
Determine the number of files and the median file size of the projection.
Run COMPACT_STORAGE() in simulation mode, with values that range from the median size to 100MB.
Determine the sweet spot for your database, based on the values produced from the simulation.

The following case shows how to clarify the demand and impact of compact_storage on an updated Vertica database, and determine an optimal value for min_ros_filesize_kb:

Check that the file counts are not bundled:

=> SELECT COUNT(distinct (salstorageid))

* ii /* one fdb and one pidx */

FROM v_internal.vs_ros

WHERE bundleindex < 0;  -- older versions dont have bundleindex

count

---------

(1 row)

Find the size and number of files that are not arranged, group by projection and node:

=> SELECT CASE WHEN segment_lower_bound is not null  THEN 'SEGMENTED' ELSE 'REPLICATED' Stop AS type,

schema_name,

projection_name,

max(used_bytes) max_used_bytes,

min(used_bytes) min_used_bytes,

Instance  WHEN segment_lower_bound is not null THEN count (distinct colid) ELSE count (singled-out colid), count(distinct rosid) Equally nFiles

FROM storage_containers bring together v_internal.vs_ros ON (delid = storage_oid)

WHERE bundleindex < 0 -- older versions dont accept bundleindex

Grouping Past 1,2,iii ,segment_lower_bound

ORDER Past 7 desc

type      | schema_name | projection_name | max_used_bytes | min_used_bytes | nCols | nFiles

----------+-------------+-----------------+----------------+----------------+-------+--------

SEGMENTED | schema      | clients_b1      |       92368520 |          14778 |    13 |    650

SEGMENTED | schema      | clients_b0      |       47906279 |          14728 |    13 |    611

SEGMENTED | schema      | clients_b0      |      100868741 |          18776 |    13 |    572

SEGMENTED | schema      | clients_b0      |       59835832 |          15173 |    xiii |    572

SEGMENTED | schema      | clients_b1      |       58541167 |          15441 |    13 |    507

SEGMENTED | schema      | clients_b1      |       61792636 |          14728 |    13 |    507

Classify file sizes in different buckets to see the distribution:

=> SELECT

WIDTH_BUCKET (used_bytes, 0, 1024*1024*1024, 999) AS bucket, -- 1GB size of max bucket, 1000 buckets, (i.e. 1MB each bucket)

count(rosid) fileCnt

FROM

storage_containers Bring together v_internal.vs_ros ON (delid = storage_oid)

WHERE

schema_name = 'schema'

AND projection_name='clients_b1'

AND bundleindex < 0 -- older versions dont have bundleindex

GROUP Past one

ORDER Past 1

bucket | fileCnt

-------+---------

1      |     715           <=== so many minor 1 MB or less files

4      |      13

v      |      26

10     |      26

14     |      39

15     |      52

16     |      65

17     |      26

xviii     |      52

19     |      39

xx     |     299

21     |      39

23     |      13

24     |      13

25     |      13

29     |      13

31     |      13

32     |      39

40     |      26

41     |      13

42     |      thirteen

44     |      13

45     |      39

48     |      13

49     |      13

55     |      13

58     |      xiii

86     |      13

(28 rows)

In this instance, we chose 5MB as the size to compact the storage. First, execute the dry run to run into the impact:

=> SELECT COMPACT_STORAGE('schema.clients_b1', 5*1024, 'SMALL', true);  -

            compact_storage

----------------------------------------------------------------------------------------------------------------

Task: compact_storage

On node v_scrutinload_node0001:

Projection Name :schema.clients_b1 | selected_storage_containers :l | selected_files_to_compact :1012 | files_after_compact : fifty | modified_storage_KB :35584

On node v_scrutinload_node0002:

Projection Proper name :schema.clients_b1 | selected_storage_containers :39 | selected_files_to_compact :730 | files_after_compact : 39 | modified_storage_KB :16566

On node v_scrutinload_node0003:

Project Name :schema.clients_b1 | selected_storage_containers :39 | selected_files_to_compact :728 | files_after_compact : 39 | modified_storage_KB :23126

Success

Execute COMPACT_STORAGE() . This uses IO and resources. Performing this by projection allows more control on when information technology is executed:

=> SELECT COMPACT_STORAGE('schema.clients_b1', 5*1024, 'Modest', faux);

            compact_storage

----------------------------------------------------------------------------------------------------------------

Task: compact_storage

On node v_scrutinload_node0001:

Projection Name :schema.clients_b1 | selected_storage_containers :50 | selected_files_to_compact :1212 | files_after_compact : 50 | modified_storage_KB :230564

On node v_scrutinload_node0002:

Project Name :schema.clients_b1 | selected_storage_containers :39 | selected_files_to_compact :928 | files_after_compact : 39 | modified_storage_KB :221211

On node v_scrutinload_node0003:

Projection Name :schema.clients_b1 | selected_storage_containers :39 | selected_files_to_compact :926 | files_after_compact : 39 | modified_storage_KB :227449

Success

View the new distribution of files:

=> SELECT

WIDTH_BUCKET (used_bytes, 0, 1024*1024*1024, 999) Equally saucepan, -- 1GB size OF max bucket, 1000 buckets, (i.e. 1MB each bucket)

count(rosid) fileCnt

FROM

storage_containers JOIN v_internal.vs_ros ON (delid = storage_oid)

WHERE

schema_name = 'schema'

AND projection_name='clients_b1'

AND bundleindex < 0 -- older versions dont accept bundleindex

GROUP BY 1

ORDER BY one

bucket | fileCnt

-------+---------

fourteen     |       ane    <== small files are gone, largest i is 14MB ....

15     |       3

16     |       seven

17     |       4

18     |       half-dozen

nineteen     |       6

20     |      46

21     |       4

23     |       ii

24     |       ii

25     |       two

29     |       2

31     |       2

32     |       6

40     |       4

41     |       two

42     |       2

44     |       2

45     |       9

48     |       3

49     |       three

55     |       iii

58     |       3

86     |       seven

(24 rows)

Compare the buddy projection (b1) to come across the impact:

=> SELECT

Example WHEN segment_lower_bound is non null  THEN 'SEGMENTED' ELSE 'REPLICATED' END Every bit type,

schema_name,

projection_name,

max(used_bytes) max_used_bytes,

min(used_bytes) min_used_bytes,

CASE  WHEN segment_lower_bound is not aught THEN count (distinct colid) ELSE count (distinct colid) END as nCols,

count(distinct rosid) as nFiles

FROM storage_containers Join v_internal.vs_ros ON (delid = storage_oid)

WHERE bundleindex < 0 -- older versions dont take bundleindex

Grouping BY i,2,3 ,segment_lower_bound

ORDER BY seven desc;

type      | schema_ | projection_| max_used_bytes | min_used_bytes | nCols | nFiles

----------+---------+------------+----------------+----------------+-------+--------

SEGMENTED | schema  | clients_b0 |    47906279    |    14728       |    xiii |    611

SEGMENTED | schema  | clients_b0 |   100868741    |    18776       |    13 |    572

SEGMENTED | schema  | clients_b0 |    59835832    |    15173       |    thirteen |    572

SEGMENTED | schema  | clients_b1 |    61792636    | 16300430       |     iii |     44     <=== better !

SEGMENTED | schema  | clients_b1 |    92368520    | 14086909       |     7 |     44

SEGMENTED | schema  | clients_b1 |    58541167    | 15202738       |     3 |     43

(6 rows)

bakerhoughter.blogspot.com

Source: https://www.vertica.com/kb/ROSBundling/Content/BestPractices/ROSBundling.htm