Understanding Iceberg Table Metadata

Dated: 30-Jan-2023

Companies are choosing data lakes to store their data for analytics and insights. Any challenges they encounter with data lakes are often related to performance and ACID-compliant transactions when multiple consumers are trying to work on the same data set. However, using a table format such as Apache Iceberg in the data lake architecture can resolve a lot of these issues.

In this blog post we will look closely at how Apache Iceberg’s metadata enables fast performance for analytical queries over large data sets.

Apache Iceberg is an open table format for large analytic data sets in data lake architectures. It was developed at Netflix, which later donated it to the Apache Software Foundation. Lately, Apache Iceberg has become the de facto solution for managing data in data lakes. Some of the great features that Iceberg Tables offer are ACID-compliant transactions, time travel, partition evolution, and efficient file pruning.

Because a Hive table's state is reconstructed by listing files, changes are not atomic (potentially producing inaccurate results), and listing operations are slow for large-scale analytics. Apache Iceberg overcomes these problems by maintaining its own metadata layer. As you can see in the diagram below, Iceberg persists data in two layers: the metadata layer, and the data layer, which holds the actual data in formats such as Apache Parquet, Avro, or ORC. Iceberg's file pruning intelligence comes from the metadata layer, where it can efficiently track which files and folders to eliminate first, and subsequently scan the data file stats to figure out whether a given file needs to be read for a specific query.

Data files: Files in Parquet, ORC, or Avro format that store the actual data for the Iceberg Table.

Iceberg has a tree structure that stores a snapshot of the data for every DML/DDL operation on the Iceberg Table. As we can see from the metadata folder, multiple metadata.json files have been created. Any time you perform a DML or DDL operation, a new metadata file is created.

The first thing we need to determine is which is the latest metadata.json file. There are multiple ways to accomplish this. You could look at the increasing serial number (for example, v1.metadata.json, v2.metadata.json) to identify the latest one, or you could open the version-hint.text file, whose contents point to the latest metadata.json file. Below is the snippet from the version-hint.text:

Image by Author: Metadata json files
Image by Author: Content from Version-hint file

As you can see in the example above, the latest number is 2 and so the latest version of the metadata file will be v2.metadata.json.
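This lookup can be sketched in plain Python, assuming the Hadoop-catalog directory layout described above (the helper function itself is hypothetical, not part of Iceberg's API):

```python
import os

def latest_metadata_file(table_dir):
    """Resolve the newest vN.metadata.json from version-hint.text.

    Assumes the Hadoop-catalog layout, where metadata/version-hint.text
    holds just the latest version number as plain text.
    """
    hint_path = os.path.join(table_dir, "metadata", "version-hint.text")
    with open(hint_path) as f:
        version = int(f.read().strip())
    return f"v{version}.metadata.json"
```

With a hint file containing `2`, this resolves to `v2.metadata.json`, matching the example above.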

Now let’s insert a single record in the Iceberg Table and understand the metadata that is created.

Now we will go through all the files that are part of the metadata layer. Let's start with the v2.metadata.json file. As you can see in the image below, this file has details about the schema of the table, the current snapshot id, and the manifest-list file. This information helps us identify the latest snapshot id and the latest manifest list file for the snapshot of the table. The metadata file generated in customer-managed storage has only the details about the latest snapshot id.

Metadata.json file

Image by Author: Metadata json content
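A minimal sketch of reading those fields with plain Python (the key names current-snapshot-id, snapshots, snapshot-id, and manifest-list come from the metadata.json format; the helper itself is hypothetical):

```python
import json

def current_manifest_list(metadata_text):
    """Return the manifest-list path for the table's current snapshot,
    by matching current-snapshot-id against the snapshots array."""
    meta = json.loads(metadata_text)
    current_id = meta["current-snapshot-id"]
    for snap in meta["snapshots"]:
        if snap["snapshot-id"] == current_id:
            return snap["manifest-list"]
    raise ValueError(f"snapshot {current_id} not found")
```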

This file has the details about the current snapshot and the associated manifest list. Apart from the current snapshot, it also stores the snapshot history, which enables features such as Time Travel. Apart from the table schema, it also stores the partition details along with the column id used for partitioning the table, which helps with partition evolution. We will generate a new metadata.json file by inserting more records into the table. Here we are using a TPC-H sample database and inserting 5 months of data, partitioned by month, into our Iceberg Table.
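The snapshot history is what makes Time Travel possible: to read the table "as of" a timestamp, an engine picks the latest snapshot committed at or before that time. A simplified sketch, assuming snapshot entries shaped like those in metadata.json (each carries a timestamp-ms field):

```python
def snapshot_as_of(snapshots, ts_millis):
    """Time travel: pick the latest snapshot committed at or before
    ts_millis, using the snapshot history kept in metadata.json."""
    eligible = [s for s in snapshots if s["timestamp-ms"] <= ts_millis]
    if not eligible:
        return None  # table did not exist yet at that time
    return max(eligible, key=lambda s: s["timestamp-ms"])
```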

Below is the snapshot information from the latest metadata.json file:

You can see the list of new files that have been added (12 files) as well as the new partitions created as part of the transaction. In addition, the metadata.json file maintains the history of the snapshot.

Now that we understand the metadata JSON file, let's look more closely at the manifest list to better understand how this file is used by Iceberg for directory pruning, based on the predicate clause used in the select queries. You can simply take the current-snapshot-id from the latest metadata.json and look for the name and path of the manifest list file. The manifest list file name looks like snap-(current-snapshot-id)*.avro, as shown in the screenshot below:

As we have already discussed, the manifest list contains the list of manifest files that are part of a snapshot, along with additional details such as how many data files were added or deleted. Let's see that in action. All the manifest list and manifest files are in Avro format. Below are the details that you will see in the manifest list file. The manifest list includes metadata that can be used to avoid scanning all the manifest files in a snapshot when planning a query.

Image by Author: Manifest List Details
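Manifest-level pruning can be illustrated with a simplified sketch. The entries below use already-decoded bounds rather than the real Avro partition summaries, so this is an illustration of the technique, not the actual on-disk format:

```python
def prune_manifests(manifest_entries, value):
    """Manifest-level pruning: keep only manifests whose partition
    summary range could contain `value`; all others can be skipped
    without ever being opened."""
    return [m["path"] for m in manifest_entries
            if m["lower_bound"] <= value <= m["upper_bound"]]
```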

Some of the required fields in the manifest list file are manifest_path, manifest_length, partition_spec_id, and added_snapshot_id.

You don't see the actual column values because the lower and upper bounds of the values in the partitions are serialized to bytes using Iceberg's single-value serialization.
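For example, int and date bounds are serialized as 4-byte little-endian values (dates as a day count from 1970-01-01), so decoding them back looks like:

```python
import struct
from datetime import date, timedelta

def decode_int_bound(raw):
    """Decode an int bound: 4-byte little-endian integer."""
    return struct.unpack("<i", raw)[0]

def decode_date_bound(raw):
    """Decode a date bound: days since the Unix epoch (1970-01-01)."""
    return date(1970, 1, 1) + timedelta(days=decode_int_bound(raw))
```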

Image by Author: Metadata File Details

If you further expand the value counts, you will find the number of records in each of these columns.

Now let’s insert more records into the Iceberg Table and look at both the data and metadata that is generated.

When you execute the above command, you will find that a new metadata.json file is generated.

Image by Author: New Snapshot

Let’s look into the manifest list file and the manifest file to understand the column-level stats that are created after inserting more records in the table. You can see in the screenshot below that the manifest list tracks the count of records that were inserted into the table, along with the number of data files created.

By looking more closely at the manifest file, we see entries for 14 data files created as part of the transaction. The snippet below is from one of the entries; it contains information about how many records there are in the data file as well as the number of records in each column. These stats help Iceberg skip a data file (file pruning) if the queried value does not fall between the file's lower and upper bound values.

Image by Author: Metadata File column stats for each parquet file
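The skip decision itself is simple. A simplified sketch using already-decoded per-column bounds (the column name here is illustrative, borrowed from the TPC-H schema):

```python
def files_to_scan(data_files, column, value):
    """File-level pruning: a data file can be skipped when `value`
    falls outside its [lower, upper] bound for `column`."""
    keep = []
    for f in data_files:
        lo = f["lower_bounds"][column]
        hi = f["upper_bounds"][column]
        if lo <= value <= hi:  # value could be in this file
            keep.append(f["path"])
    return keep
```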

In the section below, we will look at how Spark makes use of table metadata to perform query optimizations, using record counts, lower/upper bound values, and the predicate clause of the select statement for partition pruning (skipping partitioned folders). We'll also look at how column statistics in the Parquet files are used to identify which data files can be skipped when fetching records for the submitted query.

Image by Author: Iceberg Table data partitioned by month

Below is a snippet of the Parquet files created for the month 1992–02:

Image by Author: Parquet files from single partition

Let's run some select queries to understand how Iceberg does partition pruning for select queries with a predicate clause, using the manifest list. In total there are 127 Parquet files in 12 folders for data from the periods 1992–01–01 to 1992–06–30 and 1993–01–01 to 1993–06–30. Let's first run a select query without any predicate clause and look at the query plan.

Below is the query plan for the Spark job with no filters in the select query:

As you can see, this query went through scanning all the files and no pruning occurred at either the directory or file level. Now we will add a filter clause on the date column, which is partitioned by month, and look at the query plan again:

Image by Author: Query plan with predicate clause

As you can see from the query plan above, the query has only read 4 Parquet files out of the 127 Parquet files in the first two partitions (1992–01 and 1992–02). In this scenario, directory-level pruning occurs based on the predicate clause, simply skipping the folders that are not required, and later file-level pruning occurs based on the file-level statistics.
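The directory-level step can be simulated with a simplified sketch. The Hive-style partition folder naming used here (o_orderdate_month=1992-01) is an assumption for illustration:

```python
def prune_partitions(folders, wanted_months):
    """Directory-level pruning: keep only partition folders whose
    month value matches the predicate; all other folders are never
    listed or read."""
    keep = []
    for folder in folders:
        _, _, month = folder.partition("=")  # split 'col=value'
        if month in wanted_months:
            keep.append(folder)
    return keep
```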

With Iceberg Tables, you can expect query compilation times to be low relative to query run times. In the next post we will discuss Iceberg hidden partitioning and partition evolution. As you read this series, we hope you will be truly amazed by some of the great features Apache Iceberg Tables have to offer.

In this blog post we have seen what makes up Iceberg Table metadata, and gained an understanding of how the table stats in the manifest files are used when queries are executed. Iceberg is a great choice when the requirement is to keep data in an open format, as it offers interoperability between compute engines working on the same data set with great performance.

If you’d like to learn more about Iceberg, I suggest a few resources:
