Property | Meaning | Configuration Guide |
format-version | Iceberg table format version. Valid values are 1 and 2; the default is 1. | If the write scenario includes upsert, this value must be set to 2. |
write.upsert.enabled | Whether to enable upsert. Set to true to enable it; if not set, upsert is disabled. | If the write scenario includes upsert, this must be set to true. |
write.update.mode | Update Mode | Set to merge-on-read (MOR) for MOR tables; the default is copy-on-write (COW). |
write.merge.mode | Merge Mode | Set to merge-on-read (MOR) for MOR tables; the default is copy-on-write (COW). |
write.parquet.bloom-filter-enabled.column.{col} | Whether to enable the Parquet Bloom filter for column {col}. Set to true to enable it; it is disabled by default. | In upsert scenarios, this must be enabled for the primary key columns of the upstream data. If the upstream data has multiple primary key columns, enable it for at most the first two. Enabling it can improve MOR query performance and small-file merging efficiency. |
write.distribution-mode | Write distribution mode | The recommended value is hash. With hash, data is automatically repartitioned on write, which may reduce write performance. |
write.metadata.delete-after-commit.enabled | Whether to clean up old metadata files automatically. | It is strongly recommended to set this to true. When enabled, old metadata files are cleaned up automatically each time a snapshot is created, which prevents metadata files from accumulating. |
write.metadata.previous-versions-max | Number of previous metadata files to retain. | The default value is 100. In special cases, adjust this value as needed. Use this setting together with write.metadata.delete-after-commit.enabled. |
write.metadata.metrics.default | Set the column metrics mode. | The value must be set to full. |
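
The properties above are usually set when the table is created. The statement below is a minimal sketch, assuming Spark SQL with the Iceberg extensions enabled. The table name DataLakeCatalog.db.sample follows the naming used elsewhere on this page; the columns id, name, and ts are hypothetical, with id assumed to be the upstream primary key. Whether the USING iceberg clause is required depends on the engine and catalog configuration.

```sql
-- Sketch of an upsert-ready MOR table; column names are illustrative assumptions.
CREATE TABLE DataLakeCatalog.db.sample (
    id   BIGINT,
    name STRING,
    ts   TIMESTAMP
)
USING iceberg
TBLPROPERTIES (
    'format-version' = '2',                                   -- required for upsert
    'write.upsert.enabled' = 'true',
    'write.update.mode' = 'merge-on-read',
    'write.merge.mode' = 'merge-on-read',
    'write.parquet.bloom-filter-enabled.column.id' = 'true',  -- id is the assumed primary key
    'write.distribution-mode' = 'hash',
    'write.metadata.delete-after-commit.enabled' = 'true',
    'write.metadata.previous-versions-max' = '100',
    'write.metadata.metrics.default' = 'full'
);
```

For a table that is not written with upsert, the first five properties can be left at their defaults.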

Property | Default value | Meaning | Configuration guide |
commit.retry.num-retries | 4 | Number of retries after a commit failure | If commit retries occur, try increasing this value. |
commit.retry.min-wait-ms | 100 | Minimum wait time before a retry, in milliseconds | If conflicts are very frequent and persist even after waiting, try increasing this value to lengthen the interval between retries. |
commit.retry.max-wait-ms | 60000 (1 min) | Maximum wait time before a retry, in milliseconds | Adjust this value together with commit.retry.min-wait-ms. |
commit.retry.total-timeout-ms | 1800000 (30 min) | Total timeout for the entire commit retry process, in milliseconds | - |
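
The retry properties can be changed on an existing table. The following is a hedged sketch, assuming Spark SQL and the DataLakeCatalog.db.sample table used elsewhere on this page; the values 10 and 500 are illustrative only, not recommendations.

```sql
-- Sketch: allow more retries and a longer minimum wait for a table with frequent commit conflicts.
ALTER TABLE DataLakeCatalog.db.sample SET TBLPROPERTIES (
    'commit.retry.num-retries' = '10',    -- illustrative; default is 4
    'commit.retry.min-wait-ms' = '500',   -- illustrative; default is 100
    'commit.retry.max-wait-ms' = '60000'  -- unchanged default, shown for completeness
);
```

Keep commit.retry.min-wait-ms below commit.retry.max-wait-ms so that the wait time still has room to grow between attempts.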

Partition transform | Description | Source field types | Result type |
identity | No transformation | All types | Same as the source type |
bucket[N, col] | Hash bucketing | int, long, decimal, date, time, timestamp, timestamptz, string, uuid, fixed, binary | int |
truncate[L, col] | Fixed-length truncation | int, long, decimal, string | Same as the source type |
year | Extract year information from fields | date, timestamp, timestamptz | int |
month | Extract month information from fields | date, timestamp, timestamptz | int |
day | Extract day information from fields | date, timestamp, timestamptz | int |
hour | Extract hour information from fields | timestamp, timestamptz | int |
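
The transforms above are applied in the PARTITIONED BY clause when a table is created. The statement below is a minimal sketch, assuming Spark SQL with Iceberg; the table name sample_partitioned and its columns are hypothetical. It uses the Spark DDL function names bucket(N, col), truncate(L, col), and days(col), which correspond to the bucket, truncate, and day transforms in the table.

```sql
-- Sketch: combine three partition transforms on hypothetical columns.
CREATE TABLE DataLakeCatalog.db.sample_partitioned (
    id   BIGINT,
    name STRING,
    ts   TIMESTAMP
)
USING iceberg
PARTITIONED BY (
    bucket(16, id),     -- hash id into 16 buckets
    truncate(4, name),  -- keep the first 4 characters of name
    days(ts)            -- one partition value per calendar day
);
```

Queries that filter on id, name, or ts can then prune partitions without referencing the transformed values directly.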

Scenario | Statement | Execution Engine |
Querying history | select * from DataLakeCatalog.db.sample$history | SuperSQL Spark (sql) engine and SuperSQL Presto engine |
| select * from `DataLakeCatalog`.`db`.`sample`.`history` | SuperSQL Spark (job) engine and standard Spark engine |
Querying snapshots | select * from DataLakeCatalog.db.sample$snapshots | SuperSQL Spark (sql) engine and SuperSQL Presto engine |
| select * from `DataLakeCatalog`.`db`.`sample`.`snapshots` | SuperSQL Spark (job) engine and standard Spark engine |
Querying data files | select * from DataLakeCatalog.db.sample$files | SuperSQL Spark (sql) engine and SuperSQL Presto engine |
| select * from `DataLakeCatalog`.`db`.`sample`.`files` | SuperSQL Spark (job) engine and standard Spark engine |
Querying manifests | select * from DataLakeCatalog.db.sample$manifests | SuperSQL Spark (sql) engine and SuperSQL Presto engine |
| select * from `DataLakeCatalog`.`db`.`sample`.`manifests` | SuperSQL Spark (job) engine and standard Spark engine |
Querying partitions | select * from DataLakeCatalog.db.sample$partitions | SuperSQL Spark (sql) engine and SuperSQL Presto engine |
| select * from `DataLakeCatalog`.`db`.`sample`.`partitions` | SuperSQL Spark (job) engine and standard Spark engine |
Rolling back to a specified snapshot | CALL DataLakeCatalog.system.rollback_to_snapshot('db.sample', 1) | SuperSQL Spark engine and standard Spark engine |
Rolling back to a specific point in time | CALL DataLakeCatalog.system.rollback_to_timestamp('db.sample', TIMESTAMP '2021-06-30 00:00:00.000') | SuperSQL Spark engine and standard Spark engine |
Setting the current snapshot | CALL DataLakeCatalog.system.set_current_snapshot('db.sample', 1) | SuperSQL Spark engine and standard Spark engine |
Merging files | CALL DataLakeCatalog.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'id DESC NULLS LAST, name ASC NULLS FIRST') | SuperSQL Spark engine and standard Spark engine |
Snapshot expiration | CALL DataLakeCatalog.system.expire_snapshots('db.sample', TIMESTAMP '2021-06-30 00:00:00.000', 100) | SuperSQL Spark engine and standard Spark engine |
Removing orphan files | CALL DataLakeCatalog.system.remove_orphan_files(table => 'db.sample', dry_run => true) | SuperSQL Spark engine and standard Spark engine |
Rewriting manifests | CALL DataLakeCatalog.system.rewrite_manifests('db.sample') | SuperSQL Spark engine and standard Spark engine |
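
The metadata queries and procedures above are often used together. The sequence below is a sketch of one such workflow, assuming the standard Spark engine and the same db.sample table; the snapshot ID 1 is a placeholder that would normally come from the first query. The procedures are called with named arguments (table, snapshot_id, older_than, retain_last), which are equivalent to the positional forms in the table.

```sql
-- 1. List snapshots, newest first, to choose a rollback target.
SELECT snapshot_id, committed_at, operation
FROM `DataLakeCatalog`.`db`.`sample`.`snapshots`
ORDER BY committed_at DESC;

-- 2. Roll back to the chosen snapshot (1 is a placeholder ID).
CALL DataLakeCatalog.system.rollback_to_snapshot(table => 'db.sample', snapshot_id => 1);

-- 3. Expire snapshots older than the timestamp, keeping at least the last 100.
CALL DataLakeCatalog.system.expire_snapshots(
    table => 'db.sample',
    older_than => TIMESTAMP '2021-06-30 00:00:00.000',
    retain_last => 100
);
```

Named arguments make each call easier to read and do not depend on argument order, but named and positional arguments cannot be mixed within one call.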