Data files contain the actual row data.
| Column name | Column type | |
|---|---|---|
| `data_file_id` | `BIGINT` | Primary key |
| `table_id` | `BIGINT` | |
| `begin_snapshot` | `BIGINT` | |
| `end_snapshot` | `BIGINT` | |
| `file_order` | `BIGINT` | |
| `path` | `VARCHAR` | |
| `path_is_relative` | `BOOLEAN` | |
| `file_format` | `VARCHAR` | |
| `record_count` | `BIGINT` | |
| `file_size_bytes` | `BIGINT` | |
| `footer_size` | `BIGINT` | |
| `row_id_start` | `BIGINT` | |
| `partition_id` | `BIGINT` | |
| `encryption_key` | `VARCHAR` | |
| `mapping_id` | `BIGINT` | |
| `partial_max` | `BIGINT` | |
- `data_file_id` is the numeric identifier of the file. It is a primary key. `data_file_id` is incremented from `next_file_id` in the `ducklake_snapshot` table.
- `table_id` refers to a `table_id` from the `ducklake_table` table.
- `begin_snapshot` refers to a `snapshot_id` from the `ducklake_snapshot` table. The file is part of the table starting with this snapshot id.
- `end_snapshot` refers to a `snapshot_id` from the `ducklake_snapshot` table. The file is part of the table up to but not including this snapshot id. If `end_snapshot` is `NULL`, the file is currently part of the table.
- `file_order` is a number that defines the vertical position of the file in the table. It needs to be unique within a snapshot but does not have to be contiguous (gaps are allowed).
- `path` is the file path of the data file, e.g., `my_file.parquet` for a relative path.
- `path_is_relative` indicates whether the `path` is relative to the `path` of the table (true) or an absolute path (false).
- `file_format` is the storage format of the file. Currently, only `parquet` is allowed.
- `record_count` is the number of records (rows) in the file.
- `file_size_bytes` is the size of the file in bytes.
- `footer_size` is the size of the file metadata footer; for Parquet, this is the Thrift-encoded metadata. Storing it is an optimization that allows for faster reading of the file.
- `row_id_start` is the first logical row id in the file. (Every row has a unique row id that is maintained.)
- `partition_id` refers to a `partition_id` from the `ducklake_partition_info` table.
- `encryption_key` contains the encryption key for the file if encryption is enabled.
- `mapping_id` refers to a `mapping_id` from the `ducklake_column_mapping` table.
- `partial_max` is the maximum snapshot id stored in a partial data file. When multiple snapshots are merged into a single file, per-row snapshot ownership is tracked via the `_ducklake_internal_snapshot_id` column embedded in the Parquet file. `partial_max` records the highest snapshot id present in that merged file, so reads and time travel can determine whether snapshot filtering is necessary. It is `NULL` for files that are not shared across snapshots.
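The snapshot range, ordering, path, and row-id rules above can be illustrated with a small in-memory model. This is a minimal sketch, not the DuckLake implementation or API: the `DataFile` class and the helpers `is_visible`, `files_at_snapshot`, `resolve_path`, `row_id_range`, and `needs_snapshot_filter` are hypothetical names. It assumes the caller already knows the table's resolved path, that the row ids in a file are contiguous starting at `row_id_start`, and that `partial_max` is used to skip per-row filtering when the requested snapshot is at or beyond it.

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataFile:
    """Hypothetical in-memory row holding a subset of the ducklake_data_file columns."""
    data_file_id: int
    begin_snapshot: int
    end_snapshot: Optional[int]  # None models SQL NULL: the file is still part of the table
    file_order: int
    path: str
    path_is_relative: bool
    record_count: int
    row_id_start: int
    partial_max: Optional[int] = None  # None for files not shared across snapshots


def is_visible(f: DataFile, snapshot_id: int) -> bool:
    """begin_snapshot is inclusive, end_snapshot is exclusive, NULL end means open-ended."""
    if snapshot_id < f.begin_snapshot:
        return False
    return f.end_snapshot is None or snapshot_id < f.end_snapshot


def files_at_snapshot(files: list[DataFile], snapshot_id: int) -> list[DataFile]:
    """Files that make up the table at snapshot_id, ordered by file_order (gaps allowed)."""
    live = [f for f in files if is_visible(f, snapshot_id)]
    return sorted(live, key=lambda f: f.file_order)


def resolve_path(f: DataFile, table_path: str) -> str:
    """Join relative paths onto the table path (assumed already resolved); keep absolute ones."""
    return posixpath.join(table_path, f.path) if f.path_is_relative else f.path


def row_id_range(f: DataFile) -> range:
    """Assumption: row ids in a file are contiguous, i.e. [row_id_start, row_id_start + record_count)."""
    return range(f.row_id_start, f.row_id_start + f.record_count)


def needs_snapshot_filter(f: DataFile, snapshot_id: int) -> bool:
    """Assumed reading of partial_max: per-row filtering on _ducklake_internal_snapshot_id
    is only needed when the reader's snapshot is older than the newest snapshot in the file."""
    return f.partial_max is not None and snapshot_id < f.partial_max


# Example rows (made-up values for illustration only).
files = [
    DataFile(data_file_id=1, begin_snapshot=1, end_snapshot=3, file_order=0,
             path="a.parquet", path_is_relative=True, record_count=100, row_id_start=0),
    DataFile(data_file_id=2, begin_snapshot=2, end_snapshot=None, file_order=1,
             path="s3://bucket/other/b.parquet", path_is_relative=False,
             record_count=50, row_id_start=100),
    DataFile(data_file_id=3, begin_snapshot=3, end_snapshot=None, file_order=2,
             path="c.parquet", path_is_relative=True, record_count=10,
             row_id_start=150, partial_max=5),
]

for snap in (2, 3):
    print(snap, [f.data_file_id for f in files_at_snapshot(files, snap)])
print(resolve_path(files[0], "data/my_table/"))
```

In this example, file 1 drops out at snapshot 3 because `end_snapshot` is exclusive, and file 3 would need per-row snapshot filtering when read at snapshot 4 but not at snapshot 5 or later, under the stated assumption about `partial_max`.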