Which file format is recommended for data files stored in S3 to enable efficient querying via a SQL endpoint?

Multiple Choice

Parquet is the recommended format. For SQL-style analytics, the format should let the engine read only what it needs. Parquet is a columnar storage format: data is stored column by column rather than row by row. This enables column pruning (reading only the columns a query uses) and predicate pushdown (applying filters as data is read), both of which reduce the amount of data scanned from S3 and speed up queries. The columnar layout also compresses better, because values of the same type sit next to each other.
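To make column pruning concrete, here is a minimal sketch in plain Python. It is a toy model, not Parquet itself: the `rows` data and the helper names are hypothetical, and "values read" is only a stand-in for bytes scanned from storage.

```python
# Toy model only: real Parquet readers work on encoded, compressed column
# chunks. The sample data and function names are hypothetical.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
    {"order_id": 4, "region": "APAC", "amount": 300.0},
]


def to_columns(rows):
    """Pivot row dicts into one list per column (a columnar layout)."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def sum_amount_row_layout(rows):
    """Row layout: every field of every row is touched to reach 'amount'."""
    values_read = sum(len(r) for r in rows)
    return sum(r["amount"] for r in rows), values_read


def sum_amount_column_layout(columns):
    """Column layout: only the 'amount' column is touched."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


columns = to_columns(rows)
row_total, row_values_read = sum_amount_row_layout(rows)
col_total, col_values_read = sum_amount_column_layout(columns)

print(row_total, row_values_read)  # same total, 12 values touched
print(col_total, col_values_read)  # same total, 4 values touched
```

Both layouts return the same total, but the columnar version touches a third of the values here. On a wide table with dozens of columns, the gap grows accordingly.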

In a data lake on S3, this efficiency matters for both cost and performance. Parquet files also store metadata and statistics, such as per-column minimum and maximum values, that let the SQL engine decide early which data can be skipped. Parquet handles nested data well too, which is common in analytics workloads.
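The statistics-based skipping described above can be sketched as follows. This is again a toy model in plain Python under stated assumptions: the `row_groups` structure and `scan_greater_than` are hypothetical names that loosely mirror how min/max statistics let an engine skip a chunk of data without reading it.

```python
# Toy model only: each "row group" carries min/max statistics for a single
# column. Names and data are hypothetical, not a real Parquet API.
row_groups = [
    {"min": 1, "max": 100, "values": [1, 40, 100]},
    {"min": 101, "max": 200, "values": [101, 150, 200]},
    {"min": 201, "max": 300, "values": [201, 250, 300]},
]


def scan_greater_than(row_groups, threshold):
    """Return values > threshold and how many row groups were actually read."""
    matches, groups_read = [], 0
    for group in row_groups:
        if group["max"] <= threshold:
            # Statistics prove no value in this group can match: skip it.
            continue
        groups_read += 1
        matches.extend(v for v in group["values"] if v > threshold)
    return matches, groups_read


matches, groups_read = scan_greater_than(row_groups, 180)
print(matches)      # [200, 201, 250, 300]
print(groups_read)  # 2
```

The first group is skipped using only its `max` value, so its data is never read. Groups that might contain a match are still filtered value by value.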

CSV is row-oriented plain text with no embedded schema or types, so the engine must read and parse entire files even when a query needs only a few columns. JSON is flexible but also text-based and not columnar, leading to heavier parsing and larger scans. Avro is a binary format that is efficient for row-oriented access, but it doesn’t provide the columnar benefits that Parquet offers for queries that select only a few columns.
