Why can performing a split by position on a column cause a dataflow to load more data than expected?

Prepare for the Fabric Analytics Engineer Associate Test with comprehensive materials. Explore flashcards, multiple choice questions, and detailed explanations. Get ready for your success!

Multiple Choice

Why can performing a split by position on a column cause a dataflow to load more data than expected?

Explanation:
Split by position typically can’t be folded back to the data source. In Power Query and Dataflows, query folding pushes as many operations as possible down to the source so that only the needed rows are retrieved. A split by position is usually not something the source can apply, so folding stops at that step: the data is pulled into Power Query first, the split is performed there, and any filters that come after the split are applied locally. More data is therefore loaded into the dataflow than if the filtering had happened at the source.
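The effect can be illustrated with a small toy model. This is only a sketch in plain Python, not Power Query or its API: `SOURCE_ROWS`, `fetch`, `split_by_position`, and both plan functions are hypothetical names invented for the illustration, and the "folded" plan assumes the filter can be expressed on the raw column so that the source is able to evaluate it.

```python
# Toy model of query folding. Everything here is hypothetical and only
# mimics the idea; it does not call Power Query or any real data source.

SOURCE_ROWS = [
    {"code": "US2023A"},
    {"code": "US2024B"},
    {"code": "DE2024C"},
    {"code": "FR2022D"},
]


def fetch(predicate=None):
    """Stand-in for the data source: returns the rows it would send back.

    A predicate passed here represents a filter that folded to the source.
    """
    return [row for row in SOURCE_ROWS if predicate is None or predicate(row)]


def split_by_position(row):
    """Split the 'code' column at fixed positions 0, 2 and 6."""
    code = row["code"]
    return {"country": code[:2], "year": code[2:6], "suffix": code[6:]}


def folded_plan():
    # Assumption: the filter is written against the raw column, so the
    # source can apply it and only matching rows are transferred.
    transferred = fetch(lambda row: row["code"].startswith("US"))
    result = [split_by_position(row) for row in transferred]
    return len(transferred), result


def unfolded_plan():
    # The filter depends on the split output, so the source cannot apply it:
    # every row is transferred, split locally, and only then filtered.
    transferred = fetch()
    split_rows = [split_by_position(row) for row in transferred]
    result = [row for row in split_rows if row["country"] == "US"]
    return len(transferred), result


folded_count, folded_result = folded_plan()
unfolded_count, unfolded_result = unfolded_plan()

print("rows transferred when the filter folds:", folded_count)
print("rows transferred when the filter runs after the split:", unfolded_count)
```

Both plans produce the same final rows, but in this model the second one transfers every source row before discarding the ones it does not need, which is the behavior the explanation describes.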

