Quantcast
Channel: Power Query Archives –
Viewing all articles
Browse latest Browse all 111

Performance tip for aggregations after joins in Power Query and Power BI

$
0
0

This article was edited on 2nd Nov 2019. JoinKind.Local has been removed to avoid problems with merges on tables with primary keys:

In this article you’ll learn how to speed up the aggregation of joined/merged tables by orders of magnitude (I recorded up to 30 times faster execution times). This method works for merges where both table have multiple rows for each keys. If one of your tables has a primary key, the method Chris Webb describes here works just as good: Chris Webb’s article on how to improve performance on aggregations after joins using primary keys .

You can follow along the different methods in this file:  PerformanceAggregationsAfterMerges1_Upload.zip

Background

When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joint tables. That’s useful if multiple rows are returned for the rows of the table that has been joined to (left table):

Performance of native aggregation after join is very slow

But this method is extremely slow. Compared to “simply” expanding all values to new rows (which took around 5 seconds), the aggregation took around 50 seconds. The automatically generated code uses the “Table.AggregateTableColumn”-function. (see Query1_NativeAggregate)

Table.AggregateTableColumn(#"Merged Queries", "Expanded Custom", {{"Animal", each Text.Combine(_, ", "), "CombinedValues"}})

My first attempt to speed up performance was not to expand the column that contains the merged table at all, but to add a column instead with a manual aggregation function. (see Query2_AddManualAggregation)

 Table.AddColumn(#"Merged Queries", "CombinedValues", each Text.Combine([Expanded Custom][Animal], ", "))

This improved speed by 50-60%, but still, way slower than expanding all rows.

The unexpected solution

What turned out to be by far the fastest was to expand the columns to all the new rows and then “group back”. (see Query3_ReGroupIntegrated)

Table.Group(#"Expanded Expanded Custom", {"Column1", "Custom"}, {{"CombinedValues", each Text.Combine(_[Animal], ". ") }})

To my surprise, this was even faster than skipping this step (around 2 seconds, instead of 5). Means: This aggregation shortened the table from 676k rows to 26k rows. Of course, loading a shorter table should take less time. But I expected the computation of this aggregation also to take a fair amount of time. So at the end, this was actually less than the time gained by the shorting of the table.

But the surprise didn’t stop here. As I know many beginners aren’t so comfortable with editing existing code, I tried a different method (see Query4_ReGroupAddColumn): I kept the native “All rows”-operation and added the same column than in Query2_AddManualAggregation. And it was just as fast/even slightly faster than the fast Query3_ReGroupIntegrated.

So just by adding 2 steps: Expansion of the merged column and Re-Grouping I sped up the query significantly: Another mystery that the M-engine holds for me…

Please share your thoughts about and experiences with this method in the comments below. Maybe MS will change the code behind the “default-Aggregate”-function, if there is enough evidence that the alternative method proves to be superior and stable in many other cases as well.

Thanks and stay queryious 😉

The post Performance tip for aggregations after joins in Power Query and Power BI appeared first on The BIccountant.


Viewing all articles
Browse latest Browse all 111

Trending Articles