Quantcast
Channel: Power Query Archives –
Viewing all articles
Browse latest Browse all 111

Multiple replacements or translations in Power BI and Power Query

$
0
0

A common task when cleaning data is to perform multiple replacements at once. When dealing with “translation tables” you might have come across Chris Webb’s solution using List.Generate to deal with this. This works fine – but might deliver unexpected results if you don’t consider it’s implications properly. Say we add lines like shown below, then we might expect the result as highlighted in yellow:

Image2

Multiple replacements using 2 different techniques

So hey – what happened here?: Our catamaran turns out as a bearamaran and the chair turns to chwater.

How it works:

Because there could be more than one word per line that needs to be replaced, the function Text.Replace alone wouldn’t be able to do this job as it takes only one replacement element. This is where List.Generate jumps in: Taking the result of the previous replacement step as input and performs the next replacement – as often as there are elements in the replacement list.

And Text.Replace is a function that searches for substrings in the text. That way “catamaran” turns out as a “bearamaran”.

But what happened to the mat where the cat and the dog sat on: “chwater”?

Firstly “mat” has been replaced by “chair”. But due to the recursive nature of the List.Generate-function, this result is not safe. So when Text.Replace kicks in on: Replace “air” with “water”, searching again also for substrings, a part of “chair” with be replaced by water: “chwater“.

So while Chris’ article is an excellent explanation of how List.Generate works, most multiple text replacement requirements in the real world might be better solved with a different approach:

Use List.ReplaceMatchingItems instead

This is a function that is designed to work on lists and performing replacements for multiple items (that also need to be fed in as list):

But we want to replace within a text, not within a list!

No problem for Power Query as the master of transformations: We just temporarily transform our text into a list, perform the replacements and then transform our list back into text – as if nothing has happened 🙂

Text.Split(text, ” “) will split out text strings into a list of single words because we choose blank as the delimiter

Text.Combine(list, ” “) will then reassemble.

This technique is useful in many other operations as well. Sometimes the delimiters will change, but as long as you don’t get too mad with your transformations in between, it is often a easy and safe way to take a useful function from the List-category to perform on Text-fields.

But there’s another hurdle to take here: How to get our translation table into the desired format for this function? Have a look at the documentation:

List.ReplaceMatchingItems ( {1, 2, 3, 4, 5}, { {2, -2}, {3, -3} } ) equals { 1, -2, -3, 4, 5}

So we’re fine with lists: Identifying them by their curly brackets like in the result: { 1, -2, -3, 4, 5}

But what the x is this?: { {2, -2}, {3, -3} } a list containing lists as it’s elements.

List of Lists

Have you ever worked with a table with one column only? Maybe also have been confronted with a surprising effect that this had automatically turned itself into list-format? So if our table just had one column, it would almost be a list already. So let’s try to get there: Merge the columns of the translation table into one so that the contents of the rows (each cell effectively) will be returned as a list (as we want to maintain the separation that the columns gave, but not their format).

Table.AddColumn(Source, “Liste”, each ( { [Word To Replace], [Replace With] } ) )

This is an added column where in each row of the table, the specific rows contents of [Word To Replace] and [Replace With] will be passed into the list. So for every single row one list will be created.

If you’re interested in the techniques behind this or have guessed that instead one list with 2 full column would be returned: [Word To Replace] and [Replace With]: The unimposing little word “each” actually creates a sort of a row context. Actually the expression at the 3rd place in the “Table.AddColumn”-expression is expected to be a function and that’s what “each” stands for. It is basically syntax sugar for a full function expression like described here.

So: Goodbye columns “WordsToReplace” and “ReplaceWith”, but we respect your destiny and maintain your separating character by keeping your contents separated in the list-format instead.

Nice side-effect: It’s also way much faster.

And here is the full code:

let
    Source = ReplacementsTable,
    CreateListOfLists = Table.AddColumn(Source, "Liste", each ({[Word To Replace], [Replace With]})),
    Text = Text,
    TurnTextToList = Table.AddColumn(Text, "Custom", each Text.Split([Text], " ")),
    Replacements = Table.AddColumn(TurnTextToList, "Changed Text Expected", 
                each Text.Combine(List.ReplaceMatchingItems([Custom],CreateListOfLists[Liste])," ")),
    Cleanup = Table.RemoveColumns(Replacements,{"Custom", "Text"})
in
    Cleanup

& the file: Multiple replacements or translations in Power BI and Power Query

Enjoy & stay queryious 🙂

The post Multiple replacements or translations in Power BI and Power Query appeared first on The BIccountant.


Viewing all articles
Browse latest Browse all 111

Latest Images

Trending Articles



Latest Images