
Advanced transformations on multiple columns at once in Power BI and Power Query


You can apply simple transformations to multiple columns at once in Power Query using the UI only. In this article I show how you can apply advanced transformations to multiple columns at once instead. You can also use this technique to apply custom functions. And lastly, for the lazy (read: efficient) fans of custom M-functions: you will get a new “TranformAllMyColumnsAtOnceHowILikeIt”-function as well 😉

Background

The Transform-tab in the query editor is sensitive to the columns you select. So if you select multiple number columns for example, some number transformations will be greyed out and are therefore not accessible:


Some symbols are greyed out

So how could I then multiply all my columns by 10 for example, as the symbol for multiplication is greyed out?

Solution

Simply check the columns to transform and select an accessible dummy function. Ideally, it should take as many arguments as the intended function, but that’s not mandatory. In our case I choose a function with 2 arguments (for the number and the multiplier). Rounding -> Round… fits just nicely here:

I enter the number for the multiplier (10) into the Decimal Places-field.


Number field

Now – are you asking yourself where to fill in the reference to the number itself? Then check out the M-code that has been generated automatically in the formula bar:


Automatically generated formula

The query editor has set the reference to the number automatically; it is represented by the underscore (“_”). This stands for the (only) function argument that is created automatically, which is why the generated code uses the syntactic sugar of the “each”-keyword.

As you can see, the code has been created for every field of the table. Therefore, the only thing we have to tweak now is the function itself. I change “each Number.Round(_, 10)” to “each _ * 10” by copy-pasting it into every column expression:


Edited formula
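In M code, the tweak looks roughly like this (a minimal sketch, assuming two placeholder number columns):

// Generated by the UI with the dummy transformation:
Table.TransformColumns(Source, {{"Column1", each Number.Round(_, 10), type number}, {"Column2", each Number.Round(_, 10), type number}})

// Tweaked to multiply by 10 instead:
Table.TransformColumns(Source, {{"Column1", each _ * 10, type number}, {"Column2", each _ * 10, type number}})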

Using custom functions

A user in the Power BI forum recently asked me how to apply my “RemoveHtmlTags”-function to his whole table. To do so, he would have to:

  1. Copy the function code from GitHub
  2. Create a blank query in the query editor
  3. Edit that query in the advanced editor and replace all existing code with the copied code
  4. Name that query “fnRemoveHtmlTags”
  5. Check all columns and apply a dummy transformation
  6. Lastly, replace the function-part of the generated code with “fnRemoveHtmlTags” like so:

Remove Html Tags

Are you wondering now where the “each” has gone? It is actually not necessary for functions with just one argument. Check this article for more details about it.
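For such a one-argument function, the tweaked step could look like this sketch (column names are placeholders):

// “each” and “_” can be dropped, because the function reference itself is passed:
Table.TransformColumns(Source, {{"Column1", fnRemoveHtmlTags}, {"Column2", fnRemoveHtmlTags}})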

A function for more efficiency

If you want to apply the transformation to all of your table’s columns, the following function will come in handy. Just fill in 3 parameters (Table, Function and Type). Optionally, you can use the fourth “ColumnNames”-parameter to provide a list of column names if you want to restrict the transformation to those columns only.

You should use the parameters as follows:

  1.  Reference to the table itself
  2.  Reference to the function
  3.  Type of the columns to be transformed (attention: you have to use the proper type (without quotes) and not the textual representation)
  4.  optional parameter: a list of column names if you want to limit the transformation to certain columns
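The original function code was attached to the post. As a substitute, here is a minimal sketch of how such a function could be written (the parameters follow the description above; the implementation details are my assumption):

// Transforms all columns of a given type, or of a provided list of column names,
// with the supplied single-argument function.
(Table as table, Function as function, ColumnType as type, optional ColumnNames as list) as table =>
let
    // Use the provided column names, otherwise pick all columns of the given type
    ColumnsToTransform =
        if ColumnNames <> null
        then ColumnNames
        else Table.ColumnsOfType(Table, {ColumnType}),
    // Build the {name, function} pairs that Table.TransformColumns expects
    TransformOperations = List.Transform(ColumnsToTransform, (name) => {name, Function}),
    Result = Table.TransformColumns(Table, TransformOperations)
in
    Result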

Enjoy and stay queryious 😉

Edit 20th December 2019: Please check out a much smoother version in Cameron Wallace’s comments down below!



Tidy up Power BI models with the Power BI Cleaner tool


The VertiPaq-Analyzer tool is one of the great community tools that I really cannot live without. It gives me a great overview of all elements in my model and identifies potential performance problems by showing the storage requirements of each column. So when seeing expensive columns, the first question that arises is: “Do I really need this column or could I delete it?”. Luckily, this can now be answered with my new Power BI Cleaner tool. This tool shows the usage of all columns (and measures) within the tables of the VertiPaq Analyzer.

Power BI Cleaner shows unused columns in the VertiPaq-tables

Power BI Cleaner tool

So whenever there is no entry in the column “Where Used” you can go ahead and eliminate the column (or measure) from the model. Well – with one exception actually: Fields used in the definition of incremental load policies are currently not identified. So make sure to consider this before running wild 😉

In the table “Measures” you can spot which measures might not be used:

Identify unused measures with Power BI Cleaner Tool

And as a general recommendation: Always make sure to have your old version still in place and create a new version for your cleaned up file, just in case.

To learn more about the used elements, you can drill through to the detail pages “Where Used Direct” and “Where Used Indirect”:

Drill through to detail pages

“Where Used Direct” will show where the elements are directly used:

Analyse directly used elements

And the next page “Where Used Indirect” will show elements that are using the selection indirectly at a later stage as well:

Elements that are indirectly used at a later stage

Please note that the drill-through pages will only return values for columns and measures that have been used or referenced by used items (directly or indirectly). So if a column or measure has been used in another column or measure, this will not be shown as long as those elements haven’t been used anywhere in the report. This is intentional and reflects the purpose of the tool as a cleanup-helper rather than a documentation tool.

What’s (not) covered

The areas covered for usage detection are the following:

  • Calculated columns
  • Measures
  • Relationships
  • Filters (on visual, page, report and drillthrough-level)
  • Visuals
  • Groups
  • Roles
  • Conditional formatting

And again – NOT covered are:

  • Incremental load policies

Please let me know if there are areas that I’ve missed and I’ll see if I can include them as well.

Further limitations

  • Columns that are only referenced by their name (without the preceding table name) will NOT be detected.
  • False positives will be generated for measures that have the same name as columns: unused measures will be flagged as used if their column counterparts are used!
  • Some custom visuals produce bad metadata, which can lead to errors as well. The Network Navigator, for example, renames the input table to “Table1” in the metadata. These fields will not be discoverable through this tool.
  • Column and measure names including brackets “()” will currently not be picked up (this limitation might be lifted in future updates)

How to use

  1. Retrieve the vpax-file with DAX Studio and paste its path into the parameter “FilePathVPAX”. (Note: this is a preview feature that you have to enable as described here: https://www.sqlbi.com/blog/marco/2019/09/15/vertipaq-analyzer-2-0-preview-1/)
  2. Save the pbix as a pbit and paste its path into the parameter “FilePathPBIT” (this enables the “Where Used”-column on the “Tables” page)
  3. Refresh All

Or check out this video for a detailed walkthrough:

 

How does it work

The tool uses the library and functions of the VertiPaq Analyzer and many metadata-parsing algorithms of my Power BI Comparer tool. Finally, I want to thank Marco Russo for the support and the confirmation that it is totally OK to adapt their open-sourced solutions for other tools.

All the logic of this solution sits in the Power Queries. So do you want to know how this all works? Then just open the query editor and study the queries.

Warning

Custom visuals with bad metadata could lead to errors. A hardened version will be uploaded soon.

Download file:  PowerBICleaner.zip

Enjoy and stay queryious 😉


Date.Networkdays function for Power Query and Power BI


Today I’m going to share my custom NETWORKDAYS function for Power Query with you. It uses the same syntax as its Excel equivalent.

NETWORKDAYS function

This function’s first 3 parameters work just like the Excel function’s, and there is a 4th parameter that allows adjusting the day on which the week shall start:

  1. Start as date
  2. End as date
  3. optional holidays as list of dates
  4. optional number to change the start of the week from Monday (default: 1) to any other day (2 would mean that the week starts on Tuesday instead)

The function comes with a UI that lets you first choose a table containing the holidays and then choose the column with the holiday dates.


UI for NETWORKDAYS function for Power Query


Select date column for NETWORKDAYS function

But you can also type the list of holidays in manually. To do so, leave the optional parameter blank when you use the function through the UI and edit the formula afterwards like so:

fnNETWORKDAYS( StartDate, EndDate, { #date(2020, 1, 1), #date(2020, 12, 25) } ), adding all necessary dates into the 3rd parameter’s list.

The Code
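The function code itself was embedded in the original post. Here is a hedged sketch that matches the described behaviour (the weekend handling for shifted week starts is my assumption):

// NETWORKDAYS sketch: counts the working days between two dates
(StartDate as date, EndDate as date, optional Holidays as list, optional StartOfWeek as number) as number =>
let
    // Default week start is Monday (1); 2 would shift the week start to Tuesday
    FirstDayOfWeek = if StartOfWeek = null then Day.Monday else StartOfWeek,
    HolidayList = if Holidays = null then {} else Holidays,
    // All dates between start and end (inclusive)
    AllDates = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1, 0, 0, 0)),
    // Keep the first five days of the (shifted) week as working days
    Weekdays = List.Select(AllDates, each Date.DayOfWeek(_, FirstDayOfWeek) < 5),
    // Remove the holidays
    WorkingDays = List.Select(Weekdays, each not List.Contains(HolidayList, _))
in
    List.Count(WorkingDays)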

Twists

If your holidays don’t sit in a dedicated table but in a separate column of a calendar table like so:

Holiday as a column within a complete Calendar table

I’d recommend referencing that table, filtering it down to the holiday rows and then referencing its date column as mentioned before.

Enjoy & stay queryious 😉


Trimming text with custom characters in Power BI and Power Query


When cleaning dirty data, you might have used the Trim functions (Text.TrimStart or Text.TrimEnd) to delete leading or trailing whitespace from your strings. However, did you know that you can use these functions to delete any other characters from the start or end of a string as well? Trimming text with custom characters is pretty straightforward:

Task: Trimming text with custom characters

Say you have a column with values like so

Trimming text with custom characters

and want to delete every number at the end and also every “-” that is directly connected with a number, so that the final output looks like this:

Trim custom characters at the end of a string.

Optional parameter

By default, you feed just one argument into the Text.TrimStart or Text.TrimEnd function: The string whose whitespace characters shall be removed.

Text.TrimEnd(text as nullable text, optional trim as any) as nullable text

But the second argument lets you define a list of your own characters to be removed instead. So I can create a list with all the characters that shall be removed from the end, like so:

{"0".."9"} & {"-"}

This concatenates 2 lists: the first list contains 10 elements, all numbers as strings. The second list has just one element in it: “-”. I have to put this element into a list as well to be able to use the ampersand (“&”) as an easy concatenation operator here.

So the full expression for the “Add custom column” dialogue looks like so:

Text.TrimEnd( [MyColumnName], {"0".."9"} & {"-"} )

To see this in action, you can simply paste this code into the advanced editor and follow the steps:
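Since that code block was embedded in the original post, here is a minimal self-contained sketch with made-up sample values:

let
    // Made-up sample data
    Source = Table.FromColumns({{"Apples-12", "Pears-5", "Oranges2020-10"}}, {"MyColumnName"}),
    // Trim numbers and "-" from the end of each value
    AddedTrimmed = Table.AddColumn(
        Source,
        "Trimmed",
        each Text.TrimEnd([MyColumnName], {"0".."9"} & {"-"}),
        type text
    )
in
    AddedTrimmed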

Enjoy and stay queryious 😉


Tips to download files from webpages in Power Query and Power BI


When downloading data from the web, it’s often better to grab the data from APIs that are designed for machine-to-machine communication than from the site that’s actually visible on the screen. Not only is the download usually faster, but you also often get additional parameters that can be very useful. In this article I’m going to show you how to retrieve the relevant URLs for downloading files from webpages (without resorting to external tools like Fiddler) and how to tweak them to your needs.

Retrieving the URL to download files from webpages

Say I want to download historical stock prices from this webpage:

https://finance.yahoo.com/quote/AAPL/history?p=AAPL

The screen will show a link to a download:


Webpage with button to download csv file

If I click on the button, a download dialogue will appear and some browsers will even show me the URL that’s behind it:


Download dialogue with link shown at the bottom of the screen

But when I close the dialogue, the URL will disappear as well. Fortunately, in Chrome-based browsers you can still grab that URL from the options of the link to the downloaded file, like so:

Grab the download link from the options under the ellipsis

But there are other ways as well. Sometimes a right-click on the download button reveals a link that takes me directly to the download URL:

But that’s not guaranteed to work everywhere either. The last resort for me is to inspect the element: right-click the download link and choose “Inspect” (or “Inspect element”) instead:

This opens up the full monty of your site. You should then be able to find the URL near the highlighted position (indicating the element that you’re inspecting):


Grab URL to download files from webpages

Tweaking the URL

Now let’s examine our catch and see how we can exploit it:

https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1504461150&period2=1586083550&interval=1d&events=history

The first part up until the “?” is the main query, but after the question mark we see 4 query parameters: start date, end date, interval and events. In this case, they correspond to the options on the webpage itself. Playing around with the parameters and checking the resulting URLs reveals that “wk” can be used to retrieve weekly data and “mo” is the abbreviation for monthly data. That leaves the question how to decipher the parameters for the start and end date.

As it turns out, they are noted as Unix timestamps. A Unix timestamp represents a point in time as the number of seconds since the start of 1st January 1970. So to transform a date to it, one has to:

  1. Determine its distance to 1/1/1970: subtract #duration(25569,0,0,0), where 25569 is the number of days between Power Query’s day zero (30th December 1899) and 1st January 1970, and then
  2. Convert it to seconds: * 86400

This is what the formula would look like:

Number.From ( DateTime.From ( DateTimeInput )  - #duration ( 25569,0,0,0 ) )  * 86400

Now you’re able to determine the interval for the download dynamically from whatever the refresh of your data reveals. Hope you found this useful.
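Putting it together, a sketch of a dynamically built download URL could look like this (the helper name ToUnix is mine; see the hint at the end of this post for making such a query refreshable in the service):

let
    // Convert a datetime into a Unix timestamp (seconds since 1/1/1970)
    ToUnix = (dt as datetime) as number =>
        Number.RoundDown(Number.From(dt - #duration(25569, 0, 0, 0)) * 86400),
    Period1 = ToUnix(#datetime(2019, 1, 1, 0, 0, 0)),
    Period2 = ToUnix(DateTime.LocalNow()),
    Url = "https://query1.finance.yahoo.com/v7/finance/download/AAPL"
        & "?period1=" & Text.From(Period1)
        & "&period2=" & Text.From(Period2)
        & "&interval=1d&events=history",
    Result = Csv.Document(Web.Contents(Url))
in
    Result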

BTW: The formula to calculate it the other way around looks like so:

#datetime ( 1970,1,1,0,0,0 ) +  Duration.From ( UnixTimestamp / 86400 )

Closing with a general hint on how you have to adjust the queries if they shall be refreshable in the Power BI service: http://blog.datainspirations.com/2018/02/17/dynamic-web-contents-and-power-bi-refresh-errors/

Enjoy & stay healthy & queryious 😉


Power BI “Store datasets in enhanced metadata format” warning


This is just a quick heads up for the new preview feature “Store datasets in enhanced metadata format“. You should definitely think twice before turning this feature on:

Background

With the March release came the preview feature “Store datasets in enhanced metadata format”. With this feature turned on, Power BI data models will be stored in the same format as Analysis Services Tabular models. This means that they inherit the same amazing options that this open-platform connectivity enables.

Limitations and their consequences

But with the current setup, you could end up with a non-working file which you would have to rebuild from scratch in many parts. So make sure to fully read the documentation. Now!

In there you find this warning:

Warning for the new enhanced metadata format

So once you’ve stored a file with this feature turned on, there is no way to open this file in an older version of PBI Desktop where this feature is not available. So if you exchange your files with colleagues or clients who might use older versions, you should not turn this feature on.

I can hear you

You: “Well, that’s no problem for me, we’re all running on the latest version. So I’m safe to turn it on for all my files.”

Me: “So you either haven’t read until the end of the documentation, or might not have understood its consequences. At the end of the article you’ll find a list of features that will not work with the new format. So if you open an existing file with the old metadata format containing one of those features, the conversion will fail and the old format will be used instead.”

You: “Well, that sounds like a good solution. Nothing can go wrong then. Why the drama, Imke?”

Me: “Imagine you work on a file that has been successfully converted and make a lot of changes to it. But after a while you realize that you need to add one of the features that are on the limitations-list. Oops… ”

So if you’re lucky, you can revert to the latest version with the old metadata format and redo your work from there. But in the worst case scenario, you have to do everything from scratch.

Well, not everything: You can copy the queries cleanly to a new file with just a couple of clicks (see here for example). (Hint: You don’t even need to select each folder separately. If you want to copy all queries to the other file, just select the first query, hold the Shift-key, select the last query and copy.)

The Tabular Editor can help you to quickly transfer the work on your data model to the new file. (Hint: There is an awesome video series about it here.)

Then you’d have to copy all your visuals one by one and redo all other settings manually.

Bug warning

On top of that, you’re currently running the risk of catching this bug. For me that meant that I couldn’t load any of my queries any more.

So currently I only recommend using this feature in a very controlled way and would personally not use it on files which might require further development.

Take care and stay queryious 😉


Create a load history or stage in CDS instead of incremental load in Power BI


If you’ve been following my blog for a while, you might have noticed my interest in incremental load workarounds. It took some time before we saw the native functionality for it in Power BI, and it was first released for premium workspaces only. Fortunately, we now have it for shared workspaces / pro licenses as well, and it is a real life saver for scenarios where refresh speed is an issue.

However, there is a second use case for incremental refresh scenarios that is not covered ideally by the current implementation. This is where the aim is to harvest and store data in Power BI that will become unavailable in its source in the future, or where one simply wants to create a track of changes in a data source. Chris Webb has beaten me to this article here and describes in great detail how that setup works. He also mentions that this is not a recommended setup, and I agree. Another disadvantage of that solution is that the harvested data is only available as a shared dataset instead of a “simple” table. This limits the use cases and might force you to set up these incremental refreshes in multiple datasets.

Dataflows to the rescue

Recently I discovered a nice alternative that allows you to use Power Query for such a task. It stores the data in an entity of the Common Data Service (short: CDS). From there it can be queried by the Power Platform applications (unfortunately there is currently no connector for it in Power Query in Excel). Please note that this requires appropriate licensing.

Please also note, that currently not all Power Query features are available in Dataflows. Most notable limitations are missing connectors and custom functions not being available. But as Microsoft seems to aim for feature parity, this looks like a very promising path.

The technique uses a Power BI dataflow to regularly import the data from the source and (if necessary) compares it to the data already stored in the history table (sitting in CDS). It filters the new data that shall be added to the existing data. To transfer this data into the CDS, a Power Platform dataflow has to be used, as Power BI dataflows currently cannot store data in CDS entities. So the Power Platform dataflow does nothing other than re-format and forward the content from the Power BI dataflow into the CDS entity. Reformatting is necessary due to the following limitations:

  1. Using date fields in Power BI dataflows will make that dataflow invisible to a Power Platform dataflow. So format your date fields as text and reformat them back to date in the Power Platform dataflow.
  2. Datetime fields will cause trouble when loading to the Common Data Service entity. So either split them into a date and a time field or, if possible, use them as a text field instead.

Process overview

Overview of the incremental load scenario with Power Platform dataflows

Although I haven’t tested it, this should also be possible when you bring your own data lake storage.

Edit: As Maxim Zelensky has pointed out, reading all the data from the CDS entity to perform the comparison could slow down the refresh process considerably. It would be very cool if one could use entity metadata to read/write the parameters needed for the delta load (like the latest date, for example). But for now, one would instead have to create a “shadow entity” that stores just this data in a one-row table that can be read much faster, like so:

With “Shadow entity” containing metadata for delta load

Dataflow settings

I won’t go into the details of how to create dataflows here. Matthew Roche has you covered here, if they are new to you. I just want to point out some notable things:

Appending data to entities in the Common Data Service with dataflows

If you want to append new data to already existing data in the entities, you have to leave the setting “Delete rows that no longer exist in the query output” unchecked. Otherwise all your data will be deleted!

To do this, you need a unique key in your table. But often this won’t be provided by your data source. Luckily there is an easy way to create one in the Power BI dataflow (see the sketch after this list):

  1. Create a new column that contains “DateTime.LocalNow()” (this will return the current timestamp)
  2. Create an index column (this will create a unique number for each row in the refresh)
  3. Merge both new columns (this will create a value that will be unique in the history table)
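A sketch of these three steps in M (the column names are placeholders):

let
    Source = Table.FromColumns({{"A", "B", "C"}}, {"Data"}),   // stands in for your source data
    // 1. Current timestamp of the refresh
    AddedTimestamp = Table.AddColumn(Source, "RefreshedAt", each DateTime.LocalNow(), type datetime),
    // 2. Unique number for each row within the refresh
    AddedIndex = Table.AddIndexColumn(AddedTimestamp, "Index", 1, 1),
    // 3. Merge both into a key that is unique in the history table
    AddedKey = Table.AddColumn(
        AddedIndex,
        "UniqueKey",
        each DateTime.ToText([RefreshedAt], "yyyyMMddHHmmss") & "-" & Text.From([Index]),
        type text
    )
in
    AddedKey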

Lastly, make sure to sync the refresh times of both dataflows so that the Power BI dataflow refresh is finished before the refresh of the Power Platform dataflow starts.

Enjoy and stay queryious 😉

If you need professional help to set this up in your environment, just send an inquiry to info@thebiccountant.com.


Automatically detect and change the types of all columns at once in Power Query


Today I want to share a quick tip on how to automatically detect and change all column types at once in Power Query.

Background

Very often, when you expand a column in Power Query that contains a table or some records, the expanded columns will lose their types (as Chris Webb has described here, for example). Or you might just have accidentally deleted a “Changed Type”-step.


No types on columns

Did you know there is actually a superfast and easy way to do it?

  1. Click the mouse anywhere in the table
  2. Press Ctrl + A (this selects the whole table)

Check the whole table with Ctrl + a

  3. Go to the Transform-tab and choose “Detect Data Type”

Transform with 1 click

Voila: All your columns should have types on them.

The types have been detected automatically by checking the first 100 rows of your table. So if you know that you have columns with inconsistent values in them, make sure to double-check the automatically assigned types.

Enjoy & stay queryious 😉



How not to miss the last page when paging with Power BI and Power Query


When you use an API with a paging mechanism (like the example from this blogpost), you might work with a field that contains the address of the next page. You can use this to walk through the available chunks until you reach the last element. That last element in the pagination will not contain a next-field, or that field will be null.

Paging in Power Query

In Power Query you can use the function List.Generate for this. According to the latest function documentation, it:

Generates a list of values given four functions that generate the initial value initial, test against a condition condition, and if successful select the result and generate the next value next.

So an intuitive implementation would look like so:


Initial code for paging: Will miss the last element
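The screenshot code is roughly equivalent to this sketch (the endpoint and the field names next/items are assumptions; the comments map to the rows referenced below):

let
    BaseUrl = "https://example.com/api/items",                   // hypothetical API
    Pages = List.Generate(
        () => Json.Document(Web.Contents(BaseUrl)),              // initial call (row 2)
        each Record.FieldOrDefault(_, "next") <> null,           // condition (row 3)
        each Json.Document(Web.Contents([next])),                // next (row 4)
        each [items]                                             // select the page’s elements
    )
in
    Pages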

In the initial step (row 2) the API will be called and returns this record:


Examining the result of the first call

So for the upcoming iterations (next in row 4), a reference to the field next will be made and this URL will be called.

In the condition (row 3) I say that this process shall be repeated until the next-field of my previous result ([Result]) is empty.

However, this will only return 14 lists with 20 elements each, missing the last element with the 13 items needed to retrieve the full 293 items.

Let’s check it out:

Last Element (13 rows) is missing

Solution

Honestly, I still find it difficult to understand why this last element is missing (presumably because List.Generate tests the condition against each value before returning it, so the first value that fails the test is dropped). But fortunately there is an easy solution:


Split into 2 steps and reference previous URL instead

The trick lies in the adjusted condition (row 4): instead of checking whether there is a next-field in the previous record, I check whether the previous record had a URL to call. That basically reaches one level further back and will deliver the full results.
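A sketch of the adjusted version, under the same assumptions as above:

let
    BaseUrl = "https://example.com/api/items",                   // hypothetical API
    Pages = List.Generate(
        () => [URL = BaseUrl, Result = Json.Document(Web.Contents(BaseUrl))],
        each [URL] <> null,                                      // adjusted condition: did the previous step still have a URL to call?
        each [URL = Record.FieldOrDefault(_[Result], "next"),
              Result = if URL <> null then Json.Document(Web.Contents(URL)) else null],
        each [Result][items]
    )
in
    Pages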

Alternative

Actually, you can also use some “brute force” with a try … otherwise statement like so:

Simple alternative

But this will not deliver any items for debugging if something in the calls goes wrong. So I prefer not to use try statements for looping or pagination.

Enjoy and stay queryious 😉


Performance tip to speed up slow pivot operations in Power Query and Power BI


Pivot operations are a very handy feature in Power Query, but they can slow down refresh performance. So it is with some bittersweet pleasure that I can tell you I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. The bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize it when I first worked with the code.

Trick to speed up a slow pivot table

Don’t use an aggregation function when you want fast pivoting:


Don’t aggregate when you want a fast pivot in Power Query

But if your data isn’t aggregated on the row- & column values already, you’ll get this error message:

Error when the values are not adequately aggregated

So to make this work, you have to aggregate the values on the row- and column-values beforehand.

Let’s walk through the steps:

Walkthrough

Start is this table:

Start

The pivot shall bring the values from the “Column”-column into the column-area and sum the values from column AB like so:

Result

If I pivot without an aggregation as mentioned above, I will get the dreaded error message from above, because there are multiple rows for each row- and column-combination:


Values are not aggregated on the row- and column axis

The step to success is a grouping operation beforehand:


Group on all columns that shall define the row- & column values of the pivot

This returns a table with unique row- & column – combinations:

Aggregated table with just as many rows as the number of fields in the desired pivot table

9 rows for a desired 3×3 matrix looks just about right. So if I pivot here, no further aggregation is needed and the desired result will be shown.
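A sketch of both steps in M (table and column names follow the screenshots):

let
    Source = Table.FromRecords({
        [Row = "A", Column = "X", AB = 1],
        [Row = "A", Column = "X", AB = 2],
        [Row = "A", Column = "Y", AB = 3]
    }),
    // Aggregate on the future row- and column-values first
    Grouped = Table.Group(Source, {"Row", "Column"}, {{"AB", each List.Sum([AB]), type number}}),
    // Then pivot without an aggregation function
    Pivoted = Table.Pivot(Grouped, List.Distinct(Grouped[Column]), "Column", "AB")
in
    Pivoted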

Who found it?

Genius Bill Szysz used this method in his code to speed up matrix multiplication:


Code from Bill Szysz for a fast matrix multiplication (posted by DataChant)

This article is almost 4 years old, and I even played around with the code at that time. Sigh… 4 years of wasted time in which I didn’t realize that the key to the performance improvement lay in a technique that would significantly improve the refresh speed of so many other applications as well…

Why does it work?

Here is my guess:

It looks as if the group operation creates some primary keys that create partitions for every row (or even every cell?) of the pivot-table-to-be. I tested this guess by adding a primary key on those 2 columns (instead of grouping), and the refresh time sped up just like with the group operation. So if your data is already aggregated to the right level, you can just add a key (or remove duplicates, as long as you don’t lose any rows) instead of doing the group.
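A sketch of that alternative on already aggregated data (table and column names are placeholders):

// Mark the two axis columns as a primary key instead of grouping
WithKey = Table.AddKey(AggregatedTable, {"Row", "Column"}, true)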

This means that the pivot operation doesn’t have to work on the full table, but just on the partitioned parts. (In this article I have described the performance improvements through partitions the first time).

But this also reminds me of the performance improvement for aggregations after joins, that I’ve blogged about here. Let’s see if there will be more use cases to be found.

Enjoy and stay queryious 😉


Extract pattern string and numbers from text using List.Accumulate in Power Query


A typical task when cleaning data is to extract substrings that follow a certain pattern from a string. In this post I’m going to describe a method that uses the List.Accumulate function for it.

Task

I have to extract a payroll key from a description field. The key starts with 8 numbers, followed by a “-” and another number.

aölsfdk0125-fds  da12345678-0asdf

So I’m after the 12345678-0.

Plan

I plan to approach this by

  1. stepping through the string and checking each character for validity.
  2. If it is valid, store it,
  3. and if not, forget it. In that case, also forget the values stored so far, so the collection starts from scratch.
  4. Then, as a series of matches builds up, I have to check that the count of the stored values doesn’t exceed the length of my target pattern.
  5. Once the length is reached, no further checks shall be performed and the found values shall be returned.

My aim is to find a generic way so that this solution can be adapted to many other similar use cases as well. Therefore the pattern must be described in a scalable way and the identification of the pattern elements should be easy to compute.

Execution

First comes defining the pattern structure. For this, I create a placeholder string that holds one placeholder symbol for each kind of valid value:

xxxxxxxxyx

This reads as: For an x, any number between 0 and 9 is valid and for a y, only a “-” is allowed.

Therefore I create a table (“ValidValues”) with valid values for each kind of position:


Valid values are organized as placeholders

Then I create another table (“Position”) where I define the pattern and match these values to the actual pattern:


Pattern definition with placeholders from valid values

So for each position in the target string, I define the placeholder that identifies the valid value(s) from the first table. This is the key that I now use to merge that ValidValues-table to my positions table:

Merge the allowed values to the pattern table via placeholder key

When I expand out the “ValidValues”-column I get a long table with all valid values for each position in the pattern:

Expanding returns all valid values for each position in the pattern

This is a structure that lets me easily grab the valid values for each position with the following syntax:

Table.SelectRows(Positions, each [Position] = <NumberI’mAfter>)[ValidValue]

The blue part selects all rows which contain the number that I’m after, and the green part selects the last column and transforms it into a list that is easily digestible for further computation. I’m going to use this formula in the List.Accumulate operation later on.

Now I start preparations to walk through the string that contains my pattern. Therefore I turn the text into a list:

Text.ToList("aölsfdk0125-fds  da12345678-0asdf aölsdfj")

Transform the text to a list

List.Accumulate

Then I use List.Accumulate to step through this list, check each element and perform the task outlined in my plan above:

List.Accumulate to step through the values and create a list of matching characters

BTW: This code has been formatted with the great new Power Query Formatter tool .

The following picture contains the description of what each step does:

Commented code

After the List.Accumulate, I select the field “Result” (in step “Result”). It contains the list of matching characters, which I then combine into a string in the last step.
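For reference, here is a sketch that is consistent with the description above (the exact implementation in the enclosed file may differ; the name InputString is my choice):

Result =
    List.Accumulate(
        Text.ToList(InputString),
        [Result = {}, Collected = {}],                  // seed: final result and the running collection
        (state, current) =>
            // once the full pattern (10 characters) has been found, stop checking
            if List.Count(state[Result]) = 10 then
                state
            // is the current character valid for the next position?
            else if List.Contains(
                Table.SelectRows(Positions, each [Position] = List.Count(state[Collected]) + 1)[ValidValue],
                current
            ) then
                (let newCollected = state[Collected] & {current} in
                    if List.Count(newCollected) = 10
                    then [Result = newCollected, Collected = {}]            // full match found
                    else [Result = state[Result], Collected = newCollected])
            else
                // no match: forget the collection so far and start from scratch
                [Result = state[Result], Collected = {}]
    )[Result]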

Please check the enclosed file to see it in action:  RegexDummy1_.zip

If you’ve enjoyed this tutorial so far, please stay tuned for the next episode. There I will transform the query into a function that can be applied to all rows of a table, and adjust it to make the last 2 characters optional so that strings with just 8 numbers in them can be found as well.

Enjoy & stay queryious 😉


Transform a query into a function in Power Query and Power BI


In my previous blogpost I described a method to extract a substring that follows a certain pattern from a string. In this post I show how to transform that query into a function that can be applied to many rows of a table.

Video how to transform a query into a function

Please check the video for the detailed steps. In there I also show how to modify the code so that it detects strings with a sequence of just 8 numbers as well. In the original query, those had to be followed by a minus sign and another number:

The steps in pictures

Copy the hardcoded string that shall be replaced by a function parameter.

Create a new parameter.

Paste the copied string as a default value into the “Current Value” field.

Replace the hardcoded string in the query with a reference to the parameter (see the code sketch after these steps).

Check the query, right-click it and choose “Create Function”.

Name the function. This will transform the query into a function.

Then add a column by invoking the function.
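In code terms, the replacement of the hardcoded string looks roughly like this (the parameter name InputString is my choice):

// Before: hardcoded string inside the query
Source = Text.ToList("aölsfdk0125-fds  da12345678-0asdf aölsdfj"),

// After: reference to the new parameter
Source = Text.ToList(InputString),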

As you might have recognized, a new folder will be created where the original query and the function parameter(s) are collected. Also, you’ll find the newly created function in it.

The function is “connected” to the original query. That means that all changes that I make in the original query will automatically be transferred to the function as well. This makes adjusting the function or troubleshooting it so much easier.

Adjustment

The original request I got was to make sure that 8-digit number strings without the trailing “-x” are found as well. Therefore I made an adjustment in row 16: there it is checked whether the 8th position of the collected string has been reached. If so, the collected values are stored as a result as well.

Sample file

You can download the file with the code here:  RegexDummy_Part2.zip

Enjoy & stay queryious 😉


Retrieve header fields like response status from Web.Contents in Power BI and Power Query


Many Power Query functions not only return their values as advertised in their function documentation, but also a metadata record on top. This record is like a tag that holds additional information about the returned main value (for more details about this, please check out my friend Lars Schreiber’s article about it).

Useful metadata for the Web.Contents function

Today I discovered that the function Web.Contents delivers a really nice record with a couple of useful pieces of information. To retrieve header fields, you have to use the Value.Metadata function, like so for example:


Interesting metadata from the Web.Contents – function
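A minimal sketch of such a call (the URL is just a sample):

let
    Response = Web.Contents("https://www.microsoft.com"),
    // The metadata record contains fields like Response.Status, Content.Type and Headers
    Meta = Value.Metadata(Response),
    Status = Meta[Response.Status]
in
    Status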

This might help for some advanced web query tasks.

How to use

If you want to use this in production, you’d probably branch out the logic: first call Web.Contents and keep that result in a column or variable. Then add another column that references it and returns the metadata record. Apply the logic check on it and create a last column in which you finally parse the content from the binary that Web.Contents has returned.
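A sketch of that branched-out logic on a table of URLs (column names are placeholders):

let
    Source = Table.FromColumns({{"https://www.microsoft.com"}}, {"Url"}),
    AddedResponse = Table.AddColumn(Source, "Response", each Web.Contents([Url])),
    AddedStatus = Table.AddColumn(AddedResponse, "Status", each Value.Metadata([Response])[Response.Status]),
    // Only parse the binary if the logic check succeeds
    AddedContent = Table.AddColumn(AddedStatus, "Content", each if [Status] = 200 then Text.FromBinary([Response]) else null)
in
    AddedContent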

Enjoy & stay queryious 😉


Convert DateTime to ISO 8601 date and time strings in Power Query


Often, when querying APIs, it is required to enter date and time filters in ISO 8601 format. Today I show a quick way to convert a DateTime to an ISO 8601 string, based on an ordinary DateTime field, according to the following pattern:

2020-10-11T15:00:00-01:00

This represents the 11th of October, 3pm, in the UTC-1 timezone.

Steps to convert DateTime to ISO 8601

If I enter:

#datetime(2020,10,11,12,0,0)

into the formula bar, it will be converted to:

11/10/2020 12:00:00

Compared to the desired ISO format, the year, month and day are in the wrong order. So using the universal Text.From function will not return the correct result.

Fortunately, there are a couple of xxx.ToText functions in Power Query that allow for dedicated formatting parameters in their 2nd argument. For example, the function DateTime.ToText can actually return the string in the desired format if you pass a format string as the 2nd parameter:


DateTime.ToText with format parameters

The syntax for these format strings can also be found here.

The last step is to add the time-zone string (& “-01:00”), as I started from a DateTime value only:

DateTime.ToText(DateTime.From(#datetime(2020,10,11,12,0,0)), "yyyy-MM-ddTHH:mm:ss") & "-01:00"

Note the capital “HH” in the format string: it returns the hour on a 24-hour clock, whereas a lowercase “hh” would turn 3pm into “03”.

Enjoy & stay queryious 😉


How to refresh Power Queries on protected sheets in Excel


When working with Power Query in Excel, you might want to refresh Power Queries on protected sheets. But this will not work by default. Using a macro to temporarily unprotect the sheet and protect it again will do the trick, but it requires the password to appear in plain text in the VBA code. So please keep in mind that this technique is only suitable for scenarios where the password protection merely guards against accidental changes.

Steps to refresh Power Queries on protected sheets

The following VBA code will unprotect the sheet “mySheet”, then refresh the query “myQuery” before protecting the sheet again with the password “myPassword”.

Sub RefreshmyQuery()
    ' Temporarily unprotect the sheet so the refresh can write to it
    Sheets("mySheet").Unprotect Password:="myPassword"
    ' Refresh the query (requires background refresh to be disabled, see below)
    ActiveWorkbook.Connections("Query - myQuery").Refresh
    ' Protect the sheet again with the same password
    Sheets("mySheet").Protect Password:="myPassword"
End Sub

But if you use it as it is, you’ll receive the following error message:

The re-protection of the worksheet kicks in before the refresh can finish.

To overcome this, you have to disable background refresh for your Power Query (“myQuery”). This can be done via the query’s connection properties like so:

Disable background refresh:

That’s it. Refresh will succeed.

Enjoy and stay queryious 😉



Your Oracle data import in Power BI and Power Query is slow?


If you’re using the native Oracle connector in Power Query, you will probably experience very slow import performance. Thanks to Tristan Malherbe for recommending the OleDB connector in Power Query instead. This speeds up the import enormously.

How to create the connection string

When using the OleDb.DataSource connector instead, you have to pass a connection string as the first parameter and an optional query record as the second parameter. To speed things up even more, you should add a FetchSize parameter to the connection string. For me, this didn’t work when I pasted it into the popup window, so I had to add it manually in the query editor:

OleDB connection to an Oracle database

The “:1521” in the connection string is the port number, which is usually 1521 for Oracle databases.
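A sketch of what the step could look like (server, service name and query are placeholders; FetchSize is the tuning knob mentioned above):

let
    Source = OleDb.DataSource(
        "Provider=OraOLEDB.Oracle;Data Source=MyServer:1521/MyService;FetchSize=10000",
        [Query = "SELECT * FROM MySchema.MyTable"]
    )
in
    Source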

You can play around with the FetchSize to determine the ideal value for your specific use case.
For security reasons, make sure NOT to pass your credentials in the connection string (as mentioned in the link above), but to enter them in the credentials section instead (that’s the 2nd dialogue in the import process).

Query speed improved by an order of magnitude for me:

Import speed difference

Enjoy and stay queryious 😉


Clean up or harmonize mis- or differently spelled category data with Power Query


A typical problem with data created by manual entry is that category values are often misspelled or entered inconsistently. So in this article I’m showing a very powerful technique for dealing with this problem and cleaning up dirty category data. It was inspired by the “Preppin’ Data” challenge whose instructions you can read here.

Task

Categorize dirty data

 

Solution

Create or import a table with all allowed category values:

Categories table

Merge that table with the column containing the dirty data and check “Use fuzzy matching to perform the merge”.

Fuzzy merge to match valid categories

This will activate some very clever AI algorithms that check for similarities between the dirty data and the allowed categories.

There are a couple of options that you can use to fine-tune the matching algorithm. But for the current challenge, the default settings work just fine and return correct matches for all rows.
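For reference, a sketch of the step that the UI generates for such a fuzzy merge (table and column names are placeholders):

Merged = Table.FuzzyNestedJoin(
    DirtyData, {"Category"},
    Categories, {"Category"},
    "Matched", JoinKind.LeftOuter,
    [IgnoreCase = true, IgnoreSpace = true]
)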

Video

If you want to see how this works in detail, please check out the video.

Enjoy and stay queryious 😉


Extract only letters from a mixed string in Power Query and Power BI


This is a quick method for extracting only the letters from a string. It is part of the Week 2 “Preppin’ Data” challenge.

Task for extracting letters from a string

Imagine you have a string like this: “10.ROADBIKES.423/01” and would like to extract only “ROADBIKES”.
Power Query actually has a function for this purpose: Text.Select. It takes 2 parameters:

  1. The text to select from
  2. A list of characters that shall be selected

For the given example the code would look like so:

Text.Select( "10.ROADBIKES.423/01", {"A".."Z"})

This function is always case sensitive as there is no optional parameter that accepts a comparer function.

Easy application

Although the function is not available through the UI, it can nonetheless easily be applied. Just use a different dummy text-transformation function and then edit the code afterwards. That way you only have to type a tiny fraction of the code:

Tweak to Text.Select for extracting letters from a string
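In code terms, the tweak could look roughly like this (the column name and the dummy transformation are placeholders):

// Generated by the UI (e.g. Format -> UPPERCASE as the dummy transformation):
Table.TransformColumns(Source, {{"MyColumn", Text.Upper, type text}})

// Tweaked manually to keep only capital letters:
Table.TransformColumns(Source, {{"MyColumn", each Text.Select(_, {"A".."Z"}), type text}})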

 

Please check the video if you want to see how to use this function without having to manually code it in the advanced editor. You’ll also learn how to apply aggregations on groupings easily:

 

Enjoy and stay queryious 😉


Power BI Cleaner now fully covers Calculation Groups


You can now download a new version of my Power BI Cleaner tool that finally covers the usage of DAX expressions in Calculation Groups as well. For an introduction to this tool and its further limitations, please check out this post. There is also a nice article from Matt Allington covering some additional aspects.

Calculation Group coverage

Until today, you would only see which calculation group had been used in your report, but not which DAX expression (measure or column) had been used to create it. With the new version (V11) you no longer run the risk of deleting, for example, a measure that has been used to create a calculation group. Their usage is covered under the category “measures and columns”.

In the file attached, you can see that the column “Net Price” has been used in “MeasuresAndColumns”:

When you drill through “Where Used Indirect” you can identify the usage in the calculation group:

You can download the file (V11) here:  PowerBICleanerV11_upload.zip   (After download, rename the xxx.zip to xxx.pbix to open the file in PBI Desktop)
Current Version: _2 (2020/03/15): Includes bugfix for filters.

Enjoy & stay queryious 😉


Fix error “..prod.powerquery.microsoft.com refused to connect” in Power BI dataflows


Today I could not edit any dataflows for a client and was constantly getting “… prod.powerquery.microsoft.com refused to connect”:


… prod.powerquery.microsoft.com refused to connect

 

Twitterverse to the rescue

After a couple of unsuccessful attempts I went to the twitterverse to ask whether anyone else was experiencing the same (thinking that there was an issue with the service in general). Thankfully Dan Szepesi had the fix for my problem, which was actually caused by a bug. His solution was to replace the “www”-part of the URL with “app”. This is his description:

Issue:
When browsing powerbi.com and trying to edit dataflows, if the URL in your browser (host header) is powerbi.com, you will get an error when editing dataflows. This will fail in IE, Edge, Chrome and FireFox.

Cause:
Content Security Policy is applied to the page us.prod.powerquery.microsoft.com and powerbi.com is NOT in the frame-ancestors response field for pages that can embed us.prod.powerquery.com

Workaround:
Manually change URL in browser to app.powerbi.com

Fix:
Add https://*.powerbi.com or powerbi.com to the frame-ancestors list server-side.”

I was prompted to sign in again, but then I was finally able to edit my customer’s dataflow. Big thanks to Dan and the Power BI community on Twitter for that!

Technical background

Dan also provided some additional information about this bug, which I’m posting here as it is (it is well above my head 😉):

“When you select ‘Edit Dataflow’, it makes a call to us.prod.powerquery.microsoft.com. The page will embed what comes back into the main frame of the page.

In the response headers to this GET is the following:

Breaking out the frame-ancestors part:

frame-ancestors 'self'
adf.azure.com
ms-adf.azure.com
https://*.ci.ai.dynamics.com
https://*.api.ci.ai.dynamics.com
msit.powerbi.com
app.powerbi.com
admin.powerplatform.microsoft.com
dataintegrator.trafficmanager.net
admin.powerapps.com
admin.flow.microsoft.com
https://*.admin.powerapps.com
https://*.admin.flow.microsoft.com
https://*.businessplatform.microsoft.com
admin.microsoft.com
us.flow.microsoft.com
https://*.crm.dynamics.com
https://*.iom.d365.dynamics.com
teams.microsoft.com
teams.powerbi.com
web.powerapps.com
https://*.web.powerapps.com
make.preprod.powerapps.com
make.preview.powerapps.com
make.powerapps.com
scv.dynamics.com
scv.azureedge.net
web.azuresynapse.net
ms.web.azuresynapse.net;

NOTICE:
powerbi.com is not in the frame-ancestors list so the browser will not allow that page to be embedded.”

Hope you found this post soon enough.

Enjoy and stay queryious 😉

