Optimizing data in the model – Optimizing Model Performance

Optimizing data in the model

We’ve covered, in previous chapters, how Power BI supports directly connecting to a data store, such as connecting to an Enterprise Data Warehouse (EDW) that has billions of records and years of sales history. When configured with DirectQuery performance, Power BI becomes dependent upon not only the calculations and rendering of visuals in the report but also the performance of the underlying data store. If the EDW takes a long time to query, then the Power BI report that uses the query will also take a long time.

To help, Power BI also supports importing data, which reduces the performance dependency on the underlying data store to increase the performance of report visuals. Data imported into the Power BI data model is stored both in memory and on disk using the VertiPaq Storage Engine (SE). VertiPaq will compress data, sometimes by as much as 10x, so even with limitations on the amount of data imported, it’s often able to store large datasets.

To best optimize import datasets, we recommend taking a layered approach and reviewing each characteristic of your data, starting with removing unnecessary rows and columns.

Removing unnecessary rows and columns

Often, when connecting to data sources, there will be additional data (both in rows and columns) that is not needed for the reports and dashboards that you want to create. In these cases, you will be best served by removing these rows and columns from the dataset using the Power Query Editor.

Data is stored in the VertiPaq engine in a columnar format, which is highly compressed. However, if it’s possible to remove columns of data, it will successfully reduce the size of your dataset. Due to how data is stored in VertiPaq, it is recommended to remove all unnecessary text columns since they will take up more storage space in the data model. To remove unnecessary columns from a dataset in the Power Query Editor, simply right-click the column to be removed and click Remove. It’s also possible to select multiple columns, right-click, and then select remove columns to remove multiple columns at the same time.

To remove unnecessary rows of data, you’ll need to add a filter to a query. To best execute a filter, you’ll need to understand the basic characteristics of the data, which is easy to do using the data profiling capabilities of Power Query (see Chapter 3, Profiling the Data, for more details). Filtering rows of data is dependent upon the data type of the column being filtered. For example, if you want to filter using the ListPrice column of a Products table, then you’ll need to supply a numeric or currency value to the filter criteria. If you want to filter by a text value (such as removing all products that have a Color value of Red), then you’ll need to apply the filter to a column that is set to a data type of text. To apply a filter to a column, simply click the ▼ button for the column.

For text filters, select Text Filters and then select the type of operator you’d like to use:

  • Equals
  • Does Not Equal
  • Begins With
  • Does Not Begin With
  • Ends With
  • Does Not End With
  • Contains
  • Does Not Contain

For numeric or currency filters, select Number Filters and then select the type of operator you’d like to use:

  • Equals
  • Does Not Equal
  • Greater Than
  • Greater Than Or Equal To
  • Less Than
  • Less Than Or Equal To
  • Between

For date filters, select Date/Time Filters and then select the one needed. Different column data types will have different criteria for applying filters.

Removing unnecessary columns and rows is often a quick and easy way to reduce the amount of data in your Power BI data model. Next, we will look at some more complex ways of reducing the data model size.

Leave a Reply

Your email address will not be published. Required fields are marked *

Shopping cart

close