跳至正文

【DAX译文】理解 DAX上下文转换Understanding context transition in DAX

上期译文: [DAX译作]DAX行上下文和筛选上下文 / Row Context and Filter Context in DAX

喜乐君注:上下文转换是DAX中的难点,逻辑上看似很简单,但考虑到函数的复杂性和嵌套,实际情况又千变万化,即使专业用户也敬畏三分。本文是sqlbi两位大神的官方文档,学习过程中翻译如下,仅供学习参考。


Let us state this from the very beginning: context transition is a simple concept. It is a powerful feature aiming to simplify the authoring of DAX code. That said, most new DAX developers find context transition hard to understand, and they consider it to be the major reason behind incorrect results. There are two reasons for developers to feel this way:

首先我们要强调的是:上下文转换是一个简单概念,它是聚焦简化DAX代码的强大功能。很多人DAX新手以为它难以掌握,并把很多错误归咎于它。主要有两个方面的原因:

  • 清晰地理解行上下文和筛选上下文之间的区别,是理解和掌握上下文转换的关键前提。 A solid understanding of the difference between the row context and the filter context is an important prerequisite to understand and master the concept of context transition.
  • 牢记“上下文转换”是在何时、并如何运作的。大多数相关错误都来自于开发者忘记考虑“上下文转换”这一要素,并非不知道它如何运作。一旦开发者意识到“上下文转换”是考量因素,理解代码也就顺理成章。You need to remember when and how the context transition works. Most errors involving context transition are due to the developer forgetting to take the context transition into account, rather than not knowing how it works. Once they realize that context transition is happening, the code suddenly makes sense.

这两点其实不是“上下文转换”的特征,而是适用于DAX的每一个方面。DAX需要你掌握细节并理解基础。These two factors are not a peculiarity(特点、特征) of the context transition. They are relevant in almost every aspect of DAX. DAX requires you to remember details and understand the fundamentals well.

如果你对行上下文”“筛选上下文”还缺乏了解,我们建议你先阅读并理解前面的两篇文章(点击前面高亮部分即可)。If you are not familiar with the row context and the filter context, we suggest you read and understand really well the two previous articles:

不管怎样,下面的简单介绍对于你理解“上下文转换”都至关重要。Nonetheless, this short introduction is important to understand the definition of context transition. Here is its definition:

Calculate表达式激活“上下文转换”。
“上下文转换”把所有存在的”行上下文”转换为等价的“筛选上下文”,之后被Calculate用于其第一个参数值的计算/计值。

CALCULATE triggers context transition.
Context transition transforms any existing row context into an equivalent filter context, which is then used by CALCULATE to evaluate its first argument.

SQLBI

如果你还不了解row context和filter context 以及它们的区别和作用过程,那上述说明对你而言可能不过尔尔。 实际上,“上下文转换”就是“行上下文”转换为“筛选上下文”的过程——我们必须先熟悉它们二者差异,只有这样才能明白从一方转换为另一方时发生了什么。
If you are not familiar with the row context and the filter context, their differences and their behaviors, then the previous definition does not mean much. Indeed, the context transition is the transformation of a row context into a filter context. This is why you need to know the differences between the two evaluation contexts very well, to better appreciate what the transformation of one into the other implies.

“行上下文”中的CALCULATE或CALCULATETALBE表达式执行时,“上下文转换”就会发生。 有时候这很容易发现,有时候转换的两个前提条件又相当隐秘。 这里稍作展开:
Context transition happens when CALCULATE (or CALCULATETABLE) is executed inside a row context. Sometimes this is easy to spot, whereas sometimes one of the two prerequisites for the context transition is somewhat hidden and harder to see. Let us elaborate a bit:

  • 第一,“上下文转换”存在于CALCULATE条件下。 如果你的代码中没有CALCULATE,那就不会出现转换。 注意的是,只要使用度量值(Measure),CALCULATE就必然出现;因此,引用度量值就会激活“上下文转换”,即便没有显性地使用CALCULATE表达式。
    First, context transition is invoked by CALCULATE. If CALCULATE is not present in your code, then there is no context transition. Be mindful that CALCULATE is always present when you invoke a measure; therefore a context transition might occur when you call a measure, without any explicit CALCULATE in your code.
  • 第二,上下文转换需要存在“行上下文”,如果没有“行上下文”,那转换也不会发生,因为缺乏了转换的对象。行上下文可以是在引用迭代器中显性地出现,也可能在创建“计算列”时自动出现——计算列中默认就有行上下文。
    Second, context transition requires the presence of a row context. If there is no row context, then no context transition happens, because there would be no row context to transform. The row context can be explicitly introduced with an iterator, but it might also be present because you are writing code in a calculated column – where by default, there is a row context

1)

接下来,介绍一个显性转换,一个隐形转换的例子。
首先,我们看一下“大客户销售额”的例子,这里“度量值”计算自动激活了“上下文转换”。
Let us show two examples where the context transition is entirely explicit, or entirely implicit. First, let us look at an example where the presence of context transition is evident with a measure that computes the sales to big customers:

【函数】Sales数据表中的度量值 Measure in Sales table

Sales to big customers :=   -- 大客户的销售额
SUMX (
    VALUES ( Customer[CustomerKey] ),    -- 针对 客户编码字段,投影、去重生成数据表
    VAR CustomerSales =  
        CALCULATE (                      -- 使用Calculate表达式计算:
            SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )  -- Sales数量乘单价,生成行上下文
        )
    VAR Result = IF ( CustomerSales >= 5000, CustomerSales )   -- 如果客户销售额大于5K,返回该值
    RETURN
        Result
)

在这个度量值中,CustomerSales计算时会激活上下文转换。外层的SUMX迭代函数迭代Customrer客户表时,会创建“行上下文”(译者注:第一个SUMX函数,迭代Values生产的客户去重表),同时度量值显性使用了Calculate;由于该计算满足“上下文转换”的两个必要条件,因此VALUES ( Customer[CustomerKey] ) 对应的“行上下文”会被转换为为“筛选上下文”,从而确保内层的SUMX迭代仅对当前行中的客户编码有效(译者注:第二个SUMX函数,迭代Sales表的每一行)
In this measure, context transition happens during the evaluation of the CustomerSales variable. There is a row context introduced by SUMX iterating over Customer and an explicit invocation of CALCULATE. Therefore, the prerequisites for context transition are present and the row context over VALUES ( Customer[CustomerKey] ) is transformed into a filter context, so that the inner SUMX is executed in a filter context showing only the current customer key.

在这个代码中,上下文转换的前提要求很容易辨认。虽然我们还没有讨论上下文转换到底发生了什么,至少我们已经知道它何时发生。 真正让上下文转换变得具有挑战性的,是那些必备要素非常隐秘的场景。我们再看一下Customer数据表中创建计算列的例子。
The prerequisites for context transition are easy to spot in that code. We have not yet discussed what context transition does in detail; at least we understood when it may happen. What makes the phenomenon of context transition challenging is the presence of the prerequisites in scenarios where they are very well hidden. Let us look at the next piece of DAX code, which is a calculated column in Customer:

2)

Customer客户数据表中的计算列 Calculated column in Customer table

Sales =[Sales Amount]   -- 引用了一个 Sales Amount的字段 

这次,“计算列”中的表达式只是一个引用了一个非常简单的“度量值”。此时,上下文转换也激活了,所有的必备条件都在这里,只是隐藏很深。 “计算列”在创建时同步生成行上下文,计算列就在它对应的隐形的、行上下文中计算。这个计算引用了 名为“Sales Amount”的度量值——只要有度量值,可以认为它默认嵌套CALCULATE表达式。因此,这个度量值其实是在隐形的calculate所激活的“上下文转换”后计算的。
This time, the expression is a simple measure called from inside a calculated column. Context transition is happening: the prerequisites are all there, just very well hidden. There is a row context because the code is that of a calculated column, and calculated columns are computed in an implicit row context over the table where they are defined. The code invokes the Sales Amount measure; whenever a measure is invoked, it is automatically surrounded by CALCULATE. Hence, the measure is evaluated after the hidden CALCULATE has performed the context transition.

后面这个案例看似出奇的简单,但是它也显示了上下文转换中存在的“陷阱”。 行下文和calculate表达式结合的复杂公式,会让这一切更加复杂。 不管怎样,原则很简单:同时存在行上下文和calculate表达式的地方,就会激活“上下文转换”。 熟练理解二者的结合形式需要耗费一些时间,但这也是成为一个娴熟DAX开发者必然掌握的技能之一。 This last example looks surprisingly simple, yet it already shows the trap(陷进、圈套) of context transition. Finding the combination of row context and CALCULATE in more complex formulas might be more difficult. Nonetheless, the rule is simple: when both the row context and CALCULATE happen to be together, then they trigger the context transition. Getting used to finding the combination may take some time, but it is one of many important skills for a good DAX developer.

3)

现在,我们已经理清了上下文转换何时出现,到了进一步诠释上下文转换的效果之时了。 “上下文转换”把“行上下文”转换为等价的“筛选上下文”。 转换所创建的新的上下文包含了被行上下文迭代的所有字段列,它在当前行的数据值中筛选这些字段列。如果行上下文仅仅被一个字段列迭代,那么这个列就会成为筛选上下文的一部分;如果行上下文被整个数据表迭代,那么整个数据表的所有列都将成为新创建的筛选上下文的一部分。事实上,扩展表的所有列都将成为新创建的筛选上下文的一部分。 不过,我们不想因为扩展表让问题变得过于复杂,因此这里暂且不在这里详细展开。

Now that we have clarified when the context transition takes place, it is time to dig more into the effect of context transition. Context transition transforms the row context into an equivalent filter context. The new filter context created by the context transition includes all the columns currently being iterated by the row context. It filters these columns with the value of the current row. If the row context is iterating one column only, then that column will be part of the filter context. If the row context is iterating over an entire table, then all the columns of the table will be part of the newly-created filter context. Actually, all the columns of the expanded table will be part of the newly-created filter context. But we do not want to make things more complex by introducing expanded tables, therefore we avoid this detail in this more introductory article.

我们再回顾一下前面度量值的例子:Let us look at the measure we saw previously:

sales销售数据表中的度量值 Measure in Sales table

Sales to big customers :=               -- 大客户的销售额
SUMX (
    VALUES ( Customer[CustomerKey] ),    -- 针对 客户编码字段,投影、去重生成数据表
    VAR CustomerSales =  
        CALCULATE (                      -- 使用Calculate表达式计算:
            SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )    -- Sales表中数量乘以单价
        )
    VAR Result = IF ( CustomerSales >= 5000, CustomerSales )   -- 如果客户销售额大于5K,返回该值
    RETURN
        Result
)

上下文转换发生在高亮部分(注:高亮部分是 VALUES、CALCULATE、SUMX)。由于行上下文仅仅迭代Customer[CustomerKey]字段列,因此 CALCULATE就会自动为之添加一个筛选器。 上述代码等价于下面的代码——这里显性地表示了Customer[CustomerKey]字段上的筛选条件。

The context transition is happening in the rows that are highlighted. Because the row context is iterating a table containing only Customer[CustomerKey]CALCULATE applies a filter to the Customer[CustomerKey] column automatically. The code is equivalent to the following formulation, where we made the filter on the Customer[CustomerKey] column explicit:

sales销售表的度量 Measure in Sales table

Sales to big customers :=               -- 大客户的销售额
SUMX (
    VALUES ( Customer[CustomerKey] ),    -- 生成客户编码的去重数据表
    VAR CurrentCustomerKey = Customer[CustomerKey]    -- 当前行的客户 
    VAR CustomerSales =
        CALCULATE (
            SUMX ( Sales, Sales[Quantity] * Sales[Net Price] ), 
            Customer[CustomerKey] = CurrentCustomerKey       -- 客户表的每一行、当前行 ,数据表
        )
    VAR Result = IF ( CustomerSales >= 5000, CustomerSales )
    RETURN
        Result
)

如你所见,我们把Customer[CustomerKey]字段列保存为变量,从而在calculate表达式中可以显性地引用它。As you see, we saved the current value of the Customer[CustomerKey] column in the CurrentCustomerKey variable and we then used it as an explicit filter inside CALCULATE.

新手很容易误以为,既然我们已经迭代了Customer[CustomerKey],那么它本身就已经被筛选了。这个理解是错误的。 行上下文迭代数据表,它不会在当前行中增加任何筛选。 如果没有calculate表达式和上下文转换,Customer[CustomerKey]上不会有活跃的筛选条件。 记住一句话:行上下文仅迭代,不筛选。 能执行筛选的上下文只有筛选上下文。 通过把行上下文转换为筛选上下文,我们可以创建一个新的筛选上下文,从而对行上下文迭代的数据值执行筛选。

It is very common for newbies to think that because we are iterating over Customer[CustomerKey], the column is already filtered. This is not correct. The row context iterates a table, it does not place any filter on the current value. Without CALCULATE and the context transition, there would be no active filter on Customer[CustomerKey]. Remember: the row context iterates, it does not filter. The only evaluation context that applies a filter is the filter context. By transforming the row context into a filter context, we can create a filter context that filters the values being iterated by the row context.

这个例子中,迭代的数据表只有一个字段段,因此非常简单。在计算列中,这种情形就会异常复杂。因为计算列中的上下文转换是对整个数据表执行的,因此,所有的customer数据表字段都会成为新的筛选上下文的一部分。 如下是原始的计算列:The example we have shown is quite simple, as the table iterated contained only one column. In the case of a calculated column, the scenario is more complex. The reason is that the context transition is happening in a row context that includes the entire table. Therefore, all the columns of the Customer table are part of the new filter context. This is the original calculated column:

Customer客户数据表中的计算列Calculated column in Customer table

 Sales =[Sales Amount]    -- 引用了一个 Sales Amount的字段 

当上下文转换时,customer表中的所有字段都会被筛选。 如下是使用变量从而显性筛选的等价形式:When the context transition happens, all the columns in Customer get filtered. This is the equivalent code using variables to make the filter explicit:

Customer客户数据表中的计算列Calculated column in Customer table

Sales =
VAR CurrentAddress      = Customer[Address]   -- 使用变量引用数据表中的每个字段列
VAR CurrentAge          = Customer[Age]       -- 下同
VAR CurrentBirthday     = Customer[Birthday]
VAR CurrentCity         = Customer[City]
VAR CurrentContinent    = Customer[Continent]   
VAR CurrentCountry      = Customer[Country]
VAR CurrentCountryCode  = Customer[Country Code]
VAR CurrentCustomerKey  = Customer[CustomerKey]
VAR CurrentGender       = Customer[Gender]
VAR CurrentName         = Customer[Name]
VAR CurrentState        = Customer[State]   
VAR CurrentStateCode    = Customer[State Code]
VAR CurrentZipCode      = Customer[Zip Code]
RETURN
    CALCULATE (
        [Sales Amount],
        Customer[Address]      = CurrentAddress,  -- 第二个参数开始,构成filter条件
        Customer[Age]          = CurrentAge,
        Customer[Birthday]     = CurrentBirthday,
        Customer[City]         = CurrentCity,
        Customer[Continent]    = CurrentContinent,
        Customer[Country]      = CurrentCountry,
        Customer[Country Code] = CurrentCountryCode,
        Customer[CustomerKey]  = CurrentCustomerKey,
        Customer[Gender]       = CurrentGender,
        Customer[Name]         = CurrentName,
        Customer[State]        = CurrentState,
        Customer[State Code]   = CurrentStateCode,
        Customer[Zip Code]     = CurrentZipCode
    )

如你所见,上下文转换引用了行上下文中所有字段对应的当前行,这里使用了显性的筛选条件创建筛选上下文,从而对calculate的第一个参数执行计算。 As you see, the context transition retrieves the current value of each column present in the row context; it uses that value as an explicit filter to create a filter context, under which it evaluates its first argument.

3)

第一个案例中,上下文转换发生在迭代的整个数据表中,相比发生在单一字段上而言更消耗计算资源。数据表的字段列越多,上下文转换就会耗费更多时间。 This first example shows that when a context transition takes place during an iteration over a full table, it is much more expensive than when it takes place over a table containing only one column. The more columns in the table, the more time needed for the context transition.

需要强调的是,在后面的案例中,激活上下文转换的是隐藏在度量值背后的calculate表达式。如果我们用度量值计算本身替换这里的代码,看上去就会复杂一些,如下所示: It is important to note that in this last example what causes the context transition is the hidden CALCULATE surrounding each measure reference. Indeed, if we replace the measure reference with the code of the measure itself, then the scenario is very different. Look at the following code:

Sales销售表对应的度量值 Measure in Sales table

 Sales Amount :=
SUMX(Sales,Sales[Quantity]*Sales[Net Price])

Customer客户表中的计算列 Calculated column in Customer table

Sales 1 =
[Sales Amount]

Customer客户表中的计算列(完整版无嵌套)Calculated column in Customer table

Sales 2 =
SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )  

Sales 1 的计算列和前面的计算相同,而Sales 2计算列则用完整的内部计算代替了之前的嵌套。虽然看上去一样,但是二者的差异巨大。The Sales 1 column is the same code as in the previous example, whereas the Sales 2 column uses the internal expression of Sales Amount instead of referencing the Sales Amount measure. Despite looking the same, the two columns are actually very different.

Sales 1  引用了度量值,因此就会激活上下文转换。行上下文和calculate二者兼具,虽然是隐秘的。对于Sales 2 计算列而言,存在行上下文,但calculate消失不见了。Sales 2的计算结果时所有客户的销售额总和,而Sales 1的计算结果是当前行对应客户的销售额总和。记住前面所说的,行上下文不会执行任何筛选,而只是迭代整个数据表。 Sales 1 uses a measure reference, therefore there is a context transition there. Both the row context and CALCULATE are present, albeit(尽管) hidden. In Sales 2, there is a row context but CALCULATE is missing. Therefore, no context transition is taking place. The result of Sales 2 is the amount of sales to all customers, whereas the result of Sales 1 is the amount of sales to the current customer only. Remember that the presence of a row context does not imply any filtering. The row context only iterates a table.

从这个角度看,上下文转换的难点不在于它是如何运作的,而在于我们能否有效地识别它是否发生了。大部分开发者在期初会难以分辨如下两个累计计算的案例。At this point we have observed that the complexity of context transitions lies not in the operation itself, but rather in our ability to spot when the transition does and does not occur. Most developers who start to use DAX find it very hard to distinguish for example between these two versions of a running total measure:

Sales销售表的度量值:Measure in Sales table

MaxDate :=           -- 度量值,默认引用calculate表达式 
MAX('Date'[Date])    -- 迭代函数不筛选,因此会计算所有日期中的最大值——即数据表的最大值
                     -- 类似于Tableau的 {max(date)}

Sales销售表的度量值 Measure in Sales table

RT Sales 1 := 
CALCULATE (     -- 出现了calculate表达式 
    [Sales Amount],  
    FILTER (      -- filter是迭代函数,随着而来的是行上下文 
        ALL ( 'Date' ),             -- 整个日期表 ,筛选如下条件,会发现所有日期都符合 
        'Date'[Date] <= [MaxDate]     -- 这里的maxdate是在外部预先计算的,返回整个数据表最大值
    )
)

Sales销售表的度量值Measure in Sales table

 RT Sales 2 :=
CALCULATE (
    [Sales Amount],
    FILTER (
        ALL ( 'Date' ),
        'Date'[Date] <= MAX ( 'Date'[Date] )  -- 这里的max受到外部筛选上下文影响,外部筛选上下文就是视图中的维度列(年度和月份),因此返回每个可见单元格中的最大日期
    )
)

事实上,上述的两个度量值结果差异很大,如下图所示。 As a matter of fact, the two measures return a very different result.

如你所见, RT Sales 1 在每个单元格中返回了数据表的总计,而 RT Sales 2则返回了累计汇总值。 二者仅有的区别在于 RT Sales 2 中直接在filter函数中使用了max函数,而 RT Sales 1 则引用了外部的度量值。 As you see, RT Sales 1 returns the grand total of sales in every cell, whereas RT Sales 2 produces a running total of the sales amount. The difference between the two is only that RT Sales 2 uses MAX directly inside FILTER, whereas RT Sales 1 invokes a measure.

由于 RT Sales 1 引用了度量值,所以这里就有了calculate。 在两个案例中,最大日期都是使用了迭代器针对ALL所有日期表计算的。 当上下文转换发生时,度量值是在筛选上下文中计算,从而筛选被迭代的日期表,返回当前迭代数据表。 由于每个日期都小于或等于当前值,所以filter函数返回了整个日期表——所以最终计算结果是整个数据表的总计。而在 RT Sales 2 中,MAX没有calculate表达式参与,所以它在外部的筛选上下文下计算,筛选上下文是被视图的维度限定的(相当于summarizecolumns)。因此,视图中就会计算最后一个日期,从而获得正确的结果。

Because RT Sales 1 invokes a measure, we know that CALCULATE is there. In both cases, when we compute the maximum date, we are computing it inside an iteration over ALL ( Date ). When the context transition occurs, the measure is executed in a filter context that is filtering the iterated date, always returning the currently iterated date. Because each date is less or equal to itself, the FILTER function returns the entire Date table: it shows the grand total in every cell. In RT Sales 2, on the other hand, MAX is executed without a CALCULATE around it. So it computes the max date in the outer filter context, created by the visual through SUMMARIZECOLUMNS. As such, it computes the last visible date in the visual, and produces the correct result.

总结Conclusions

在这篇文章中,我们只能算是理解了上下文转换的“皮毛”。上下文转换本身不难,让它显得很难的是,度量值中自动的calculate让它变得难以识别。 除此之外,成为一名优秀的DAX开发者确有难处,在编辑DAX代码时,我们必须把可能出现的“上下文转换”牢记于心——这一点很可能在其他任何细节之前。DAX不会宽恕你的失误,每一个细节都应该被认真考虑,每一个细节都可能对结果产生影响。 学习DAX需要你放慢脚步、牢固掌握基础知识,逐渐从菜鸟成为大师。

In this article we just scratched(抓,挠) the surface of context transition. Context transition in itself is not a complex topic. What makes context transition complex to master is that it is oftentimes hard to spot because of the automatic CALCULATE present for measures. Moreover, the complexity for a good DAX developer is real as they would have to keep in mind the possibility of context transition – on top of many other details – as they author their code. DAX is not forgiving: many small details have to be taken into account, each of which might change the result of your code. Learning DAX requires you to slow down, learn the fundamentals the right way and slowly proceed from rookie to master.

如果你想了解更多,可以观看“DAX入门”系列课程。一旦你掌握了这个内容,可以继续我们的私人课堂,或者观看“驾驭DAX”的在线课程。

If you want to learn more, then start by watching the free Introducing DAX video course. Once you have digested that content, proceed with one of our in-person classroom courses and/or with watching the Mastering DAX online video course.


后记:

本文中涉及到的主要外部知识


了解 喜乐君 的更多信息

订阅后即可通过电子邮件收到最新文章。

了解 喜乐君 的更多信息

立即订阅以继续阅读并访问完整档案。

Continue reading