DistinctCount -- Analysis Service

Another Way for DistinctCount

As we all know that DistinctCount is a challenge for SSAS. Since its not an additive measure
AS needs additional efforts to get distinct count.


There are number of articles on how to optimize distinct count.
In general there we can get distinct count by two ways
1. Set aggregation usage of Distinctcount type while designing cube.
2. Using DistinctCount MDX Function

There are some alternate ways also available on how to count distinctcount
Lets look at another approach on how to get distinctcount.

Our problem is to get distinctcount of product sold by each reseller.

first we will get products for which orders are placed

nonempty([Product].[Product].members,[Measures].[Reseller Order Count])

Then we will count this number of products

count( nonempty([Product].[Product].members,[Measures].[Reseller Order Count]))
member [measures].[dc1]

Here it is count of individual products for which orders are placed and that is distinctcount for us also


WITH MEMBER [measures].[AlternateDistinctCount]
AS
COUNT( nonempty([Product].[Product].MEMBERS,[Measures].[Reseller Order Count]))
MEMBER [measures].[MDXDistinctCount]
AS
DISTINCTCOUNT({[Product].[Product].members*[Measures].[Reseller Order Count]})
SELECT {[measures].[AlternateDistinctCount],[measures].[MDXDistinctCount]} ON 0,
[Reseller].[Reseller].MEMBERS
ON 1
FROM [Adventure Works]



Lets verify the result

A Bike Store 33 33

A Bike Store has placed orders for 33 products

SELECT
[Reseller].[Reseller].&[1] ON 0,
NON EMPTY{[Product].[Product].MEMBERS*[Measures].[Reseller Order Count]} ON 1
FROM
[Adventure Works]




It shows orders placed for 33 products

7 comments:

Anonymous said...

Good post.. Thank you

Sam Kane said...

Here are this and some other articles on SSAS Distinct Count:

http://ssas-wiki.com/w/Articles#Distinct_Count

Unknown said...

Thanks Sam :-)

Dibyant Upadhyay said...

Good post!!
But both will suffer in the place where you have large amount of data.I am talking about 10-20 mil of data daily then your cube will suffer. if you even do partitions like i do then also it will take around 20 min just to give you data for 300 days. IT WILL READ ALL THE PARTITIONS to bring you data.
Then how would you solve this ?
then there is 2 ways i could come (even after going through MS support they couldn't help a lot).None of yours will work trust me I had this disease and i know how much it haunts to users if they are using excel as tool they cant even control MDX generated by this.
1.Change the non empty property of calculated measure inside the cube will little help 10 % around.
2.You have to do everything inside database design to handle this.
Mode your MG so that you can use distinct count. This is the final solution i came to know.
I handle a data warehouse which has just around 300 gb data for single year.we use DB2 for database back end it is Terabytes DB.
I would appreciate any other alternative approach if any one has come across.

Noor said...

I've a cube named Survey Cube..where Survey is done for the stores located at different locations. If you consider a single survey then it will not be a problem as it gives you correct result like say

for the survey :1 ..if we have 100 stores (Count of Distinct stores) and Responses as 80 and Non response is 100-80 = 20.

for survey : 2.. if we have 20 stores (count of distinct stores) and responses as 10 and Non Responses as 20-10= 10

Now, if you consider the Grand Total .. the responses is coming as 90 (i.e. 80 + 10) and non Responses is coming as 30 (i.e. 20+10) or 120-90=30.

But the Store count is coming as 100 (lets say 100 are the distinct for the two surveys) not 120.

So, Now my aim is to how to display SUM in the Grand total for the Store Count.

More on this...


Above is the screen shot taken from the Excel Pivot table which is taking data from SSAS cube.

From the above Store Count, Response and Non Response are the measures.

And in the Row label we have Dimension as Survey Name.

Here Store Count is COUNT (DISTINCT Stores) whereas Response is SUM (Response) and

non Response is COUNT(DISTINCT Stores) - SUM (Response).

Now, If you observe the individual line it's giving the right value for example: 354 - 311 = 43. but when you look at the Grand Total it's not correct. Here 1606 is not the sum of all the above Stores instead it is taking as DISTINCT COUNT in Grand Total also.

Now my question is How to make the Store Count in GRAND TOTAL AS SUM that means from the above screen shot the Grand Total for Store count should come as 4540 and not 1606?

Please help in achieving this.

Unknown said...

Hello, I think there is a problem with these solutions. If you try to slice on an attribute higher up in the same hierarchy as the attribute you're trying to count, eg Product Category (how many distinct products are sold in each category), it will fail. At least in SSAS 2014 when I try.

varun said...

Can you please explain how to implement
1. Set aggregation usage of Distinctcount type while designing cube.
2. Using DistinctCount MDX Function

Post a Comment

Popular Posts