简体   繁体   中英

From R script to Power Bi - how to use setdiff

I've got two data frames: zerowy_nazwa5 , zatwierdzony_nazwa5 ,

and working 2 lines:

setdiff(zatwierdzony_nazwa5, zerowy_nazwa5)
setdiff(zerowy_nazwa5, zatwierdzony_nazwa5)

how I implement this in PowerBi?

Thanks for help

Your question is rather unclear, so I'm going to have to make some assumptions. I will interpret your question as how to natively perform a set difference in Power BI.


Suppose we have tables A and B as follows

Table A:   Table B:      
Column     Column
------     ------
 1          2
 2          4
 3
 4
 5

and we want to get the set difference A - B

 Column
 ------
  1
  3
  5

You can do it in DAX or in the Power Query M language:


M language

You can do this using a left anti join . The M code looks like this:

 = Table.NestedJoin(A,{"Column"},B,{"Column"},"B",JoinKind.LeftAnti)

Delete the new "B" column and you're good to go.

Another way is to use the Table.SelectRows function:

= Table.SelectRows(A, each not List.Contains(B[Column], _[Column]))

DAX language

You just need to filter table A to exclude values in table B :

FILTER(A, NOT( A[Column] IN VALUES( B[Column] ) ) )

Or using the older CONTAINS syntax instead of IN :

FILTER(A, NOT( CONTAINS( VALUES( B[Column] ), B[Column], A[Column] ) ) )

Note: It certainly is possible to use R scripts within the Power Query environment, as vestland points out. It is not currently possible to use R scripts within a DAX expression, as Juan points out.

Reading your question, I'm assuming this:

  1. Your main goal is to do this internally in PowerBI
  2. You're not specifically asking how to do it using DAX

The Power of R in Power BI is not limited to R Visuals. You can load both single and multiple tables and use them as input to R scripts and any R functionality using Edit Queries > Transform > Run R Script .

Here's an example using two synthetic dataframes and setdiff():

Snippet 1 (from the dplyr::setdiff examples in R)

library(dplyr)
a <- data.frame(column = c(1:10, 10))
b <- data.frame(column = c(1:5, 5))
c <- dplyr::setdiff(a, b)

# Output
# column
# 1      6
# 2      7
# 3      8
# 4      9
# 5     10

Since you didn't describe your expected output, I'm assuming this is what you were after. But beware that if you're not using the dplyr library, base::setdiff() will give a different output:

Snippet 2

c <- base::setdiff(a, b)

# output

# column
# 1       1
# 2       2
# 3       3
# 4       4
# 5       5
# 6       6
# 7       7
# 8       8
# 9       9
# 10     10

And if you carefully follow the steps in this post you will be a able to end up with this in Power BI. But here's the essence of it: To reproduce the example, go to Edit Queries (Power Query Editor) > Enter Data and click OK . Then insert an R script using Transform > Run R script and insert the snippet above.

在此处输入图片说明

If anything is unclear, or if you're not able to reproduce the result, let me know.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM