
How to implement imputation with Python script in Power BI?

I am trying to run a validated Python script to impute data in Power BI. The data is originally consolidated in Power BI, then exported to Excel, where it is imputed and analysed with Python.

Now I would like to run the Python code in Power BI's Query Editor, so that I get the imputed data directly in Power BI and can use its visualizations, but I get errors.

I tried pasting the same code into Power BI, but I think there might be an issue with the syntax.

# Power BI passes the previous step's table to the script as a pandas
# DataFrame named `dataset`; dataset=#"PreviousStep" is M syntax that belongs
# in the Python.Execute call, not inside the Python code.
import pandas as pd

cols = ['col1', 'col2']

# Group again at every step so each level sees the values imputed so far
def fill_with_group_median(df, keys):
    # Replace each remaining NaN with its group's median (column by column)
    return df.groupby(keys)[cols].transform(lambda g: g.fillna(g.median()))

# Country-level
# Filling down and up within each country
dataset[cols] = dataset.groupby('country ISO')[cols].ffill()
dataset[cols] = dataset.groupby('country ISO')[cols].bfill()
# Interpolation (gaps between two observed values)
dataset[cols] = dataset.groupby('country ISO')[cols]\
    .transform(lambda g: g.interpolate(method='linear', limit_area='inside'))
# Extrapolation (FILLING DOWN CURRENTLY)
dataset[cols] = dataset.groupby('country ISO')[cols]\
    .transform(lambda g: g.interpolate(method='linear', limit_area='outside'))
# Median
dataset[cols] = fill_with_group_median(dataset, 'country ISO')

# Group-level
# Median
dataset[cols] = fill_with_group_median(dataset, 'WBG Income Group')
# Yearly median
dataset[cols] = fill_with_group_median(dataset, ['WBG Income Group', 'Year'])

# Region-level
# Yearly median
dataset[cols] = fill_with_group_median(dataset, ['UN Sub-Region', 'Year'])

# No level (All)
# 0 (fillna returns a new frame, so the result must be assigned back)
dataset[cols] = dataset[cols].fillna(0)
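
To debug the imputation logic outside Power BI, the same cascade can be wrapped in a function and run on a small stand-in for `dataset`. This is a minimal sketch, assuming the column names used in the script above; the function names `impute` and `fill_with_group_median` and the toy data are hypothetical. Because the fill-down/up runs first, any country with at least one observed value is already complete before the interpolation steps, so it may be worth checking whether that order is intended.

```python
import numpy as np
import pandas as pd

COLS = ['col1', 'col2']


def fill_with_group_median(df, keys, cols=COLS):
    """Replace remaining NaNs in `cols` with the median of each group."""
    return df.groupby(keys)[cols].transform(lambda g: g.fillna(g.median()))


def impute(dataset, cols=COLS):
    """Hypothetical wrapper: country -> income group -> region -> 0 cascade."""
    df = dataset.copy()  # leave the caller's frame untouched
    iso = 'country ISO'
    # Country-level: fill down, fill up, interpolate, then the country median
    df[cols] = df.groupby(iso)[cols].ffill()
    df[cols] = df.groupby(iso)[cols].bfill()
    df[cols] = df.groupby(iso)[cols].transform(
        lambda g: g.interpolate(method='linear', limit_area='inside'))
    df[cols] = df.groupby(iso)[cols].transform(
        lambda g: g.interpolate(method='linear', limit_area='outside'))
    df[cols] = fill_with_group_median(df, iso, cols)
    # Group- and region-level medians, in the same order as the script
    for keys in ('WBG Income Group',
                 ['WBG Income Group', 'Year'],
                 ['UN Sub-Region', 'Year']):
        df[cols] = fill_with_group_median(df, keys, cols)
    # Whatever is still missing becomes 0
    df[cols] = df[cols].fillna(0)
    return df


# Small hypothetical stand-in for the table Power BI passes in as `dataset`
toy = pd.DataFrame({
    'country ISO': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'CCC'],
    'WBG Income Group': ['High', 'High', 'High', 'High', 'High', 'Low'],
    'UN Sub-Region': ['R1', 'R1', 'R1', 'R1', 'R1', 'R2'],
    'Year': [2000, 2001, 2002, 2000, 2001, 2000],
    'col1': [1.0, np.nan, 3.0, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2.0, np.nan, 4.0, 6.0, np.nan],
})

print(impute(toy)[COLS])
```

Here BBB has no observed `col1` values, so it falls through to the income-group median, and CCC has no data at any level, so it ends up as 0.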

I expect a table with imputed values, but instead I get this error:

DataSource.Error: ADO.NET: Python script error.
Traceback (most recent call last):
  File "PythonScriptWrapper.PY", line 2, in <module>
    import os, pandas, matplotlib.pyplot
  File "C:\Users\GEscamilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\__init__.py", line 19, in <module>
    "Missing required dependencies {0}".format(missing_dependencies))
ImportError: Missing required dependencies ['numpy']

Details:
    DataSourceKind=Python
    DataSourcePath=Python
    Message=Python script error.
Traceback (most recent call last):
  File "PythonScriptWrapper.PY", line 2, in <module>
    import os, pandas, matplotlib.pyplot
  File "C:\Users\GEscamilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\__init__.py", line 19, in <module>
    "Missing required dependencies {0}".format(missing_dependencies))
ImportError: Missing required dependencies ['numpy']

    ErrorCode=-2147467259
    ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException

The error output tells you what is wrong:

ImportError: Missing required dependencies ['numpy']

This means that pandas could not import numpy. As @prathik says in the comment, you have to import numpy along with your other import statements. Microsoft's documentation has an example of this.

import numpy

If that does not work, make sure numpy is installed in the Python environment that Power BI uses:

pip install numpy
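
The traceback shows the failure on line 2 of Power BI's own wrapper (`import os, pandas, matplotlib.pyplot`), before any of your code runs, so it can help to check that interpreter directly. This is a minimal sketch, assuming you run it with the same python.exe as the Python home directory set in Power BI's Python scripting options (the traceback suggests an Anaconda install); `check_environment` is a hypothetical helper name.

```python
import importlib
import sys


def check_environment(packages=('numpy', 'pandas', 'matplotlib')):
    """Report which interpreter is running and whether each package imports."""
    report = {'python': sys.executable}
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Most scientific packages expose __version__; fall back if not
            report[name] = getattr(module, '__version__', 'unknown version')
        except ImportError as exc:
            report[name] = f'import failed: {exc}'
    return report


if __name__ == '__main__':
    for key, value in check_environment().items():
        print(f'{key}: {value}')
```

If numpy or pandas reports `import failed` here, the problem lies in that Python installation rather than in the imputation script.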

The bigger picture

Consider running the script upstream of the dashboard, so that other dashboards can use the transformed data as well.
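
One way to move the step upstream, sketched under the assumption that the consolidated table can be exported to CSV: run the imputation as a standalone script and point every dashboard at its output file. The `export_imputed` helper and the file names are hypothetical, and `impute` stands for whatever imputation function you use.

```python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd


def export_imputed(src: Path, dst: Path,
                   impute: Callable[[pd.DataFrame], pd.DataFrame]) -> pd.DataFrame:
    """Read the consolidated table, impute it, and write the result to dst."""
    raw = pd.read_csv(src)
    imputed = impute(raw)
    imputed.to_csv(dst, index=False)
    return imputed


if __name__ == '__main__':
    # Demo with throwaway files; in practice src is the consolidated export
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / 'consolidated.csv'
        pd.DataFrame({'col1': [1.0, None, 3.0]}).to_csv(src, index=False)
        result = export_imputed(src, Path(tmp) / 'imputed.csv',
                                lambda df: df.fillna(0))
        print(result)
```

Power BI can then load the output with an ordinary Text/CSV connection instead of a Python step.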

Usually I would recommend doing all data transformations in a data warehouse, or in a data mart built for a specific purpose. However, this depends on whether this is a one-time exercise or something you are going to use in production.
