Monday, September 17, 2018

How to run R and Python in SQL Server with machine learning services

Introduction

Did you know that you can write R and Python code within your T-SQL statements? The service services machine in SQL Server eliminates the need to move data. Instead of transferring large and confidential data over the network or losing accuracy with the sample CSV files, you can have your R / Python code executed in your database. Easily implement your R / Python code with SQL stored procedures, making them accessible in your ETL processes or in any application. Train and store machine learning models in your database, bringing intelligence to where your data is.


You can install and run any of the most recent open-source R / Python packages to create Deep Learning and AI applications in large amounts of data in SQL Server. We also offer high-performance tip algorithms on the Revo Scale R and Revo Scale Py APIs from Microsoft. Using them with the latest innovations in the free software world allows you to bring an unrivalled selection, performance and scale to your applications.

Basic concepts

Open SQL Server Management Studio and make a connection to the server. Open a new query and paste this basic example: (Although I use Python in these examples, you can also do everything with the R)

EXEC sp_execute_external_script @language = N'Python',
@script = N'print(3+4)'

Sp_execute_external_script is a special system stored procedure that allows the execution of R and Python in SQL Server. There is a "language" parameter that allows us to choose between Python and R. There is a "script" parameter where we can paste R or Python code.

Introduction to the parameter

Now that we discuss a basic example, let's start adding more pieces:

EXEC sp_execute_external_script  @language =N'Python'
@script = N' 
OutputDataSet = InputDataSet;
'
,
@input_data_1 =N'SELECT 1 AS Col1';

Machine Learning Services provides more natural communications between SQL and R / Python with an input data parameter that accepts any SQL query. The name of the input parameter is called "input_data_1".

You can see in the python code that there are default variables defined to transmit data between Python and SQL. The names of the default variables are "Output Data Set" and "Input Data Set" You can change these default names like this example:

EXEC sp_execute_external_script  @language =N'Python'
@script = N' 
MyOutput = MyInput;
'
,
@input_data_1_name = N'MyInput',
@input_data_1 =N'SELECT 1 AS foo',
@output_data_1_name =N'MyOutput';

When running these examples, you should have noticed that each returns a result with "(no column name)". You can specify a name for the columns returned by adding the WITH RESULT SETS clause to the end of the statement, which is a comma-separated list of columns and their data types.

EXEC sp_execute_external_script  @language =N'Python'
@script=N' 
MyOutput = MyInput;
'
,
@input_data_1_name = N'MyInput',
@input_data_1 =N'
SELECT 1 AS foo,
2 AS bar
'
,
@output_data_1_name =N'MyOutput'
WITH RESULT SETS ((MyColName int, MyColName2 int));


Input / output data types

Well, let's see a little more about the input/output data types used between SQL and Python. Your SQL input SELECT statement passes a "Data frame" to python, depending on the Python Pandas package. Your output from Python to SQL must also be in a Pandas Data frame object. If you need to convert scalar values to a data frame, here's an example:

EXEC sp_execute_external_script  @language =N'Python'
@script=N' 
import pandas as pd
c = 1/2
d = 1*2
s = pd.Series([c,d])
df = pd.DataFrame(s)
OutputDataSet = df
'

The variables c and d are scalar values, which you can add to a series of pandas if desired, and then convert them into a panda data frame. This shows a slightly more complicated example, go read in the documentation of the python pandas package for more details and examples:

EXEC sp_execute_external_script  @language =N'Python'
@script=N' 
import pandas as pd
s = {"col1": [1, 2], "col2": [3, 4]}
df = pd.DataFrame(s)
OutputDataSet = df
'




5 comments:

Merits & Demerits of Data Analytics

Definition:  The data analysis process was concluded with the conclusions and/or data obtained from the data analysis. Analysis data show...