Determine the minimum arrival delay using a large set of flight data stored in a database. Access the database using a parallel pool.
To initialize a parallel pool with a JDBC database connection, you must configure a JDBC data source. For more information, see the configureJDBCDataSource
function.
Using the splitsqlquery
function, you can split the original SQL query into multiple SQL page queries. Then, you can access large data in chunks by executing each SQL page query on a separate worker in the pool.
When you import large data, the performance depends on the SQL query, amount of data, machine specifications, and type of data analysis. To manage the performance, use the splitsize
input argument of the splitsqlquery
function.
If you have a MATLAB® Parallel Server™ license, then use the parpool
(Parallel Computing Toolbox)
function with the cluster profile of your choice instead of the gcp
(Parallel Computing Toolbox)
function.
Create a database connection to the JDBC data source MSSQLServerJDBCAuth
. This data source configures a JDBC driver to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password.
Define an SQL query to select all columns from the airlinesmall
table, which contains 123,523 rows and 29 columns.
Split the original SQL query into multiple page queries and display them. Specify a split size of 10,000 rows.
querybasket =
13×1 string array
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 0 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 10000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 20000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 30000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 40000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 50000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 60000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 70000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 80000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 90000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 100000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 110000 ROWS FETCH NEXT 10000 ROWS ONLY"
" SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 120000 ROWS FETCH NEXT 3523 ROWS ONLY"
The query basket contains the page queries in a string array. Each SQL query in the basket, except the last one, returns 10,000 rows.
Close the database connection.
Start the parallel pool.
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
Initialize the parallel pool using the JDBC data source.
Define the airlinesdata
variable.
Define the minimum arrival delay minArrDelay
variable.
Use the parfor
function to parallelize data access using the query basket.
For each worker:
Retrieve the database connection object.
Execute the SQL page query from the query basket and import data locally.
Find the local minimum arrival delay.
Store the local minimum arrival delay.
Find the minimum arrival delay using the stored delays from each worker.
Close the parallel pool.