Data miner in Oracle

I have played a bit with Oracle Data Miner. This feature is available in the Enterprise Edition of Oracle Database, version 11 or above, and it can be accessed from SQL Developer. Before using the feature, one must first set up a database user. This can be done with:

CREATE USER dmuser IDENTIFIED BY dmpassword
DEFAULT TABLESPACE users
TEMPORARY TABLESPACE temp
QUOTA UNLIMITED ON users;
-- CREATE USER is DDL and commits implicitly, so no COMMIT is needed here.
GRANT CREATE JOB TO dmuser;
GRANT CREATE MINING MODEL TO dmuser;
GRANT CREATE PROCEDURE TO dmuser;
GRANT CREATE SEQUENCE TO dmuser;
GRANT CREATE SESSION TO dmuser;
GRANT CREATE SYNONYM TO dmuser;
GRANT CREATE TABLE TO dmuser;
GRANT CREATE TYPE TO dmuser;
GRANT CREATE VIEW TO dmuser;
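
To check that the grants took effect, one can log in as dmuser and list the privileges of the current session. This is only a quick sanity check; SESSION_PRIVS is a standard Oracle data dictionary view, and the exact list returned depends on any roles the user also holds.

```sql
-- Run while connected as dmuser.
-- Lists the system privileges active in this session; the nine
-- privileges granted above (CREATE JOB, CREATE MINING MODEL, ...)
-- should be among the rows returned.
SELECT privilege
FROM   session_privs
ORDER  BY privilege;
```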

Once this user is created, one must set up a connection in SQL Developer that logs in as dmuser. This connection is then used in the Data Miner widget. It looks like:

Note that we have opened the Data Miner widget and that we have used the connection with the dmuser. Moreover, on first use, we have to set up a repository.

Each analysis is done within a project and a workflow. Within that workflow, different steps can be defined. The most logical first step is a “Data Source”. Within that data source, we may select a table that exists in the Oracle database.

After that, we may define one or more steps that use that data source. In the example above, we explore the data and we run a regression.

It is interesting to see that most settings have a default value, which makes it possible to get the procedure running quickly. With the regression, I played with two different settings.
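
For readers who prefer scripts to the graphical workflow: as far as I understand, the same in-database mining functionality can also be reached from PL/SQL through the DBMS_DATA_MINING package. The block below is a minimal sketch of a comparable regression build, not what the widget itself generates. The table SALES_DATA, its columns CUST_ID and REVENUE, the settings table GLM_SETTINGS and the model name REVENUE_GLM are all hypothetical names chosen for illustration.

```sql
-- Settings table: one row per non-default setting.
CREATE TABLE glm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Ask for a generalized linear model instead of the default
  -- regression algorithm. Package constants can only be referenced
  -- from PL/SQL, hence the INSERT sits inside this block.
  INSERT INTO glm_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.ALGO_NAME,
          DBMS_DATA_MINING.ALGO_GENERALIZED_LINEAR_MODEL);
  COMMIT;

  -- Build the model. All other settings keep their default values.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'REVENUE_GLM',   -- hypothetical model name
    mining_function     => DBMS_DATA_MINING.REGRESSION,
    data_table_name     => 'SALES_DATA',    -- hypothetical source table
    case_id_column_name => 'CUST_ID',       -- hypothetical key column
    target_column_name  => 'REVENUE',       -- hypothetical target
    settings_table_name => 'GLM_SETTINGS');
END;
/

-- Score the source rows with the freshly built model.
SELECT cust_id,
       PREDICTION(revenue_glm USING *) AS predicted_revenue
FROM   sales_data;
```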

The first setting was explicitly indicating the input variables:
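
Outside the widget, one way to get the same effect is to restrict the columns in a view and use that view as the data source. This is only a sketch of the idea; SALES_DATA and its columns CUST_ID, AGE, INCOME and REVENUE are hypothetical names.

```sql
-- Expose only the case id, the chosen input variables and the target.
-- Any column left out of the view cannot be picked up as a predictor.
CREATE OR REPLACE VIEW sales_regression_input AS
SELECT cust_id,   -- case identifier (hypothetical)
       age,       -- input variable (hypothetical)
       income,    -- input variable (hypothetical)
       revenue    -- target variable (hypothetical)
FROM   sales_data;
```

The view can then be selected in a “Data Source” step just like a table.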

The second setting was related to the number of observations. This can be controlled with the Build – Properties. By default, only a sample of the rows is used to build the model. One may like to include all observations to improve the quality of the estimates.

The run was pretty quick. As a check, I compared the outcomes with the regression estimates from another programme.
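
For a regression with a single input variable, a similar cross-check can be done without leaving the database, because Oracle SQL has built-in linear regression aggregates. The query below is a sketch under that assumption; it does not cover multiple predictors, and SALES_DATA, REVENUE and AGE are hypothetical names.

```sql
-- Ordinary least squares of revenue (dependent) on age (independent).
-- The REGR_* aggregates take the dependent variable first.
SELECT REGR_SLOPE(revenue, age)     AS slope,
       REGR_INTERCEPT(revenue, age) AS intercept,
       REGR_R2(revenue, age)        AS r_squared,
       REGR_COUNT(revenue, age)     AS n_obs   -- rows with both values non-null
FROM   sales_data;
```

If the model was built on a sample rather than on all rows, small differences between these figures and the Data Miner estimates are to be expected.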

All in all, I was quite pleased with this feature. I could imagine that this tool set offers enough for an information department. Moreover, it is really easy to use. Despite its simplicity, it offers regression, cluster analysis and some text analysis.

By tom