sitemap Tom van Maanen

Materialized views on Oracle

June 17, 2007 on 10:24 pm

Materialized views provide us with a snapshot of data. Once a materialized view is created like:

drop materialized view scott.mat_woon;
create materialized view scott.mat_woon
as
select plaats, naam, count(*)
from feit, dimensie
where feit.WOON = dimensie.WOON
group by plaats, naam

, we have a snapshot of data that provide us with an overview of “plaats”, “naam” and counts as they exist when the materialized view was created. When we query the materialized view, we get a sub-second response on how “plaats”, “naam” and counts existed on moment of creation.

With this definition of the materialized view, we get the same response even when the underlying tables are changed. This implies that changes in the underlying tables are not directly reflected in the materialized view. Just like a snapshot, we get the response from the materialized view as if the data were still at the moment of creation of the materialized view.

After some time, the snapshot data in the materialized view must be refreshed. Several possibilities are around. One way is a refresh thru a standard stored procedure, like EXECUTE DBMS_MVIEW.REFRESH(’scott.mat_woon’,‘C’);Once run, the snapshot are refreshed.

As an example, assume we have a record in the underlying table that has`janny` as `naam`. Once the materialized view is created, we recognize `janny` as a possible `naam`.

After some time, `janny` is rewritten as `jannytom` in the underlying table. However, in the materialized view, we still see `janny`. This is only changed after the stored procedure EXECUTE DBMS_MVIEW.REFRESH ’scott.mat_woon’,'C’); is run. After that, `jannytom` is included in the materialized view.

Searching in text strings

June 16, 2007 on 10:32 pm

Generally, searching in textstrings is similar to searching in numbers. Suppose one is interested in finding my age. He could start a query that sounds like: select age from table where name=’Tom van Maanen’.

Such a query is similar to a query that tries to find everybody who is 50: select name from table where age=50.But, one may have misspelled my name: be it Tom Van Maanen, or Tom Vanmaanen or Ton van Maanen. All possible misspellings that happened in my life. In such case, a query that reveals my age does not return any result. Lucky me. 

Oracle offers a solution: it may translate your string into a code that can be used to compare strings. Small mistakes are given the same code. This circumvents the problems that small misspellings lead to not finding a record. This function is labelled “soundex”. The algorithm stems from the twenties in the 20th century when immigration officers often misspelled the names of immigrants. To be able to find them back the algorithm was invented.

A query that tries to find my name looks like: select age from table where soundex(name)=soundex(’Tom van Maanen’). The translation thru soundex of my name is T515. The translation of Tom Van Maanen is also T515, Tom van Maanen is encoded to T515, also: Tom Vanmaanen T515 and Ton van Maanen is T515. As the codes are all the same, Tom van Maanen is linked to the misspellings Tom Van Maanen etc.

Soundex is a standard function in Oracle. You can use it like any other function. Like “length”, one may use “Soundex”.

There are complaints that the soundex function leads to matches that make no sense. As an example “Capgemini” and “Cap Gem Boevenbende” are both translated by soundex into C125. Hence Capgemini is linked to Cap gem Bovenbende, which doesn’t make much sense.

Other encoding mechanisms exist: I found “metaphone”. This algorithm has been implemented in a PL/SQL procedure. Once found it can be imported into your function list. It then works likes any other PL/SQL function. The codes generated by metaphone are far more discriminate.

The codes with misspellings from my name are:

NAAM METAPHONE(NAAM)
Tom Van Maanen TMVNMNN
Tom van Maanen TMVNMNN
Tom Vanmaanen TMVNMNN
Ton van Maanen TNVNMNN

You may see that Tom van Maanen generates the same as Tom Vanmaanen. However Tom is distinguished from Ton. It is a small difference but is can be seen as the two codes have slightly different outcomes.

The full PL/SQL can be found in http://www.geocities.com/oracletricks/plsql/metaphone.txt . Have fun with it.

Changed Data Capture in Oracle

November 24, 2006 on 11:05 pm

In modern days DBMS, we encounter functions that help us to set up a well-performing data warehouse. One such function is the concept of CDC: the ability to capture changed data. This helps us to identify changes in data. If we need only changed data, such functionality helps us to identify which rows are changed and what their new value is.

In Oracle, this CDC (Changed Data Capture) is established by the creation of a new table, that looks similar to the table that is observed for possible changes in the data. This table is created by a function that also creates a link between the table that is observed for possible changes and the receiving table.

 Let is first create a table that will be observed for possible changes:

 CREATE TABLE FLIP(WOON INTEGER, PLAATS VARCHAR2(35));

Table “flip”is observed for possible changes in the data. The changes are collected in a second table (”CDC_FLIP”) that is created by:

 EXECUTE DBMS_LOGMNR_CDC_PUBLISH.CREATE_CHANGE_TABLE ( -
OWNER => ‘SCOTT’, -
CHANGE_TABLE_NAME => ‘CDC_FLIP’, -
CHANGE_SET_NAME => ‘SYNC_SET’, -
SOURCE_SCHEMA => ‘SCOTT’, -
SOURCE_TABLE => ‘FLIP’, -
COLUMN_TYPE_LIST => ‘WOON INTEGER, -
                   PLAATS VARCHAR2(35)’, -
CAPTURE_VALUES => ‘BOTH’, -
RS_ID => ‘N’, -
ROW_ID => ‘N’, -
USER_ID => ‘Y’, -
TIMESTAMP => ‘Y’,  -
OBJECT_ID => ‘N’, -
SOURCE_COLMAP => ‘N’, -
TARGET_COLMAP => ‘N’, -
      OPTIONS_STRING => NULL);
 Let us assume, some rows are added to “flip”:

 INSERT INTO FLIP VALUES(12,’GOUDA’);
INSERT INTO FLIP VALUES(13,’BODEGRAVEN’);
COMMIT;

 

The additions to the data are also collected by the second table “CDC_FLIP”:

select * from CDC_FLIP yields:

OP      CSCN$ COMMIT_TI USERNAME$                      TIMESTAMP       WOON
– ———- ——— —————————— ——— ———-
PLAATS
———————————–
I  2.8147E+14 01-JAN-00 SCOTT                          24-NOV-06         12
GOUDA

I  2.8147E+14 01-JAN-00 SCOTT                          24-NOV-06         13
BODEGRAVEN
 

Bitmap indices in Oracle

November 13, 2006 on 11:09 pm

We know that an index might improve response time of a query. As an example, one may think of a table that contains 1 million records and 3 attributes. Assume that one attribute acts as the indexing attribute. A query that uses that attribute is faster if an index is created with that attribute than without such index.

Again: a query on a one million record table runs on my machine 3-5 seconds.

If a query is created (CREATE INDEX SCOTT.INDEX_FEIT_IDX ON SCOTT.FEIT_IDX(WOON)), the response time is down to 1-2 seconds.

Further improvement is possible if a bitmap index is used. This index is created with CREATE BITMAP INDEX SCOTT.BITMAP_INDEX_FEIT_IDX ON SCOTT.FEIT_BIT(WOON). It leads to gains whereby the query is executed in 1/2 second.

The bitmap index is said to have substantial improvements in query speed if:

  • For columns with very few unique values (low cardinality). Columns that have low cardinality are good candidates (if the cardinality of a column is <= 0.1 %  that the column is ideal candidate, consider also 0.2% – 1%). In the example above, the cardinality was 11 within a million records, hence we were talking on 0.001 %.
  • Tables that have no or little insert/update are good candidates (static data in warehouse). Updating requires a refreshment of the index which consumes time. This disadvantage can only be outweigthed in case of few updates/ many queries.

 

This can be enhanced if we look at the explain plans that can be run.

 A way to use the explain is as follows. First connect as sysdba (CONN sys/password AS SYSDBA), Then run the explain plan utility (@$ORACLE_HOME/rdbms/admin/utlxplan.sql
). Grant access to the explainplan table (GRANT ALL ON sys.plan_table TO public;). Create a publix synonym for the explain plan table (CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;). Finally switch on the explain plan utility (set autotrace on).

 Above three queries gave:

  No index Btree index Bitmap Index
  2634 2119 244 Physical  Reads
 

 This demonstrates that (in this case) a bitmap requires less physical reads, which explains less time involved for the execution of a query.

Usage of partioned tables in Oracle

November 7, 2006 on 10:27 pm

Oracle has the ability to partion its tables. The idea is that you systematically store data in certain areas that together constitute the whole table. If data that connected together are stored in such subsection of the table, such subsections can be manipulated as it was a certain unit.

As an example, one may think of a daily load of data. If data that belong to a certain day are stored in a subsection of the table, such subsection can be manipulated as such. If the dataload is sent a second time (as result of a previous mistake), the subsection can be truncated and subsequently reloaded with the new data delivery.

Technically, the table is created as:

CREATE TABLE PARTTABLE
(
  SALESMAN_ID    NUMBER(5),
  SALESMAN_NAME  VARCHAR2(30 BYTE),
  SALES_AMOUNT   NUMBER(10),
  SALES_DATE     DATE,
  JAARMAAND      NUMBER(6)
)
PARTITION BY LIST (JAARMAAND)

  PARTITION P200001 VALUES (200001),
  PARTITION P200002 VALUES (200002)
);

The table is divided in two section: one is indicated by P200001 and the other is indicated by P200002. If data from a certain datadelivery are stored in one subsection ( P200001 ), one is able to truncate the partition P200001 before the data are delivered.

The trucation is technically implemented as:

ALTER TABLE PARTTABLE TRUNCATE PARTITION P200001;

 

This implies that a data warehouse can be loaded along next line of thought:

(1) Store data in partioned tables

(2) Each dataload is loaded in one partition

(3) Before loading, the partition is truncated first to drop previous dataloads

 

External files in Oracle

November 7, 2006 on 9:44 pm

Oracle has the feature external table.

The external table can be seen as a view from Oracle to a flat file. Assume that we have an external file with is called “winst.dat” and that contains data like:

 

Jan,111,223,122,345,324,244,123,123,345,121,345,197
Flip,234,234,123,457,456,287,234,123,678,656,341,567
Next lines of code show how the view on the external file can be created:

CONNECT sys/bunvegni@tom10 AS SYSDBA;

REM The directories are located on the same server as the Oracle DBMS

CREATE OR REPLACE DIRECTORY dat_dir AS ‘C:\download\externalTable’;
CREATE OR REPLACE DIRECTORY log_dir AS ‘C:\download\externalTable’;
CREATE OR REPLACE DIRECTORY bad_dir AS ‘C:\download\externalTable’;

GRANT READ ON DIRECTORY dat_dir TO scott;
GRANT WRITE ON DIRECTORY log_dir TO scott;
GRANT WRITE ON DIRECTORY bad_dir TO scott;

REM You can now use a new form of the CREATE TABLE statement that looks like a cross between a SQL statement and a SQL*Loader control file:

CONNECT scott/tiger@tom10;

drop table opbrengst;

CREATE TABLE opbrengst (person      VARCHAR2(20),
                     jan     NUMBER(4),
                     feb     NUMBER(4),
                     mar     NUMBER(4),
                     apr     NUMBER(4),
                     mai     NUMBER(4),
                     jun     NUMBER(4),
                     jul     NUMBER(4),
                     aug     NUMBER(4),
                     sep     NUMBER(4),
                     oct     NUMBER(4),
                     nov     NUMBER(4),
                     dec     NUMBER(4))
 ORGANIZATION EXTERNAL
 (
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY dat_dir
   ACCESS PARAMETERS
   (
     records delimited by newline
     badfile bad_dir:’opbrengst%a_%p.bad’
     logfile log_dir:’opbrengst%a_%p.log’
     fields terminated by ‘,’
     missing field values are null
     ( person,
       jan,
       feb,
       mar,
       apr,
       mai,
       jun,
       jul,
       aug,
       sep,
       oct,
       nov,
       dec
     )
   )
   LOCATION (’winst.dat’)
 )
 PARALLEL 4
 REJECT LIMIT UNLIMITED;

REM When you create an external table, you’re really only creating some data dictionary entries. Nothing exciting happens until you query the table. Realize though, that you can query the table as you would any other SQL table. Let’s assume you had the following production table:
drop table revenue;
 CREATE  TABLE revenue (
    person       VARCHAR2(20),
    month        VARCHAR2(3),
    revenue      NUMBER,
    CONSTRAINT revenue_pk PRIMARY KEY (person,month));

REM Given this table, you can use the following INSERT…SELECT FROM statement to extract revenue data from your external file in a normalized format and insert it into your production table.

INSERT INTO revenue (person,month,revenue)
   SELECT person,’Jan’,jan
   FROM opbrengst
   WHERE jan IS NOT NULL
UNION ALL
   SELECT person,’Feb’,feb
   FROM opbrengst
   WHERE feb IS NOT NULL
UNION ALL
   SELECT person,’Mar’,mar
   FROM opbrengst
   WHERE mar IS NOT NULL
UNION ALL
   SELECT person,’Apr’,apr
   FROM opbrengst
   WHERE apr IS NOT NULL
UNION ALL
   SELECT person,’Mai’,mai
   FROM opbrengst
   WHERE mai IS NOT NULL
UNION ALL
   SELECT person,’Jun’,jun
   FROM opbrengst
   WHERE jun IS NOT NULL
UNION ALL
   SELECT person,’Jul’,jul
   FROM opbrengst
   WHERE jul IS NOT NULL
UNION ALL
   SELECT person,’Aug’,aug
   FROM opbrengst
   WHERE aug IS NOT NULL
UNION ALL
   SELECT person,’Sep’,sep
   FROM opbrengst
   WHERE sep IS NOT NULL
UNION ALL
   SELECT person,’Oct’,oct
   FROM opbrengst
   WHERE oct IS NOT NULL
UNION ALL
   SELECT person,’Nov’,nov
   FROM opbrengst
   WHERE nov IS NOT NULL
UNION ALL
   SELECT person,’Dec’,dec
   FROM opbrengst
   WHERE dec IS NOT NULL;

COMMIT;

 

 

Start a PL/SQL job within Oracle from the commandline

September 30, 2006 on 10:31 pm

In a certain case, we would like to have started a PL/SQL procedure from the commandline. We did did with the next batch programme that contained two parameters:

  • one with a filename that refered to the actual invocation on the PL/SQL procedure
  • a second parameter with a filename that contained the connecdtion parameters.

From there it was quite straightforward as can be shown below:

 

@echo off
rem ———————————————————————–
rem Filename:   RunPLSQL.bat
rem Purpose:    Run PLSQL procedure from DOS batch file
rem Date:       30-Sept-2006
rem Author:     Tom Van Maanen
rem ———————————————————————–
rem — Accept command line arguments –
rem Note: %1 is the first command line argument that stands for the file with SQL commands
rem Note: %2 is the second command line arguments that holds the connection parameters
rem
if NOT “%2″  == “” goto Gadoor1
if NOT “%1″  == “” goto Gadoor1
echo Plse usage 2 arguments, the first being the file with SQL commands
echo and the second the connection parameters
goto End

:Gadoor1

if exist %1 goto Gadoor2
echo command line argument 1 should exist as a file with SQL commands.
echo This file has not been found
goto End

:Gadoor2

if exist %2 goto Gadoor3
echo command line argument 2 should exist as a file with the connection parameters.
echo This file has not been found
goto End

:Gadoor3
if not exist ff143256547.sql goto Gadoor4
echo I use a file that is labelled as ff143256547.sql to execute this script
echo This already exists
echo plse rename the file or rewrite this script
goto End

:Gadoor4
type %2 > ff143256547.sql
type %1>>ff143256547.sql
sqlplus /nolog @ff143256547.sql
del ff143256547.sql
:End

 

Most of this program is concentrated on errors (omission of parameters, omission of files with the required commandline etc). In fact only three lines are really important:

  •  type %2 > ff143256547.sql, which stored the connection parameter in a file. The connection parameter is something like ” connect scott/tiger@tom” with scott the user id, tiger the password and tom10 the reference to the Oracle DBMS on with the PL/SQL procedure is stored.
  • type %1>>ff143256547.sql, which stored the commands that are sent to Oracle. It contains something lile ” EXEC PL/SQL procedure ”  and a quit command to leave Oracle.
  • sqlplus /nolog @ff143256547.sql that actually starts off Oracle.

Following this logic, we could start PL/SQL procedure from outside Oracle. Nice thing to know for the future.

 

Hierachies in Oracle

September 20, 2006 on 10:23 pm

It may happen that you get an Oracle table that looks like:

We have a table that gives department codes, their governing department and a description of the department. It is shown that a department is linked to the governing department. It is also shown how the governing department is also a subsidiary of another department.

 

1   BOARD
2 1 EUROPE
3 1 AMERICAS
4 2 GERMANY
5 2 FRANCE

It is shown that France is subject to Europe. It is also shown that Europe is subject to the Board.

 Now, we would like a have one line in which the whole hierachy is shown. Such a line would look like Board>Europe>France. The question is to get a query that would retrieve such result. Such a solution exists in:

SELECT SYS_CONNECT_BY_PATH(name,‘~’) as afd_codes FROM afdeling START WITH afdeling_hoger IS NULL CONNECT BY afdeling_hoger = prior afdeling;

The resultset is:

      ~BOARD
      ~BOARD~EUROPE
      ~BOARD~EUROPE~GERMANY
      ~BOARD~EUROPE~FRANCE
      ~BOARD~AMERICAS

This query is not trivial. It links records and connects them in one outcome set. The linkage is established by the “CONNECT BY” clause.      

   

Batch programming with SQL Server

May 29, 2006 on 9:51 am

Getting data in a SQL Server in a batch operation is relatively easy. However due to settings of a database server, some methods work while other don’t. So from a series of tools, a choice must be made to accomplish the goal of adding data to a SQL Server database.

In principle two tools are available when SQL Server is installed:

  • sqlcmd.exe
  • osql

 In SQL Server 2000, one may also encounter isql but its use is discouraged as it is going to be replaced by sqlcmd.exe. 

The tools can be used from the command line. Its syntax can be derived if the help is invoked: for sqlcmd, this is done via sqlcmd -? and for osql, it is done via osql help.

Already from this simple help functionality, the syntax was clear. I wrote a command line like “osql -D sqlserver -U sa -P bunvegni -o nix.txt -Q “select * from test” to get data from table test. This table is stored in a SQL Server database that is accesible via ODBC with dsn “sqlserver”. The userid that I used was sa and the password was “bunvegni”. The results were stored in a file that was called “nix.txt”.

The osql module did work while sqlcmd did not. I understand that osql supports ODBC connections while sqlcmd does not support such connections. If only the ODBC connections are open, the osql tool is your option to go ahead. For me, that implicated that both tools should be used when the exact database settings are not known in advance. 

Batch FTP

May 18, 2006 on 9:48 pm

The issue here is that we would like to transfer a file in a batch. This may be an issue if we have a file that needs to be transferred somewhere during the night during a data warehouse process.

It is possible to create a file that contains all ftp commands that must be given during the ftp session. An example might be a file that contains next lines:

open 10.0.0.135

tom

bunvegni

lcd \

lcd c:\prive

cd \

cd c:\oracle\product\10.1.0\Db_1\BIN

get weg.log

quit

These command must be given in a FTP session that retrieves a file from a local server and stores it on a client machine.

This file can be saved and later used in a batch file that reads like:

ftp -s: [file name]

The batch can subsequently be scheduled.

The result is an automated file transfer, made possible without human intervention.

« Previous PageNext Page »