|
Managing images in a database |
|
|
|
|
Sunday, 01 June 2008 |
Preamble The relational databases require the existence of relations between the data contained in tables, for example, to conduct joint or filters. The performance of a database is then no common measure with a developer could be using the resources of a programming language and simple data files. In contrast, storing data filtering or join can only take place directly in SQL falls heavily performance RDBMS and is a nonsense ... Given the volume of images, the base is so polluted with a quantity of significant information which only a small portion is actually used by the SQL engine. Go one day you ask your engine SQL RDBMS complaint: Image SELECT FROM WHERE TableImage Image contains a yellow dog on a swing ' ? Of course not! So it is pointless to play a store images in a database, especially as the manipulation of BLOBS (specialized columns for this type of data) can not mostly be SQL basic ... We must not forget either that what is the best a computer, it is precisely the handling of files ... So there is everything to gain by using a storage in the form of files in the OS rather than manipulate flows in binary columns BLOB. 1. Study the characteristics of images However, the images possess characteristics (known as attributes when models) that it is often utilse collected in a database. Here are some of the main characteristics of images: | FORMAT (CHAR 4) | BMP, JPEG, TIFF, GIF ... | | COLORS (INTEGER) | 2 (black and white), 256 (greyscale), 1024 (colors) ... | | WIDTH (INTEGER) | the width of the image | | ARRIVALS (INTEGER) | the height of the image | | NAME (VARCHAR 256) | filename | | SIZE (INTEGER) | the volume of data | | DESCRIPTION (VARCHAR 1024) | a description of the image may be a request LIKE | | CAPTION (VARCHAR 256) | legend registering under the image | | HINT (VARCHAR 256) | information appearing in mouse over | | COPYRIGHT (VARCHAR 256) | author / owner of the image | | DATE CREATION | creation date | Of course you can add various features to meet its own needs ... This approach is often called technical meta data, because it does not work directly on the data, but on the characteristics, the parameters of the data. 2. Store images You can store all its images in a single directory of a file server. Simply knowing the path of this directory which can be detailed. Call it "PATH_IMAGE." Therefore, to find the location of a picture, just the constant concatenate PATH_IMAGE on behalf of the image. Example: PATH_IMAGE: = '\ \ SRV_files \ images \' BITMAP_FILE: = '\ \ SRV_files \ images \' + NAME and voila! But this simplistic technique has a major drawback: there can be too many files in the directory ... Let us not forget that operating systems have limits and even if they are very broad, it is never impossible to achieve. Another solution is to mutiplier the number of directories and store images, are by their dates of insertion, either by volume or a predetermined number. | By date | for example per month + years. Example: IMG_2001_01 for images inserted in January IMG_2001_02 for images inserted in February ... | | By number | For example enlimitant to 256 images IMG_001 contains the first images included 256 IMG_001 contains images included the 257eme the 512eme in order of their inclusion ... | | By volume | for example by limiting the volume of images to 600 MB IMG_01 contains as many pictures as possible within the limit of 600 MB IMG_02 contains a maximum volume of 600 MB of images inserted in the wake of filling IMG_01 ... | For my part I choose in general engineering volume, limiting it to 600 MB Why? Because it is a 600 MB CD Rom and that I am therefore very easy for archiving burn a CD of images by directory! Therefore the problem of knowing where each image is located. Just add the features that are already specified, the path storage of each image, whose path clear reference can be stored in another table. 3. An initial data model Here is a first conceptual model of data:  You will notice that I added a reference table for the format images (T_TYPE_IMAGE_TIM), it allows me to create preventive all formats with which I want to work, that would be for checks and data validation. I must, however, a global variable departure, the point of entry into the file system (the "path") from which I can create my hives. This can be done in an INI file, in a tool specific to the system (the Windows registry for example, but then more portability to another OS) or even (and this is what I prefer) in a table Setting my database ... 4. The management of obsolescence An interesting thing is that it can manage obsolescense images. In other words manage an image that is no longer used, or even replaced by another. For suppression, nothing could be easier just to create an additional column in the table T_IMAGE_IMG to put the date of removal. AInsi any request to retrieve an image filter on this date and that the system. This has the advantage of allowing to return web pages obsolete (archives, for example) with images of origin rather than with "holes". Remember also that in general we avoid on databases to delete the information. For replacement, simply connect to the table T_IMAGE itself to inform that the image referred was replaced by a new one. A simple reference self sufficient in general. We must therefore ensure that this column is NULL in search queries, and failing that, seek the replacement image. Here is a more comprehensive organization of our management systems meta data from images:  And a physical model equivalent SQL 2:  It should be noted that self reference on the table of images to manage obsolescence was defined by a column name IMG_ID_REMPLACEMENT.
5. Script SQL complete Here is a complete SQL script for the creation of a tiller base and its repository of data: -- CREATION OF TABLES -- ============================================================
create table T_PARAM_BASE_PRB ( PRB_ID INTEGER not null, PRB_NOM CHAR(256) not null, PRB VARCHAR(256) not null, primary key (PRB_ID) ); -- ============================================================ -- Index : T_PARAM_BASE_PRB_PK -- ============================================================ create unique index T_PARAM_BASE_PRB_PK on T_PARAM_BASE_PRB (PRB_ID asc); -- ============================================================ -- Table : T_TYPE_IMAGE_TIM -- ============================================================ create table T_TYPE_IMAGE_TIM ( TIM_ID INTEGER not null, TIM_FORMAT CHAR(4) not null, TIM_LIBELLE VARCHAR(32) not null, primary key (TIM_ID) ); -- ============================================================ -- Index : T_TYPE_IMAGE_TIM_PK -- ============================================================ create unique index T_TYPE_IMAGE_TIM_PK on T_TYPE_IMAGE_TIM (TIM_ID asc); -- ============================================================ -- Table : T_STOCKAGE_IMAGE_SIM -- ============================================================ create table T_STOCKAGE_IMAGE_SIM ( SIM_ID INTEGER not null, SIM_DATE_CREATION DATE not null, SIM_PATH VARCHAR(1024) not null, primary key (SIM_ID) ); -- ============================================================ -- Index : T_STOCKAGE_IMAGE_SIM_PK -- ============================================================ create unique index T_STOCKAGE_IMAGE_SIM_PK on T_STOCKAGE_IMAGE_SIM (SIM_ID asc); -- ============================================================ -- Table : T_IMAGE_IMG -- ============================================================ create table T_IMAGE_IMG ( IMG_ID INTEGER not null, TIM_ID INTEGER not null, IMG_ID_REMPLACEMENT INTEGER , IMG_NOM_FICHIER VARCHAR(256) not null, IMG_DATE_CREATION DATE , IMG_COULEURS INTEGER , IMG_LARGEUR INTEGER , IMG_HAUTEUR INTEGER , IMG_TAILLE INTEGER , IMG_DESCRIPTION VARCHAR(1024) , IMG_LEGENDE VARCHAR(256) , IMG_HINT VARCHAR(256) , IMG_COPYRIGHT VARCHAR(256) , IMG_DATE_SUPPRESSION DATE , SIM_ID INTEGER not null, primary key (IMG_ID) ); -- ============================================================ -- Index : T_IMAGE_IMG_PK -- ============================================================ create unique index T_IMAGE_IMG_PK on T_IMAGE_IMG (IMG_ID asc); -- ============================================================ -- Index : T_IMAGE_NOM_UNI -- ============================================================ create unique index T_IMAGE_NOM_UNI on T_IMAGE_IMG (IMG_NOM_FICHIER asc); -- ============================================================ -- Index : TJ_IMGTIM_FK -- ============================================================ create index TJ_IMGTIM_FK on T_IMAGE_IMG (TIM_ID asc); -- ============================================================ -- Index : TJ_IMGIMG_FK -- ============================================================ create index TJ_IMGIMG_FK on T_IMAGE_IMG (IMG_ID_REMPLACEMENT asc); -- ============================================================ -- Index : TJ_IMGSIM_FK -- ============================================================ create index TJ_IMGSIM_FK on T_IMAGE_IMG (SIM_ID asc); alter table T_IMAGE_IMG add foreign key (TIM_ID) references T_TYPE_IMAGE_TIM (TIM_ID); alter table T_IMAGE_IMG add foreign key (IMG_ID_REMPLACEMENT) references T_IMAGE_IMG (IMG_ID); alter table T_IMAGE_IMG add foreign key (SIM_ID) references T_STOCKAGE_IMAGE_SIM (SIM_ID); -- Insertion of file formats image predefined INSERT INTO T_TYPE_IMAGE_TIM (TIM_ID, TIM_FORMAT, TIM_LIBELLE) VALUES (1, 'BMP ', 'Microsoft bitmap') INSERT INTO T_TYPE_IMAGE_TIM (TIM_ID, TIM_FORMAT, TIM_LIBELLE) VALUES (2, 'GIF ', 'Compuserve GIF') INSERT INTO T_TYPE_IMAGE_TIM (TIM_ID, TIM_FORMAT, TIM_LIBELLE) VALUES (3, 'JPG ', 'Internet compressé') INSERT INTO T_TYPE_IMAGE_TIM (TIM_ID, TIM_FORMAT, TIM_LIBELLE) VALUES (4, 'JPEG ', 'Internet compressé') You may have noticed that I surreptitiously added a single index on the column IMG_NOM_FICHIER to avoid trying to insert two image files bearing the same name! 6. Going further Why not manage access rights to images using this technique of meta data? Simple, just use a table of users (or create one) with access rights (reading pare example) and create a join table between the table of images and users. There are still many other possibilities of operating such a model and the technical data. I leave it to your sagacity care to supplement depending on your need. Of course I would be concerned that you do share your experiences ... In fact, such a model can also be used to manage files of sound or video ... And why not enriched images, as bitmap vector for interactive mapping, for example? 7. Discussion on the solution ... About: "Should we insert images directly in a column blob of a table?" A discussion took place on the Internet in the forum devoted to SQL Server ... Here are the key | David: | There are situations where it appears to be an excellent idea. Our application (medical), used (among others) by the eight largest hospitals in Norway, stores images associated with patients in a DB SQL Server. We chose to use a dedicated server (tied) save images, separate server operating containing data patients themselves and we would not reverse (back to a file system) under any pretext. The management of backups is uniform, the insertion of a image in a patient record can be an integral part of a transaction (the solution with file system obliging to use a mechanical much more complex, based MTS, for example), writing the application customer is simplified ... | | Med: | Looking back on that architecture, What is the thing you would have done differently? | | David: | In fact, image management has come very late in the development cycle of our application. There were although the possibility of integrating elements of multi-media (rx, protocols dictated), but compared to the total mass Data was rather marginal. The situation has really changed the day hospitals have requested, in addition to the usual integration images (RX, scans etc.), scan their documents and add them to folder. The proportion data / blobs changed radically. It is now in a situation where Images are sometimes partially stored in the DB Operating (images historically created before that could not take care of the scanning) and partly on a dedicated server. It's been a little disorder. However, it would be relatively easy to remedy that. | | Med: | What is the functionality that you would find desired in SQL / Server and t'as missed? What we have the most trouble is having to the integration of images on file in two stages: 1) bii use to fill a temporary table with the blob; and information about the patient as it relates 2) initiate a process using the content of this table temporary and "integrating" the image file | | David: | It is a pity that the inclusion of an image must be done by an external tool (bii) not always very well supported (the documentation in the BOL is laughable). It would have been nice to have integrated tools, SPs, whatever ... allowing "play" with BLOBs easily from ISQL. | | Med: | On this subject, I was frankly impressed by the solution IBM offered with DB2 DB2 offers a solution that combines the best of both worlds The advantage of files managed by the file system of the OS The advantage of data managed by a relational engine To implement this very simple solution, add a DB2 new type of data Such data, DATALINK, does nothing more than reference URL of a file But with all the advantages of one type of data classique a RDBMS ntegrity referential Nobody can destroy or rename the file as it is appears in the database Control of access The leaves of RDBMS applies to the file system! Backup and Restore The backup is done by the RDBMS so classic You save your base and files are saved Transaction Management Changes to the core and system files are made in the same transaction The good news is that IBM has submitted this new type of data ANSI which has adopted for SQL3 I think the future lies in such a solution and that j'espere we will soon in future versions of SQL / Server For more details do a search on DB2 DATALINK | | Fred (SQLpro): | Regarding the solution iclure images in the database, behold, for my part, the remarks that I bring about: drawbacks: Backup partial selective difficult if not impossible. Portability diminished. Recovering some of the images impossible if défailanece Server data. Recovery of images in the request difficult to impossible (BDE Borland makes it possible, but not HBO nor Microsoft ADO) Advantage: uniformity of the solution | | Denis | I myself worked as a project manager on a project imaging Medical for a very large private hospital (250 beds). (4 months of study, 14 months of development) The medical images (in color for some) are enormous (typically 7 to 10 MB / image)! There was no question of the store in the database: too load server response time client-side far too long. The solution is the same as you proposed. Some have reacted to your post on the "security". What's more that confidential medical images? Well it was a breeze to establish an appropriate strategy. At the time, it was on an NT server different from the base. 1 PDC NT 4 1 Server NT 4 domain member with SQL Server 7 1 Server NT 4 domain member in RAID 5 hot-plug with bay available for additional disks to accommodate more storage images + backup the name of each image was simply no record of patients 6 positions followed by a serial number on 2 positions. Networks Cat 5: 100 Mb liaison with inter-building fiber and switch to L0 each floor (3 in total) all on 4 buildings. Of course, the cost of this solution was very high. Nothing that hard, has exceeded 150000 Euros. In my opinion, if I had today to make a similar project, I referai unhesitatingly the same thing. While j'adapterai look hard and development based on a need real score but I retain the outsourcing of images. Those who do not, because too often they do not know well the possibilities offered by Windows or Novell. |
|
|
|