Optimizing Database Performance for the Web

Getting maximum performance out of an open-source database takes in-depth knowledge.


There is hardly a web application that does not rely on some database. If you do not have a sufficient budget, or are simply a supporter of open-source software, you will most likely build your applications on PHP (PHP Hypertext Preprocessor) and an open-source database. In that case you should become familiar with the techniques that let you squeeze out of a database everything it is capable of. In this article we examine some performance techniques that apply to practically any open-source database.


Optimization at the database level


The fastest way to improve the performance of database code is to replace the SQL statements embedded in it with stored procedures.


Using stored procedures


Stored procedures are subroutines kept in the database. They are precompiled by the database engine and substantially improve its performance by eliminating repeated round trips from the calling application (usually a PHP page) to the database engine.


Stored procedures are also much easier to maintain. All the logic lives in one place, so all changes are concentrated in one place and take effect as soon as they are made. When application code always calls the same procedure, it is much easier to verify that the business logic fully matches the functional requirements. If your procedures are general enough, you can reuse them in other projects and so shorten development time.


In addition, compared with running SQL code from environments such as PHP, ASP (Active Server Pages) and JSP (JavaServer Pages), or from other development languages, stored procedures noticeably reduce network traffic. Finally, if you need to scale the application, it is much easier to spread its code across several application servers when most of the database access logic is stored and executed inside the database.


Storing multimedia files in the file system


Multimedia files, whether still images, sound files or video, are often treated as binary objects. There is even a special term for them: BLOB (binary large object). BLOBs can be stored either in the database or in the file system. In the latter case, the paths to the BLOB files are stored in the database. Storing BLOBs in the file system takes slightly more work but gives much better performance than storing them in the database.


As the number of stored binary objects grows, database engine performance drops quickly. Deleting such objects can also leave a large number of "dead zones" in the database files. When all the data passes through the database engine, it is harder for the engine to handle concurrent work. Storing BLOBs in the file system, by contrast, makes it easy to link to the objects from web pages. After the page information has been loaded, the web server handles the request for the file, and the database engine is free for other tasks. A side benefit is that an administrator can easily catalogue, manage and back up multimedia files stored on disk.
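The approach described above can be sketched as follows. This is a minimal illustration in Python with the built-in sqlite3 module, not a production design: the table name media, its columns, and the helper functions store_media and load_media are all assumptions made for the example. Only the file path goes into the database; the bytes themselves live on disk.

```python
import sqlite3
import tempfile
from pathlib import Path


def store_media(conn, media_dir, file_name, payload):
    """Write the bytes to disk and record only the path in the database."""
    target = Path(media_dir) / file_name
    target.write_bytes(payload)
    cur = conn.execute(
        "INSERT INTO media (file_name, file_path) VALUES (?, ?)",
        (file_name, str(target)),
    )
    conn.commit()
    return cur.lastrowid


def load_media(conn, media_id):
    """Look up the path in the database, then read the file from disk."""
    row = conn.execute(
        "SELECT file_path FROM media WHERE media_id = ?", (media_id,)
    ).fetchone()
    return Path(row[0]).read_bytes()


# Demo with an in-memory database and a temporary directory (both hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media ("
    "media_id INTEGER PRIMARY KEY, file_name TEXT, file_path TEXT)"
)
media_dir = tempfile.mkdtemp()
media_id = store_media(conn, media_dir, "logo.png", b"\x89PNG fake image bytes")
restored = load_media(conn, media_id)
```

In a web application the page would normally emit a link built from the stored path, so the web server, not the database, delivers the file.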


Using indexes


Indexing is one of the most reliable ways to improve database performance. It is also one of the basic database mechanisms that usually gets far too little attention. As a rule, table rows are stored in the order in which they were created. Retrieving a record from a table of any size therefore requires a sequential scan of its rows. An index creates a separate set of entries, ordered by the chosen key and containing pointers to the original rows. An indexed table is searched much faster than an unindexed one. However, indexes consume additional disk space. In addition, updating an indexed table takes more time, because every index on it must be updated as well.
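The idea of an index as a separate, ordered set of entries pointing back to the original rows can be shown with a small sketch. It is a deliberate simplification in plain Python: the sample rows and the function names find_by_scan and find_by_index are invented for illustration, and real database engines typically use tree structures such as B-trees rather than a sorted list.

```python
import bisect

# Rows stored in insertion order, as a table without an index would hold them.
rows = [
    (3, "Smith"),
    (1, "Jones"),
    (4, "Brown"),
    (2, "Adams"),
]

# The index: last names in sorted order, each paired with the row's position.
index = sorted((last_name, pos) for pos, (_, last_name) in enumerate(rows))
index_keys = [key for key, _ in index]


def find_by_scan(last_name):
    """Sequential scan: inspect every row until a match is found."""
    for row in rows:
        if row[1] == last_name:
            return row
    return None


def find_by_index(last_name):
    """Binary search in the sorted index, then jump to the original row."""
    i = bisect.bisect_left(index_keys, last_name)
    if i < len(index_keys) and index_keys[i] == last_name:
        return rows[index[i][1]]
    return None
```

The sketch also hints at the cost mentioned above: every insert or update of a row would have to keep the sorted index in step with the table.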


Using integer key fields


You may be tempted to create tables without an integer key field. For example, in a table of personnel records you could use the character fields last_name and first_name, plus fields for the address and contact information, and then rely on the name fields when joining, viewing and otherwise working with the records. We advise against this practice. It is better to use a numeric key field, for example person_id. If your data has no such field, create an auto-increment field that holds no real data and serves only as a key.


Numeric fields have many advantages. A number is less likely to be used incorrectly than a name. When a person's name changes (for example, because of marriage), you do not need to change every reference to that person in your code. In addition, joining records on a numeric key is much more efficient than joining on a text key. Make it a habit to create a numeric primary key field for every new table.
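A minimal sketch of this advice, using Python's built-in sqlite3 module. The phone table, the sample names and the phone number are assumptions made for the example; only person_id, last_name and first_name come from the text. The key is an auto-increment integer, so a change of surname touches one row and leaves every reference intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no real data
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE phone (
        person_id    INTEGER NOT NULL,  -- refers to person.person_id
        phone_number TEXT NOT NULL
    );
    """
)

cur = conn.execute(
    "INSERT INTO person (first_name, last_name) VALUES (?, ?)", ("Anna", "Ivanova")
)
person_id = cur.lastrowid
conn.execute(
    "INSERT INTO phone (person_id, phone_number) VALUES (?, ?)",
    (person_id, "555-0100"),
)

# A name change touches exactly one row; the phone record still points at the
# same person because it refers to the number, not the name.
conn.execute(
    "UPDATE person SET last_name = ? WHERE person_id = ?", ("Petrova", person_id)
)

joined = conn.execute(
    "SELECT p.last_name, ph.phone_number "
    "FROM person AS p JOIN phone AS ph ON ph.person_id = p.person_id"
).fetchall()
```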


Optimizing application code


To get the maximum gain in database performance, you can apply several strategies for optimizing application code. Below are a number of code optimization recommendations that apply to any web development language.


Using sessions


Sessions are supported by several web development environments. Sessions, which are extremely popular among application service providers and PHP programmers, are usually implemented by passing cookies. Because the Web is a stateless environment, it gives programmers no information about what the user did before arriving at a given page. With sessions, the programmer can track the user's navigation. As a side effect, many programmers try to save information about the user in session variables. Although storing database references, such as connections or recordsets, in a session variable is tempting, it is a rather bad habit. Keeping connections in sessions prevents them from being pooled. This practice also wastes memory and CPU capacity.


If sessions are not terminated properly, their connections continue to exist for the entire timeout period. The timeout can vary, but in most cases it is more than 20 minutes. During all this time, memory and CPU capacity are wasted. Although it may seem that opening and then closing a connection on each web page wastes resources, in practice it conserves them. Just follow one iron rule: open a connection as late as possible and close it as early as possible. The same rule applies to recordsets.


Using optimal queries


The way you use SQL statements can substantially affect the performance of your web application. In particular, pulling a long list of records from the database to display on one continuous web page is wasteful. Retrieve only part of the records at a time (say, 10 or 50), and use a "Next 50" button or a link to the next page to display the following group.


When writing such code, try to take full advantage of SQL. For example, the LIMIT keyword restricts the number of records returned. The OFFSET keyword skips a given number of records and returns the ones that follow. To return the third group of 50 customer records, use a query like this:


SELECT customer_id, customer_name FROM customer ORDER BY customer_id LIMIT 50 OFFSET 100;
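In application code, the offset is usually computed from a page number. The following sketch, in Python with the sqlite3 module, assumes a hypothetical helper fetch_page and a test table filled with 200 invented customers; the query itself is the one shown above, with LIMIT and OFFSET passed as parameters.

```python
import sqlite3


def fetch_page(conn, page_number, page_size=50):
    """Return one page of customers; page_number starts at 1."""
    offset = (page_number - 1) * page_size
    return conn.execute(
        "SELECT customer_id, customer_name FROM customer "
        "ORDER BY customer_id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT)"
)
conn.executemany(
    "INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)",
    [(i, f"customer {i}") for i in range(1, 201)],
)

third_page = fetch_page(conn, 3)  # equivalent to LIMIT 50 OFFSET 100
```

The ORDER BY matters here: without a stable ordering, consecutive pages are not guaranteed to be consistent with each other.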


If your tables contain many columns, or your queries use joins, do not use SELECT * just to save yourself from typing field names. Listing only the fields you need saves some CPU cycles every time the code runs.


While we are on the subject of SELECT, note that its WHERE clause should be used to best advantage. If the WHERE clause contains conditions on several columns, performance will depend on the order in which they are written. The first condition should be on the column that returns the smallest set of records, the second on the column that returns the next smallest set, and so on for the remaining columns.


Using the CASE statement


To build a query that requires a decision based on its result, you have two options. The most obvious but slower one is to run the query and check its result in application code. The more skilled and more efficient one is to use the advanced features of SQL. The SQL CASE statement, like similar statements in most other languages, can choose a result based on an input value. The result of a SELECT statement can then be controlled with CASE, as in the following example:



SELECT product_name,
       CASE
           WHEN price < 5 THEN 'cheap'
           WHEN price >= 5 AND price < 20 THEN 'ok'
           ELSE 'too expensive for my taste'
       END AS product_price
FROM products
ORDER BY product_name;


The result is a recordset with two columns: the first contains the product name, and the second contains our chosen interpretation of the price.
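To see the two-column result, a variant of the query above can be run against a small test table. This sketch uses Python's sqlite3 module; the three products and their prices are invented for the example. Because CASE checks its branches in order, the second branch only needs the upper bound.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (product_name, price) VALUES (?, ?)",
    [("pencil", 1.5), ("notebook", 12.0), ("fountain pen", 45.0)],
)

labelled = conn.execute(
    """
    SELECT product_name,
           CASE
               WHEN price < 5 THEN 'cheap'
               WHEN price < 20 THEN 'ok'
               ELSE 'too expensive for my taste'
           END AS product_price
    FROM products
    ORDER BY product_name
    """
).fetchall()
```

The labelling is done inside the database, so the application receives ready-to-display rows and needs no extra pass over the data.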



***

MySQL


Because of its origins and development history, MySQL has some performance issues of its own. For example, MySQL performs best on Intel machines running Linux. There are many reasons for this, but the main one is the way system memory is allocated. So if you have chosen MySQL, do not use Microsoft Windows NT, and vice versa.


MySQL has a reputation as an extremely fast database. Many even claim that it is considerably faster than any other database on the market. TCX, the company that actively promotes MySQL, runs a website (http://www.tcx.com) where it compares MySQL with other databases and publishes performance results for different platforms. According to these results, the MySQL engine outperforms all other database engines by 40% on average. Still, let us look at things with healthy scientific skepticism.


The point is that MySQL's high speed does not come for free: it is possible because MySQL does not support transactions. Transactions noticeably reduce database engine performance. They also require log files that make it possible to roll back changes step by step or to cancel a transaction completely. The system administrator has to tend the log file patiently and make sure it does not grow too large. In addition, the log files must be backed up together with the database.


Constraints


Constraints are used to enforce relationships between tables and to ensure the integrity of their data. The main advantage of constraints is that they protect against the consequences of many programming errors. In MySQL, constraints are not widely used, and we advise you to do without them where possible. It is much more sensible to make the application logic fully implement all functional requirements and keep all data consistent. Then you need not live in fear that one fine day your application will come crashing down on you merely because one of the constraints in your code was set up incorrectly and was violated.


Some programmers spend more time adding constraints to their code, debugging them and fighting the problems caused by their violation than on developing and optimizing the program logic. Besides, if you do not use constraints, your application is more likely to be portable to other platforms.


Table types used in MySQL


MySQL uses four table types: static, dynamic, heap and compressed. According to the MySQL manual, static tables are the fastest of the three table types stored on disk. However, they cannot contain variable-length columns. If a table has even one such column, MySQL has to create a dynamic table instead of a static one. Dynamic tables hold much more data relative to the size of the table, but they are much slower than static tables.


The other two table types are special. Heap tables exist only in memory and are therefore extremely fast, but they can only be small or medium-sized. Compressed tables are read-only and are also very fast. The MySQL manual (http://www.mysql.com/doc) has more information on all the table types.


Dead space in MySQL


When data in variable-length columns changes and the new data is shorter, "dead" space forms in the MySQL data files. There is no way to prevent this problem in application code. Another performance problem arises from the ordinary use of indexes, which gradually degrade. The MySQL software includes a utility, myisamchk, that reclaims dead space and re-optimizes indexes. Run it against the database periodically to keep both problems under control.


PostgreSQL


Unlike MySQL, PostgreSQL supports stored procedures, constraints and transactions. Its developers did not go down the path of cutting functionality for the sake of performance. But PostgreSQL's rich feature set has its own advantages. PostgreSQL has two distinctive built-in commands that can be used to improve its performance: VACUUM and EXPLAIN.


When PostgreSQL modifies a row, it keeps the original row and creates a new one at the end of the internal data file. The old row is marked as obsolete and is used by other transactions that still see the state of the database as it was before the current transaction was committed. The same happens when rows are deleted. The VACUUM command removes obsolete rows from the database and compacts it. To keep the database clean, run this command periodically.
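The effect of reclaiming space can be observed on a small scale with a loose analogy. The sketch below uses SQLite through Python's sqlite3 module, not PostgreSQL, and SQLite's internal mechanism is different (it tracks free pages rather than obsolete row versions). It only illustrates the general principle that deleted data leaves reclaimable space behind and that a VACUUM command compacts the file. The table name log and the row sizes are invented for the example.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "vacuum_demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO log (payload) VALUES (?)", [("x" * 1000,) for _ in range(2000)]
)
conn.commit()
size_full = os.path.getsize(db_path)

# Deleting the rows frees pages inside the file; it does not by itself
# guarantee that the file gets smaller.
conn.execute("DELETE FROM log")
conn.commit()
size_after_delete = os.path.getsize(db_path)

# VACUUM rebuilds the file without the unused space.
conn.execute("VACUUM")
size_after_vacuum = os.path.getsize(db_path)
conn.close()
```

Comparing size_full, size_after_delete and size_after_vacuum shows where the space actually gets released.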


An important debugging technique is optimizing the way a query is executed. If you try to find the bottlenecks in your query simply by reading it, you will soon realize that this gets you nowhere. To avoid such a primitive approach, most databases provide an analysis function that does not run the query but only analyzes it for you. In PostgreSQL this function is called EXPLAIN. Here is an example of its use:


EXPLAIN SELECT phone_number FROM people WHERE id = 930;

NOTICE: QUERY PLAN: Seq Scan on people (cost=0.00..30.50 rows=20 width=15)


The database reports that a sequential scan is required.


The cost values are given only for comparison. The rows value is the estimated number of rows the query will return. The last value, width, is the width of a row in bytes. If you run the VACUUM command against this database, you will most likely see an improvement in the performance predicted by EXPLAIN. Moreover, by creating an index on the column used in the WHERE clause (id in this example), you will speed up the query, replacing the sequential scan with an index scan.
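The switch from a sequential scan to an index scan can be reproduced in miniature. The sketch below is an analogy in SQLite (through Python's sqlite3 module), whose EXPLAIN QUERY PLAN command plays a role similar to PostgreSQL's EXPLAIN; the output format differs from the PostgreSQL output shown above and varies between SQLite versions. The helper query_plan, the index name idx_people_id and the sample data are assumptions for illustration, and the index is placed on the column used in the WHERE clause.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the plan text SQLite reports for a query without running it."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last field of each plan row is the human-readable description.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# No PRIMARY KEY on purpose, so that id starts out without any index.
conn.execute("CREATE TABLE people (id INTEGER, phone_number TEXT)")
conn.executemany(
    "INSERT INTO people (id, phone_number) VALUES (?, ?)",
    [(i, f"555-{i:04d}") for i in range(1, 1001)],
)

sql = "SELECT phone_number FROM people WHERE id = 930"
plan_before = query_plan(conn, sql)

conn.execute("CREATE INDEX idx_people_id ON people (id)")
plan_after = query_plan(conn, sql)
```

Printing plan_before and plan_after shows a full table scan in the first case and a search that names idx_people_id in the second.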