
OrientDB tips

Document (Object instance) = Record

Class (Object descriptor) = Table

Property (Object attribute) = Column

Cluster: by default one class maps to one cluster, but a class can rely on multiple clusters. Why? Because this lets you spread records physically across multiple locations.
Clusters are the "containers" in which classes (tables) organize documents (records).
A class can have one or more clusters (one by default).

Types of clusters:
  • Physical: writes to the file system;
  • Logical: uses a physical cluster but does not create files of its own. It is slower than a physical cluster;
  • Memory: the information in these clusters is not persistent and is lost when the process or server stops. Memory clusters are useful as an application cache.

Select queries search in all clusters of a class by default.
Insert queries write to the class's default cluster unless told otherwise.
We can also target a specific cluster, which can improve performance and makes it possible to run queries in parallel:
select * from cluster:clusterName
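
The class/cluster relationship described above can be pictured with a small toy model. This is only an illustrative sketch in Python, not OrientDB code: the `ToyClass` type, the cluster names, and the sample records are all hypothetical, and real clusters live in the storage engine rather than in Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class ToyClass:
    """Toy model of a class backed by one or more named clusters."""
    name: str
    default_cluster: str
    clusters: dict = field(default_factory=dict)  # cluster name -> list of records

    def __post_init__(self):
        # Every class starts with its default cluster.
        self.clusters.setdefault(self.default_cluster, [])

    def add_cluster(self, cluster_name):
        # Attach one more cluster to the class.
        self.clusters.setdefault(cluster_name, [])

    def insert(self, record, cluster=None):
        # Without an explicit target, the record lands in the default cluster.
        target = cluster or self.default_cluster
        self.clusters[target].append(record)

    def select(self, cluster=None):
        # With an explicit target, only that cluster is read.
        if cluster is not None:
            return list(self.clusters[cluster])
        # Otherwise every cluster of the class is scanned.
        return [r for records in self.clusters.values() for r in records]


customer = ToyClass("Customer", default_cluster="customer")
customer.add_cluster("customer_eu")
customer.insert({"name": "Ann"})                         # default cluster
customer.insert({"name": "Bob"}, cluster="customer_eu")  # explicit cluster
all_rows = customer.select()
eu_rows = customer.select(cluster="customer_eu")
```

Reading a single cluster touches fewer records than reading the whole class, which is where the performance gain of a cluster-specific select comes from.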

Record id RID:
<cluster-id>:<cluster-position>
A RID uniquely identifies a document (a record) in the whole database, not just within a table. It is the physical position of the record inside the database.
  • cluster-id is the id of the cluster. Each database can have a maximum of 32,767 clusters (2^15 - 1)
  • cluster-position is the position of the record inside the cluster. Each cluster can handle up to 9,223,372,036,854,775,808 (2^63) records.
So the maximum size of a database is about 2^78 records (2^78 = 302,231,454,903,657,293,676,544).
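
The arithmetic behind this limit is easy to check with Python's arbitrary-precision integers. The sketch below assumes that the 32,767 figure means 2^15 - 1 cluster ids; the variable names are only illustrative.

```python
# Limits quoted in the text (assumption: 32,767 cluster ids = 2**15 - 1).
MAX_CLUSTERS = 2**15 - 1
MAX_POSITIONS_PER_CLUSTER = 2**63

# Rounded figure: 2^15 clusters times 2^63 positions.
upper_bound = 2**15 * 2**63

# Exact figure when only 32,767 cluster ids are usable.
exact_capacity = MAX_CLUSTERS * MAX_POSITIONS_PER_CLUSTER

print(f"2^78 upper bound : {upper_bound:,}")
print(f"32,767 clusters  : {exact_capacity:,}")
```

The two values differ by exactly 2^63, which is negligible at this scale, so quoting 2^78 as the database capacity is a fair approximation.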

Example of use in a query:
load record #12:4
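
On the client side, a RID string such as `#12:4` can be split into its two parts. The following is a minimal sketch, not part of any OrientDB driver: `RID` and `parse_rid` are hypothetical helper names, and the parser assumes both parts are non-negative integers.

```python
import re
from typing import NamedTuple


class RID(NamedTuple):
    cluster_id: int
    cluster_position: int


# '#' prefix is optional; both parts must be plain non-negative integers.
_RID_PATTERN = re.compile(r"#?(\d+):(\d+)")


def parse_rid(text: str) -> RID:
    """Parse '<cluster-id>:<cluster-position>', with or without a leading '#'."""
    match = _RID_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid RID: {text!r}")
    return RID(int(match.group(1)), int(match.group(2)))


rid = parse_rid("#12:4")
print(rid.cluster_id, rid.cluster_position)
```

Here `12` is the cluster id and `4` is the position of the record inside that cluster.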

Amazing performance:
"THE JOIN IS THE EVIL"
A join looks up an ID at runtime, for every record, every time you execute a query. That can be very expensive!
The first optimization is to use indexes. It's true that indexes speed up searches, but they slow down INSERT, UPDATE and DELETE operations.
Furthermore, they occupy substantial space on disk and in memory.

In OrientDB a RID (RecordID) is the physical position of the record inside the database. This means that loading a record by its RID is blazing fast, with a response time close to constant, O(1), even as the database grows. This is huge in the Big Data age!
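
The difference between the two access paths can be illustrated with a toy model in which a cluster is a plain Python list and the list index plays the role of the cluster position. This is a sketch of the idea only, not of OrientDB's storage engine; the `people` data and both helper functions are made up for the example.

```python
# A cluster modelled as a list: the list index is the cluster position.
people = [{"id": 100 + i, "name": f"person-{i}"} for i in range(1000)]


def load_by_position(cluster, position):
    # RID-style access: one direct step, whatever the cluster size.
    return cluster[position]


def find_by_key(cluster, key):
    # Join-style access without an index: scan until the key matches,
    # counting how many records had to be inspected.
    steps = 0
    for record in cluster:
        steps += 1
        if record["id"] == key:
            return record, steps
    return None, steps


direct = load_by_position(people, 999)
scanned, steps = find_by_key(people, 1099)
print(direct is scanned, steps)
```

Both calls return the same record, but the scan inspects every record before it to get there, and that cost grows with the size of the cluster while the positional load does not.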
