Thursday 30 November 2023

Managing Large tables with partitioning

 when monitoring the system,you can sometimes see tables that have grown so large in SAP HANA that it makes sense to split them "horizontally'' into smaller partitions.By default,you can partition column tables in SAP HANA ,which is ideal for managing high data volumns. The SQL statements - or any data manipulation language(DML) statements - do not need to know that the data is partitioned.Instead,SAP HANA automatically manages the partitions behind the scenes. Thus,SAP HANA simplifies access and frontend development and gives administrators a key tool to manage disks, memory, and large column stores.


In a distributed (scale-out) SAP HANA system, you can also place the partitions on different nodes and thereby increase performance even more because more processors are available for available for your users. in fact, this may become the standard deployment method for extremely large systems with tens of thousands of users.

Currently,SAP HANA suppots up to 2 billion rows in a single column table.in a partitioned schema, you can now have 2 billion rows per partition with virtually no limit on how many partitions you can add.you are limited, not by a database limitation, but by hardware and landscape architecture issues. you can create partitions in SAP HANA in three different ways from an adninistration standpoint: with a range,with a hash and by round-robin method. while more complex schemas are possible with higher- level options. Lets take a look at these fundamental partitioning choices.


Partitioning column tables using a Range

if you know your data really well, you can partition the data by any range in your table.Although the most common is date, you can also partition by material numbers, postal codes, customer numbers, and so on.

Partitioning by date makes sense if you want to increase query speed and keep current data on a single node. Partitioning by customer number makes sense if you are trying to increase the speed of delta merges because multiple nodes can be used at the same time during data loads. you'll have to spend some time thinking of what benefitsyou want to achieve before undertaking any partitioning scheme.Note that the maintenance of range partitions is somewhat higher than the other options because you have to keep adding new partitions as data outside the existing partitions emerge (e.g., next years data if you partition by year now).


partitioning is done using SQL with the syntax seen in listing.
------------------------------------------------------------------------------------------------------------------------
CREATE COLUMN TABLE SALES (sales_order INT ,  customer_number INT , quantity INT , PRIMARY KEY (sales_order))
PARTITION BY RANGE ( sales_order)
(PARTITION 1 <= values < 100000000,
PARTITION 100000000 <= values < 200000000,

PARTITION OTHERS)
--------------------------------------------------------------------------------------------------------------------------




This syntax creates a table with three partitions. The first two have 100 million rows each, and the last partition has all the other records. However, you must follow some basic rules: first, the field you are partitioning on has to be part of the primary key (i.e., sales_order). Second, the field has to be defined as a string, a date, or an integer.Finally, you can only partition column stores, not row stores by ranges.


Partitioning column tables using a Hash:

Unlike partitioning by ranges, partitioning column stores by a hash doesn't require any in-depth knowledge of the data. Instead, partitions are created by an internal algorithm applied to one or more fields in the database by the systemitself. This algorithm is known as a hash. Records are then assigned
to the required partitions based on this internal hash number. The partitions are created in SQL with the following syntax:

--------------------------------------------------------------------------------------------------------------------

CREATE COLUMN TABLE SALES ( sales_order INT , cutomer_number INT , quantity INT , PRIMARY KEY ( sales_order, customer_number)) PARTITION BY HASH (sales_order, customer_number)
PARTITIONS 6

---------------------------------------------------------------------------------------------------------------------

This example creates six partitions by sales orders and customer numbers. some rules apply here as well: if the table has a primary key, the primary key  must be included in the hash. if you add more than one column, and your table has a primary key, all fields used to partition must be part of the primary key also. if you leave off the number 6, the system will determine the optional number of partitions itself based on your configuration. Therefore, using PARTITIONS without a number is the recommended setting for most hash partitions.


Partitioning column Tables using the Round-Robin method:


In a round-robin partition, the system assigns records to partitions on a rotating basis. The round-robin method makes efficient assignments and requires no knowledge of the data. However, removing partitions in the future will be more difficult because both new and old data will be in the same partitions.


The Partitions are created in SQL with the following syntax:

-----------------------------------------------------------------------------------------------------------------------

CREATE COLUMN TABLE SALES (sales_order INT , customer_number INT , quantity INT)
PARTITION BY ROUNDROBIN
PARTITIONS 6


------------------------------------------------------------------------------------------------------------------------

This example creates six partitions and assigns records on a rotating basis. if you change the last statement to PARTITIONS GET_NUM_SERVERS( ), the system will assign the optimal number of partitions based on your system landscape.The only requirement for the round-robin method is that the table cannot contain a primary key.



No comments:

Post a Comment