Hive Partitions & Buckets With Example

Tables, partitions, and buckets are the main components of Hive data modeling.

What are Partitions?

Hive partitioning is a way to organize a table by dividing it into parts based on the values of one or more partition keys.

Partitioning is helpful when a table has one or more partition keys. Partition keys are the basic elements that determine how the data is stored in the table.

For example:

Suppose a client has e-commerce data for its India operations, with records for each of 38 states. If we take the state column as the partition key and partition the whole India dataset on it, we get 38 partitions, one per state, so that each state's data can be viewed separately in its own partition.

Sample code snippet for partitions

Creation of the table allstates

create table allstates(state string, District string, Enrolments string) row format delimited fields terminated by ',';

Loading data into the created table allstates

Load data local inpath '/home/hduser/Desktop/AllStates.csv' into table allstates;

Creation of the partitioned table state_part

create table state_part(District string, Enrolments string) PARTITIONED BY (state string);

For dynamic partitioning we have to set the following property:

set hive.exec.dynamic.partition.mode=nonstrict;
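Depending on the Hive version and its defaults (this is an assumption about your setup; many versions enable it out of the box), dynamic partitioning itself may also need to be switched on:

set hive.exec.dynamic.partition=true;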

Loading data into the partitioned table state_part

INSERT OVERWRITE TABLE state_part PARTITION(state) SELECT district, enrolments, state FROM allstates;

Actual processing and formation of partitions based on state as the partition key

There will be 38 partition outputs in HDFS storage, each in a directory named after its state. We will check this below.

The following screenshots show the execution of the code above.

From the above code, we do the following:

Creation of the table allstates with three columns: state, district, and enrolments

Loading data into the table allstates

Creation of the partitioned table state_part with state as the partition key

Setting the partition mode to non-strict (this enables dynamic partitioning)

Loading data into the partitioned table state_part

Actual processing and formation of partitions based on state as the partition key

There will be 38 partition outputs in HDFS storage, each in a directory named after its state. In this step we can see the 38 partition outputs in HDFS.
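As a quick check (a minimal sketch; it assumes the INSERT above completed successfully), you can list the partitions of the table directly from Hive:

SHOW PARTITIONS state_part;

This should return one entry per state value, i.e. 38 partitions for this dataset.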

What are Buckets?

Buckets in Hive are used to segregate table data into multiple files, which makes querying more efficient.

The data present in partitions can be divided further into buckets.

The division is performed based on the hash of a particular column that we select in the table.

Bucketing uses a hashing algorithm behind the scenes to read each record and place it into one of the buckets.
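Conceptually (a simplification; the exact hash function depends on the column type and the Hive version), a record lands in bucket number:

hash(bucketing column value) mod (number of buckets)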

In Hive, we have to enable bucketing by setting the following property:

set hive.enforce.bucketing=true;

Step 1) Creating the bucketed table sample_bucket as shown below.
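A minimal sketch of such a CREATE statement, assuming we bucket on first_name (the bucketing column and the column types are assumptions):

create table sample_bucket(first_name string, job_id int, department string, salary int, country string) clustered by (first_name) into 4 buckets;

Any column of the table can serve as the bucketing column; rows whose bucketing column hashes to the same value end up in the same bucket.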

In this step:

We are creating the table sample_bucket with columns first_name, job_id, department, salary, and country.

We are creating 4 buckets here.

Once the data is loaded, Hive automatically places it into the 4 buckets.

Step 2) Loading data into the table sample_bucket

We assume that an employees table has already been created in the Hive system. In this step, we will load data from the employees table into the table sample_bucket.

Before we start moving the employees data into buckets, make sure that the table consists of the columns first_name, job_id, department, salary, and country.

Here we are loading data into sample_bucket from the employees table.
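A minimal sketch of that load, assuming the employees table exposes exactly these columns (names and order are assumptions):

insert overwrite table sample_bucket select first_name, job_id, department, salary, country from employees;

Because sample_bucket is declared with 4 buckets, Hive hashes the bucketing column of each row and writes the row to the corresponding bucket file.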

Step 3) Displaying the 4 buckets created in Step 1
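One way to display the buckets is to list the table's directory from the Hive CLI; each bucket is stored as a separate file (the warehouse path below is an assumption, the default Hive warehouse location):

dfs -ls /user/hive/warehouse/sample_bucket;

The listing should show 4 files, one per bucket.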

We can see that the data from the employees table has been distributed into the 4 buckets created in Step 1.
