MBAP logo

Managed Table VS External Table in Apache Hive (with Hands-on)

Adani Institute of Digital Technology Management (AIDTM)

By Dr. Virendra Kumar Shrivastava
Professor (Big Data Analytics) || Researcher || Blogger || Mentor || Adani Institute of Digital Technology Management || Adani Group

Hello everyone,

In this article I am going to explain the managed tables vs external tables in Apache Hive. This question is prominently featured in the interviews. So, if you want clarity on this aspect, this article is for you.

So, lets dive into deep in the discussion.

Apache Hive is a warehouse tool in distributed HDFS environment which is used to store huge dataset. You can use data ingestion tools to ingest dataset from variety of platform to Hive warehouse. Apache hive support two type of tables:

1.    Managed or Internal table

2.   External table

1. Managed or internal table:

When you create a table in Apache hive, by default it is treated as managed or internal table. If you want to create an external table, you will have to use “external” keyword explicitly. By default, Hive stores the managed table in the warehouse folder under hive. In case of managed table, schema (meta data) and the data included in the tables are managed by Hive.  You can perform HQL operation such as insert, update, and delete on the managed table. But when you drop managed table, the data along with schema will also get deleted.

For example, I have created a managed table iris_tab_managed in the students database. The see below given snippet: 

No alt text provided for this image

So, by default iris_tab_managed is created in the warehouse folder under Hive. I have upload data in the managed table (see below given snippet).

No alt text provided for this image

So, When I load data in the iris_tab_managed table it will be stored in warehouse folder under studets.db/iris_tab_managed location. The see below given snippet:

No alt text provided for this image

No alt text provided for this image

No alt text provided for this image

When I execute drop table iris_tab_managed command it will drop data as well as meta data of iris_tab_managed table. The see below given snippet.

No alt text provided for this image

2.External Table

You can create external table using external keyword and you can also specify location of the dataset in HDFS. See below given snippet.

No alt text provided for this image

To confirm whether a table is managed or external you can run following command

>describe formatted iris_tab_managed;

No alt text provided for this image

External tables can be stored at any location in the HDFS. Hive only manages meta data of the external table. In case of external table only schema is get deleted but data remains intact in the underlaying location.

No alt text provided for this image
If you do not specify location of data, then by default external table will be created in the warehouse folder under hive. See below snippet.
No alt text provided for this image
No alt text provided for this image
No alt text provided for this image
No alt text provided for this image
No alt text provided for this image
No alt text provided for this image
No alt text provided for this image
In case of External Tables, only the meta data of the table is managed by HIVE. The data present in these tables can sit on any storage locations in HDFS. You cannot perform update and delete operation on external tables. You can perform only select operation on external tables. When you drop external table, only the schema / table/ meta data gets deleted and not the data contained in the external table.

Conclusion:

From the above discussion, it is super simple to understand that managed table and external table is quite different and used for different purpose. In case of managed table, schema and data is managed by Hive whereas to create external table you need to use external keyword in create table statement and you can also specify location of the dataset. In external table, Hive only manage meta data and data can be stored at any location in HDFS. Managed tables follow ACID properties whereas you cannot perform update and delete operations on external tables. Insert operation is slow in managed operation whereas it is fast in external tables. So, I am sure this stuff will clear your doubts about managed or internal tables VS managed tabled in Apache Hadoop.

Adani Institute of

APPLICATION FORM 2022

Digital Technology Management

Adani Institute of

APPLICATION FORM 2022

Digital Technology Management

Reach Us

primary Menu

Inspiring Education… Assuring Success!!

Free Content !!!
Signup now to get the All recent CAT Exam Paper with Solution... Subscribe below with your Email ID
Free Content !!!
Signup now to get the All recent CAT Exam Paper with Solution... Subscribe below with your Email ID