Hive is a top-level Apache project, originally contributed by Facebook. It was created to solve the problem of analyzing massive amounts of structured log data: writing that business logic directly as Hadoop MapReduce jobs is tedious and painful for developers, and the development cycle is long. Hive translates SQL into MR jobs, which greatly reduces the developers' workload.
Hive started out as a subproject of Hadoop and only later became a project of its own.
A few points to note:
- Hive is a data warehouse built on top of Hadoop; it is suited to offline (batch) data and has high latency.
- Hive is a client, not a cluster: it submits SQL to a Hadoop cluster for execution.
- Hive is a SQL-like framework; its query language is HQL.
- HQL does not support row-level insert/update/delete by default; this has to be enabled manually.
- Hive does not support transactions by default either; they also have to be configured. Partitioned storage is supported.
- Hive's job: SQL ==> MR/Spark.
- Engines Hive can run on: MR/Spark/Tez (see the sketch after this list).
- SQL-on-Hadoop frameworks: Spark SQL/Hive/Impala/Presto.
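As a quick illustration of the engine point above, which engine Hive hands a job to is just a session-level property. A minimal sketch using the set command (covered further down); hive.execution.engine is a standard property, but whether tez/spark is actually usable depends on your installation:

```sql
-- show the engine currently in use (mr on older releases by default)
set hive.execution.engine;
-- switch this session to Tez, assuming Tez is installed on the cluster
set hive.execution.engine=tez;
```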
Deployment architecture
Hive data falls into two categories:
- Metadata: data that describes the data; it is usually stored in MySQL (see the sketch below).
- The data itself: stored on HDFS, under the path given by the hive.metastore.warehouse.dir property.
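To make the metadata side concrete, here is a minimal sketch of peeking into a MySQL-backed metastore. The database name `hive` is an assumption for your metastore schema; DBS and TBLS are standard metastore tables:

```sql
-- run against the MySQL instance that backs the metastore
SELECT DB_ID, NAME, DB_LOCATION_URI FROM hive.DBS;   -- one row per Hive database
SELECT TBL_ID, TBL_NAME, TBL_TYPE FROM hive.TBLS;    -- one row per Hive table
```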
Hive settings can be configured in hive-site.xml; properties set there are global.
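A minimal hive-site.xml sketch. The property names (hive.metastore.warehouse.dir and the javax.jdo.option.* settings) are standard; the JDBC URL, user name and password are placeholders:

```xml
<configuration>
  <!-- where table data is stored on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- MySQL-backed metastore (placeholder connection settings) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
```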
Session-scoped properties can be set with the set command:
```
set <parameter>;          -- show the current value of the parameter
set <parameter>=<value>;  -- set the parameter to the given value
```
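For example (hive.cli.print.current.db is a standard CLI property, used here purely for illustration):

```sql
set hive.cli.print.current.db;        -- print its current value
set hive.cli.print.current.db=true;   -- show the current database in the prompt, for this session only
```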
Properties can also be passed with --hiveconf when starting the CLI:
```shell
hive --hiveconf k=v --hiveconf k=v
```
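For instance, to start a session with a couple of options already set (the specific properties are just common examples, not required ones):

```shell
hive --hiveconf hive.cli.print.current.db=true --hiveconf hive.cli.print.header=true
```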
All Hive data on HDFS is stored as directories/files.
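You can see this directly on HDFS; a sketch assuming the default warehouse path /user/hive/warehouse (replace <db_name> with your database):

```shell
hdfs dfs -ls /user/hive/warehouse               # each database appears as a <db_name>.db directory
hdfs dfs -ls /user/hive/warehouse/<db_name>.db  # each table is a sub-directory holding its data files
```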
Create a database
Standard syntax:

```
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];
```

```sql
-- create a database
-- without LOCATION, the default path is /user/hive/warehouse/hive_create_test.db
CREATE DATABASE IF NOT EXISTS hive_create_test COMMENT 'hive test';

-- with LOCATION, the path is /db/hive/test
-- you can see that a Hive database is simply a directory
CREATE DATABASE IF NOT EXISTS hive_create_test2 COMMENT 'hive test 2' LOCATION '/db/hive/test';

show databases;
desc database hive_create_test2;

-- after dropping the database, the /db/hive directory still exists
drop database hive_create_test2;

use hive_create_test;
```
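One extra point (standard Hive behavior): a database that still contains tables cannot be dropped unless you add CASCADE. A sketch, with a placeholder database name:

```sql
-- drops the database together with any tables it still contains
drop database if exists some_db cascade;
```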
Create a table
Standard syntax:

```
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                 -- (Note: Available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]  -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];                               -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type     -- (Note: Available in Hive 0.7.0 and later)

primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION          -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY                    -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP                 -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL                   -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
  | DATE                      -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR                   -- (Note: Available in Hive 0.12.0 and later)
  | CHAR                      -- (Note: Available in Hive 0.13.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
  : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE    -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE]
```

```sql
-- create a table whose records use ',' as the field delimiter
CREATE TABLE IF NOT EXISTS hive_hello(id int, data string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```

Prepare a test file:

```shell
sudo vi /opt/test.log
9,hello
29,world
77,hive
```

```sql
-- load the data
load data local inpath '/opt/test.log' OVERWRITE INTO TABLE hive_hello;

-- query the data
select * from hive_hello;
```
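Building on the EXTERNAL and LOCATION clauses in the grammar above, a minimal sketch (the table name and path are made up for illustration): an external table only registers metadata over files at a given HDFS path, and dropping it leaves the files in place.

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS hive_hello_ext(id int, data string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/db/hive/hello_ext';

-- dropping the table removes only the metadata; the files under /db/hive/hello_ext remain
drop table hive_hello_ext;
```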
Reference
Official documentation for Hive DDL operations