Hive Basics (1)

it · 2022-05-05


Hive is a top-level Apache project, originally contributed by Facebook. It was created mainly to solve the problem of analyzing massive volumes of structured log data: writing business logic directly as Hadoop MapReduce programs is tedious and painful for developers, and development cycles are long. Hive translates SQL into MapReduce jobs, greatly reducing the developer's workload.

Hive started out as a subproject of Hadoop and only later became a standalone project.


A few points worth noting:

- Hive is a data warehouse built on top of Hadoop; it is suited to offline (batch) processing and has high latency.
- Hive is a client, not a cluster; it submits SQL to a Hadoop cluster for execution.
- Hive is a SQL-like framework; its query language is HQL.
- HQL does not support row-level insert/update/delete by default; this has to be enabled manually.
- Hive does not support transactions by default; they must be configured. Partitioned storage is supported.
- Hive's job: SQL ==> MR/Spark.
- Execution engines supported underneath Hive: MR, Spark, Tez.
- The main SQL-on-Hadoop frameworks: Spark SQL, Hive, Impala, Presto.

Deployment architecture

Hive manages two kinds of data:

- Metadata: data describing the data (table definitions, schemas, etc.), usually stored in MySQL.
- Table data: stored on HDFS.

By default Hive stores table data on HDFS under /user/hive/warehouse. To change this, set the hive.metastore.warehouse.dir property in hive-site.xml.
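For example, the warehouse location could be set like this in hive-site.xml (a sketch: the value shown is simply Hive's default path, and the description text is my own):

```xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>Base HDFS directory for Hive-managed table data.</description>
</property>
```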

Hive properties can be configured in hive-site.xml; properties set there apply globally.

Session-scoped properties can be set with the set command:

set <param>;          -- show the current value of the parameter

set <param>=<value>;  -- set the parameter to a value

Properties can also be passed on the command line with --hiveconf:

hive --hiveconf key1=value1 --hiveconf key2=value2
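For instance, inside the hive CLI (an illustrative session; hive.cli.print.header is just one example property):

```sql
-- show the current value of a property
set hive.cli.print.header;

-- override it for the current session only
set hive.cli.print.header=true;
```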

All Hive data on HDFS is stored as directories and files.


1. Installing Hive

```shell
wget http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.tar.gz
tar -zxvf ./hive-1.1.0-cdh5.7.0.tar.gz
sudo ln -s /opt/hive-1.1.0-cdh5.7.0 /hive
sudo chown -R hadoop:hadoop /hive
sudo chmod -R 777 /hive/*

sudo vi /etc/profile
# add the following lines:
export HIVE_HOME=/hive
export PATH=$PATH:$HIVE_HOME/bin

source /etc/profile

# the configuration templates live under conf/
cd /hive/conf
cp ./hive-log4j.properties.template hive-log4j.properties
cp ./hive-env.sh.template hive-env.sh
```

2. Configuration

```shell
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.37/mysql-connector-java-5.1.37.jar
mv ./mysql-connector-java-5.1.37.jar /hive/lib
chown -R hadoop:hadoop /hive/lib
chmod -R 664 /hive/lib/mysql*

vi /hive/conf/hive-site.xml
```

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
    <description>Use false if a production metastore server is used.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop001:3307/test?createDatabaseIfNotExist=true</value>
    <description>The JDBC connection URL.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root@1234%</value>
  </property>
</configuration>
```

3. Startup

```shell
start-dfs.sh        # start HDFS
start-yarn.sh       # start YARN
mysqld_multi start  # start MySQL (the metastore backend)
hive                # start the Hive CLI
```

4. Basic operations

Creating a database

Standard syntax:

```sql
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];

-- Create a database.
-- Without LOCATION, the default path is /user/hive/warehouse/hive_create_test.db
CREATE DATABASE IF NOT EXISTS hive_create_test COMMENT 'hive test';

-- With LOCATION, the path is /db/hive/test.
-- This shows that a Hive database is simply a directory on HDFS.
CREATE DATABASE IF NOT EXISTS hive_create_test2 COMMENT 'hive test 2' LOCATION '/db/hive/test';

show databases;
desc database hive_create_test2;

-- After dropping the database, the /db/hive directory still exists.
drop database hive_create_test2;

use hive_create_test;
```
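To make the directory mapping concrete, here is a small sketch (plain Python, not Hive code; the function names are mine) of how Hive lays out databases and tables under the warehouse directory: a database named db becomes `<warehouse>/db.db`, unless it was created with an explicit LOCATION, and each table is a subdirectory beneath that.

```python
# Sketch of Hive's default on-HDFS layout (illustrative, not Hive internals).
WAREHOUSE = "/user/hive/warehouse"  # default hive.metastore.warehouse.dir


def database_path(db, location=None):
    """HDFS directory backing a Hive database."""
    return location if location else "{}/{}.db".format(WAREHOUSE, db)


def table_path(db, table, location=None):
    """HDFS directory backing a table in that database."""
    return "{}/{}".format(database_path(db, location), table)


print(database_path("hive_create_test"))                    # /user/hive/warehouse/hive_create_test.db
print(database_path("hive_create_test2", "/db/hive/test"))  # /db/hive/test
print(table_path("hive_create_test", "hive_hello"))         # /user/hive/warehouse/hive_create_test.db/hive_hello
```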

Creating a table

Standard syntax:

```sql
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name  -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                    -- (Note: Available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]     -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];                                  -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type            -- (Note: Available in Hive 0.7.0 and later)

primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION      -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY                -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP             -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL               -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE                  -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR               -- (Note: Available in Hive 0.12.0 and later)
  | CHAR                  -- (Note: Available in Hive 0.13.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
  : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]       -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format:
  : SEQUENCEFILE
  | TEXTFILE              -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE                -- (Note: Available in Hive 0.6.0 and later)
  | ORC                   -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET               -- (Note: Available in Hive 0.13.0 and later)
  | AVRO                  -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE              -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE
```

Create a table whose records use ',' as the field delimiter:

```sql
CREATE TABLE IF NOT EXISTS hive_hello (id int, data string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```

Prepare a data file (`sudo vi /opt/test.log`) with the contents:

```
9,hello
29,world
77,hive
```

Load and query the data:

```sql
-- import the data (OVERWRITE replaces any existing table data)
load data local inpath '/opt/test.log' OVERWRITE INTO TABLE hive_hello;

-- query the data
select * from hive_hello;
```
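To see what ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' means at the file level, here is a small sketch (plain Python, not Hive code) of how such a text file is split into rows and typed columns; the two-column (int, string) shape mirrors the hive_hello table above.

```python
# Sketch: how a TEXTFILE table with DELIMITED row format reads /opt/test.log.
# Each line is one record; fields are split on the delimiter and cast
# according to the table schema (id int, data string).

def parse_delimited(text, delimiter=","):
    """Parse delimiter-separated lines into (int, str) rows."""
    rows = []
    for line in text.strip().splitlines():
        id_field, data_field = line.split(delimiter, 1)
        rows.append((int(id_field), data_field))
    return rows


sample = "9,hello\n29,world\n77,hive"
print(parse_delimited(sample))  # [(9, 'hello'), (29, 'world'), (77, 'hive')]
```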

Reference

Hive DDL operations (official documentation)

