1. Common functions
hive (default)> select ename,sal,
> case
> when sal>1 and sal<=1000 then "lower"
> when sal>1000 and sal<=2000 then "just so so"
> when sal>2000 and sal<=4000 then "ok"
> else "high"
> end
> from emp03;
OK
ename sal _c2
'SMITH' 800.0 lower
'ALLEN' 1600.0 just so so
'WARD' 1250.0 just so so
'JONES' 2975.0 ok
'MARTIN' 1250.0 just so so
'BLAKE' 2850.0 ok
'CLARK' 2450.0 ok
'SCOTT' 3000.0 ok
'KING' 5000.0 high
'TURNER' 1500.0 just so so
'ADAMS' 1100.0 just so so
'JAMES' 950.0 lower
'FORD' 3000.0 ok
'MILLER' 1300.0 just so so
Time taken: 0.04 seconds, Fetched: 14 row(s)
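The same CASE expression can also feed an aggregation. A minimal sketch (same emp03 table; the subquery alias band is introduced here just so it can be grouped on), counting employees per salary band:

select band, count(*) as cnt
from (
  select case
           when sal > 1 and sal <= 1000 then "lower"
           when sal > 1000 and sal <= 2000 then "just so so"
           when sal > 2000 and sal <= 4000 then "ok"
           else "high"
         end as band
  from emp03
) t
group by band;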
2. The four BY clauses
order by: global sort. Only one reducer is launched, so the job gets very slow when the data volume is large (a common mitigation is shown after the transcript).
hive (default)> select * from emp03 order by empno;
Query ID = hadoop_20190718155454_48a2d16a-3b26-416c-99f1-e2a7873d34ec
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1563262646611_0003, Tracking URL = http://hadoop001:8098/proxy/application_1563262646611_0003/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1563262646611_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1    <-- only one reducer launched
2019-07-18 15:55:00,270 Stage-1 map = 0%, reduce = 0%
2019-07-18 15:55:05,599 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.38 sec
2019-07-18 15:55:12,982 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.21 sec
MapReduce Total cumulative CPU time: 3 seconds 210 msec
Ended Job = job_1563262646611_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.21 sec HDFS Read: 9779 HDFS Write: 761 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 210 msec
OK
emp03.empno emp03.ename emp03.job emp03.mgr emp03.hiredate emp03.sal emp03.comm emp03.deptno
7369 'SMITH' 'CLERK' 7902 '1980-12-17' 800.0 NULL 20
7499 'ALLEN' 'SALESMAN' 7698 '1981-02-20' 1600.0 300.0 30
7521 'WARD' 'SALESMAN' 7698 '1981-02-22' 1250.0 500.0 30
7566 'JONES' 'MANAGER' 7839 '1981-04-02' 2975.0 NULL 20
7654 'MARTIN' 'SALESMAN' 7698 '1981-09-28' 1250.0 1400.0 30
7698 'BLAKE' 'MANAGER' 7839 '1981-05-01' 2850.0 NULL 30
7782 'CLARK' 'MANAGER' 7839 '1981-06-09' 2450.0 NULL 10
7788 'SCOTT' 'ANALYST' 7566 '1982-12-09' 3000.0 NULL 20
7839 'KING' 'PRESIDENT' NULL '1981-11-17' 5000.0 NULL 10
7844 'TURNER' 'SALESMAN' 7698 '1981-09-08' 1500.0 0.0 30
7876 'ADAMS' 'CLERK' 7788 '1983-01-12' 1100.0 NULL 20
7900 'JAMES' 'CLERK' 7698 '1981-12-03' 950.0 NULL 30
7902 'FORD' 'ANALYST' 7566 '1981-12-03' 3000.0 NULL 20
7934 'MILLER' 'CLERK' 7782 '1982-01-23' 1300.0 NULL 10
Time taken: 20.565 seconds, Fetched: 14 row(s)
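Since order by pushes every row through that single reducer, a common safeguard is to cap it with limit; with hive.mapred.mode=strict, Hive (at least in the 1.x line) refuses an unbounded order by outright. A hedged sketch:

set hive.mapred.mode=strict;
-- strict mode rejects ORDER BY without LIMIT; LIMIT bounds the single-reducer sort
select * from emp03 order by empno limit 5;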
sort by: local sort; rows are ordered within each reducer, with no total order across reducers (a way to verify this is sketched after the transcript).
If the data set is too small the effect will not show, so raise the reducer count first, e.g. set mapred.reduce.tasks=3; (the newer property name is mapreduce.job.reduces, as the job log below also suggests).
hive (default)> select * from emp03 sort by empno desc;
Query ID = hadoop_20190718155858_755df3d6-36ed-4a40-a606-8732740c369e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1563262646611_0005, Tracking URL = http://hadoop001:8098/proxy/application_1563262646611_0005/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1563262646611_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3    <-- three reducers launched
2019-07-18 15:58:39,028 Stage-1 map = 0%, reduce = 0%
2019-07-18 15:58:45,302 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.97 sec
2019-07-18 15:58:56,033 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 4.17 sec
2019-07-18 15:58:58,140 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.99 sec
MapReduce Total cumulative CPU time: 7 seconds 990 msec
Ended Job = job_1563262646611_0005
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 7.99 sec HDFS Read: 19199 HDFS Write: 761 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 990 msec
OK
emp03.empno emp03.ename emp03.job emp03.mgr emp03.hiredate emp03.sal emp03.comm emp03.deptno
7844 'TURNER' 'SALESMAN' 7698 '1981-09-08' 1500.0 0.0 30
7839 'KING' 'PRESIDENT' NULL '1981-11-17' 5000.0 NULL 10
7788 'SCOTT' 'ANALYST' 7566 '1982-12-09' 3000.0 NULL 20
7782 'CLARK' 'MANAGER' 7839 '1981-06-09' 2450.0 NULL 10
7698 'BLAKE' 'MANAGER' 7839 '1981-05-01' 2850.0 NULL 30
7654 'MARTIN' 'SALESMAN' 7698 '1981-09-28' 1250.0 1400.0 30
(reducer 1)
7934 'MILLER' 'CLERK' 7782 '1982-01-23' 1300.0 NULL 10
7900 'JAMES' 'CLERK' 7698 '1981-12-03' 950.0 NULL 30
7876 'ADAMS' 'CLERK' 7788 '1983-01-12' 1100.0 NULL 20
7566 'JONES' 'MANAGER' 7839 '1981-04-02' 2975.0 NULL 20
7521 'WARD' 'SALESMAN' 7698 '1981-02-22' 1250.0 500.0 30
7499 'ALLEN' 'SALESMAN' 7698 '1981-02-20' 1600.0 300.0 30
(reducer 2)
7902 'FORD' 'ANALYST' 7566 '1981-12-03' 3000.0 NULL 20
7369 'SMITH' 'CLERK' 7902 '1980-12-17' 800.0 NULL 20
(reducer 3)
Time taken: 27.88 seconds, Fetched: 14 row(s)
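To inspect the per-reducer ordering directly, the result can be written out so that each reducer emits its own file. A minimal sketch (the local path /tmp/sortby_demo is illustrative):

set mapreduce.job.reduces=3;
-- each reducer writes one file (000000_0, 000001_0, ...), sorted within itself
insert overwrite local directory '/tmp/sortby_demo'
select * from emp03 sort by empno desc;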
distribute by: routes rows to reducers by a rule (here a hash of the given expression); it does no sorting by itself (see the pairing with sort by sketched after the transcript).
hive (default)> select * from emp03 distribute by length(ename);
Query ID = hadoop_20190718162626_69db6f08-aafe-4195-9083-318899345aa4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1563262646611_0015, Tracking URL = http://hadoop001:8098/proxy/application_1563262646611_0015/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1563262646611_0015
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2019-07-18 16:26:32,526 Stage-1 map = 0%, reduce = 0%
2019-07-18 16:26:37,798 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.81 sec
2019-07-18 16:26:48,575 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 3.39 sec
2019-07-18 16:26:49,650 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 5.2 sec
2019-07-18 16:26:50,692 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.06 sec
MapReduce Total cumulative CPU time: 7 seconds 60 msec
Ended Job = job_1563262646611_0015
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 7.06 sec HDFS Read: 19770 HDFS Write: 815 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 60 msec
OK
emp03.empno emp03.ename emp03.job emp03.mgr emp03.hiredate emp03.sal emp03.comm emp03.deptno
7902 'FORD' 'ANALYST' 7566 '1981-12-03' 3000.0 NULL 20
7839 'KING' 'PRESIDENT' NULL '1981-11-17' 5000.0 NULL 10
(reducer 1)
7521 'WARD' 'SALESMAN' 7698 '1981-02-22' 1250.0 500.0 30
7788 'SCOTT' 'ANALYST' 7566 '1982-12-09' 3000.0 NULL 20
7900 'JAMES' 'CLERK' 7698 '1981-12-03' 950.0 NULL 30
7876 'ADAMS' 'CLERK' 7788 '1983-01-12' 1100.0 NULL 20
7782 'CLARK' 'MANAGER' 7839 '1981-06-09' 2450.0 NULL 10
7698 'BLAKE' 'MANAGER' 7839 '1981-05-01' 2850.0 NULL 30
7566 'JONES' 'MANAGER' 7839 '1981-04-02' 2975.0 NULL 20
7499 'ALLEN' 'SALESMAN' 7698 '1981-02-20' 1600.0 300.0 30
7369 'SMITH' 'CLERK' 7902 '1980-12-17' 800.0 NULL 20
(reducer 2)
7934 'MILLER' 'CLERK' 7782 '1982-01-23' 1300.0 NULL 10
7844 'TURNER' 'SALESMAN' 7698 '1981-09-08' 1500.0 0.0 30
7654 'MARTIN' 'SALESMAN' 7698 '1981-09-28' 1250.0 1400.0 30
(reducer 3)
Time taken: 25.021 seconds, Fetched: 14 row(s)
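In practice distribute by is paired with sort by: the former decides which reducer a row lands on, the latter orders the rows within each reducer. A sketch using the deptno and sal columns shown above (distribute by must precede sort by):

select * from emp03 distribute by deptno sort by sal desc;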
length() is a built-in function.
To look up built-in functions:
show functions;
desc function length;
desc function extended length;
cluster by col = distribute by col sort by col (same column on both sides; cluster by only supports ascending order; see the equivalent query after the transcript).
hive (default)> select * from emp03 cluster by ename;
Query ID = hadoop_20190718163333_d8eacf37-94ff-44a3-baeb-a06ae3733d3d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1563262646611_0016, Tracking URL = http://hadoop001:8098/proxy/application_1563262646611_0016/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1563262646611_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2019-07-18 16:33:23,244 Stage-1 map = 0%, reduce = 0%
2019-07-18 16:33:28,522 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.44 sec
2019-07-18 16:33:38,282 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 3.02 sec
2019-07-18 16:33:39,357 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 4.88 sec
2019-07-18 16:33:40,394 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.84 sec
MapReduce Total cumulative CPU time: 6 seconds 840 msec
Ended Job = job_1563262646611_0016
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 6.84 sec HDFS Read: 19344 HDFS Write: 815 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 840 msec
OK
emp03.empno emp03.ename emp03.job emp03.mgr emp03.hiredate emp03.sal emp03.comm emp03.deptno
7876 'ADAMS' 'CLERK' 7788 '1983-01-12' 1100.0 NULL 20
7499 'ALLEN' 'SALESMAN' 7698 '1981-02-20' 1600.0 300.0 30
7839 'KING' 'PRESIDENT' NULL '1981-11-17' 5000.0 NULL 10
7534 'MANAGER' 'CLERK' 7582 '1982-02-23' 1300.0 NULL 20
7788 'SCOTT' 'ANALYST' 7566 '1982-12-09' 3000.0 NULL 20
(reducer 1)
7782 'CLARK' 'MANAGER' 7839 '1981-06-09' 2450.0 NULL 10
7900 'JAMES' 'CLERK' 7698 '1981-12-03' 950.0 NULL 30
7566 'JONES' 'MANAGER' 7839 '1981-04-02' 2975.0 NULL 20
7654 'MARTIN' 'SALESMAN' 7698 '1981-09-28' 1250.0 1400.0 30
7934 'MILLER' 'CLERK' 7782 '1982-01-23' 1300.0 NULL 10
7369 'SMITH' 'CLERK' 7902 '1980-12-17' 800.0 NULL 20
7844 'TURNER' 'SALESMAN' 7698 '1981-09-08' 1500.0 0.0 30
(reducer 2)
7698 'BLAKE' 'MANAGER' 7839 '1981-05-01' 2850.0 NULL 30
7902 'FORD' 'ANALYST' 7566 '1981-12-03' 3000.0 NULL 20
7521 'WARD' 'SALESMAN' 7698 '1981-02-22' 1250.0 500.0 30
(reducer 3)
Time taken: 24.887 seconds, Fetched: 15 row(s)
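So the query above should give the same row-to-reducer routing and in-reducer ordering as the explicit pair below; cluster by is merely shorthand for the case where the distribute column and the sort column coincide and ascending order suffices:

select * from emp03 distribute by ename sort by ename;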