Installing and configuring the metastore:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_hive_metastore_configure.html
Installing and configuring HiveServer2:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_hiveserver2_configure.html
Configuring Hive heap memory:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_hive_install.html
Prerequisite environment references:
YARN: http://bananalighter.com/cdh-yarn-installation/
ZooKeeper: http://bananalighter.com/cdh-install-zookeeper
Physical environment: three machines, hadoop01, hadoop02 and hadoop03.
1. Installing the Hive metastore server
(1) Install the Hive packages
Choose hadoop01 as the machine that will host the metadata service and install hive-metastore:
yum install hive-metastore hive-server2
(2) Install the MySQL database used by the metastore on hadoop02
yum install mysql-server
service mysqld start
chkconfig mysqld on
yum install mysql-connector-java
ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar
Initialize the MySQL installation:
mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it? [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
(3) Edit the hive-site.xml parameters
Set the parameters as shown in the table below.
Note: fill in the following values according to your own environment, and do not mix up the host, account and password; the JDBC URL must point at the host where MySQL was installed (hadoop02 in this walkthrough, from step (2) above).
javax.jdo.option.ConnectionURL | jdbc:mysql://hadoop02/metastore |
javax.jdo.option.ConnectionUserName | hive |
javax.jdo.option.ConnectionPassword | yourpassword |
hive.metastore.uris | thrift://hadoop01:9083 |
(4) Create the MySQL database and account used by the metastore
The sample schema script for the database is located at /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.13.0.mysql.sql.
Note that the Hive schema version must match the installed Hive/metastore version; otherwise the metastore will report a schema error.
$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.13.0.mysql.sql;
mysql> CREATE USER 'hive'@'hadoop01' IDENTIFIED BY 'mypassword';
...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'hadoop01';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'hadoop01';
mysql> FLUSH PRIVILEGES;
mysql> quit;
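A quick sanity check of both the new account and the schema version can be run from hadoop01. This is only a sketch; it assumes the mysql client is installed on hadoop01 and that MySQL runs on hadoop02 as set up in step (2):

# Run on hadoop01; the password is the one chosen in the CREATE USER statement above.
# The VERSION table is part of the metastore schema and should report 0.13.0,
# matching the schema script that was sourced.
mysql -h hadoop02 -u hive -p metastore -e "SELECT SCHEMA_VERSION, VERSION_COMMENT FROM VERSION;"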
2. Installing HiveServer2
(1) Install the Hive packages
Install HiveServer2 on the three test machines hadoop01 through hadoop03.
yum install hive-metastore hive-server2 |
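If passwordless ssh from an admin shell to the three nodes is already set up (an assumption, not part of this guide), the same installation can be driven from one loop; on nodes where the packages are already present, yum simply reports them as installed:

# Install the Hive packages on all three test machines.
for h in hadoop01 hadoop02 hadoop03; do
  ssh "$h" 'yum -y install hive-metastore hive-server2'
done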
(2) Configure the connection parameters for ZooKeeper
Configure hadoop01 according to the following table.
Configuration related to HiveServer2:
hive.support.concurrency | true | Enable Hive’s Table Lock Manager Service |
hive.zookeeper.quorum | hadoop01,hadoop02,hadoop03 | Zookeeper quorum used by Hive’s Table Lock Manager |
hive.zookeeper.client.port | 2181 | The port at which the clients will connect. |
Then copy the hive-site.xml file to the corresponding location on hadoop02 and hadoop03.
If this step is not configured correctly, HiveServer2 will fail with errors about being unable to acquire locks.
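Before relying on the Table Lock Manager, it can save debugging time to confirm that every node listed in hive.zookeeper.quorum is reachable on the client port. A minimal sketch, assuming nc (netcat) is installed; a healthy ZooKeeper server answers the standard "ruok" four-letter command with "imok":

# Check each quorum member on port 2181.
for h in hadoop01 hadoop02 hadoop03; do
  echo -n "$h: "
  echo ruok | nc "$h" 2181
  echo
done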
3. Create Hive's working directory on HDFS
Create the /user/hive/warehouse directory on HDFS and change its permissions to 1777.
This is Hive's default working directory. To use a different location, set the hive.metastore.warehouse.dir parameter in hive-site.xml.
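For reference, the commands below are one way to do this (a sketch; it assumes superuser actions on HDFS are performed via sudo -u hdfs, as on a default CDH package install):

# Create the default warehouse directory and set the 1777 (sticky bit) permissions.
sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse
sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
sudo -u hdfs hadoop fs -ls /user/hive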
4. Start the Hive metastore and HiveServer2
(1) Start the metastore service
Run on hadoop01:
service hive-metastore start |
(This step requires a correctly deployed ZooKeeper; see the ZooKeeper link at the beginning of this article.)
(2) Start HiveServer2
Run on hadoop01 through hadoop03:
[root@hadoop01 ~]# service hive-server2 start
Started Hive Server2 (hive-server2):               [  OK  ]
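As an optional smoke test (a sketch; it assumes the default HiveServer2 port 10000 and no authentication configured), beeline can be pointed at one of the freshly started servers:

# Connect to HiveServer2 on hadoop01 with its JDBC URL.
beeline -u jdbc:hive2://hadoop01:10000
# At the beeline prompt, a simple statement such as "show databases;" should
# return without errors if HiveServer2 can reach the metastore.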
(3) Check the logs
Check the *.log files under /var/log/hive for exceptions and errors. If you find any, a search engine is your friend.
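A quick first pass over the logs can be done with grep (a sketch; adjust the patterns to taste):

# Show recent errors and exceptions from the Hive service logs.
grep -iE 'error|exception' /var/log/hive/*.log | tail -n 50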
5. Runtime configuration
(1) Reducer settings
Parameter: hive.exec.reducers.bytes.per.reducer
Description: the number of bytes processed by each reducer. For example, if the input is 10 GB and this property is set to 1 GB, the system will allocate 10 reducers.
How to change it:
Edit hive-site.xml and add:
<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>256000000</value>
  <description>Size per reducer, in bytes.</description>
</property>
Restart HiveServer2 for the change to take effect.
Parameter: mapred.reduce.tasks
Description: sets the number of reducer tasks. With -1, the system chooses the number automatically.
How to change it:
Edit hive-site.xml and add:
<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
</property>
Restart HiveServer2 for the change to take effect.
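Both reducer parameters can also be overridden per session with set, without editing hive-site.xml or restarting HiveServer2. For example, in the Hive CLI (the values simply mirror the 1 GB example above):

hive> set hive.exec.reducers.bytes.per.reducer=1000000000;
hive> set mapred.reduce.tasks=-1;

The new values apply only to queries run in that session.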
(2) HiveServer2 memory
Copy /etc/hive/conf/hive-env.sh.template to /etc/hive/conf/hive-env.sh.
The following example sets the memory of HiveServer2 and the metastore to 2 GB, and the total memory of the Hive client to 2 GB.
# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
if [ "$SERVICE" = "cli" ]; then
  if [ -z "$DEBUG" ]; then
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx2048m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
  else
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx2048m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
  fi
fi

# The heap size of the jvm stared by hive shell script can be controlled via:
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server (hwi etc).
Restart HiveServer2 and the metastore for the changes to take effect.
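Per the template comment shown above, the heap of the JVMs started by the hive shell script is controlled through HADOOP_HEAPSIZE; to actually reach the 2 GB described for HiveServer2 and the metastore, that would mean an uncommented export HADOOP_HEAPSIZE=2048 in hive-env.sh (an assumption; the listing above leaves the line commented out). After the restart, the heap the daemons really picked up can be checked by looking for the -Xmx flag on their JVM command lines, as in this sketch:

# HiveServer2 runs the main class HiveServer2 and the metastore runs HiveMetaStore;
# print the first -Xmx value found for each.
for p in HiveServer2 HiveMetaStore; do
  echo -n "$p: "
  ps -ef | grep "$p" | grep -v grep | grep -oE -- '-Xmx[0-9]+[kKmMgG]' | head -n 1
done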
(3) Disable speculative task execution
Edit hive-site.xml and add:
<property>
  <name>hive.mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
  <description>Whether speculative execution for reducers should be turned on.</description>
</property>
Restart HiveServer2 for the change to take effect.
Edit mapred-site.xml and add:
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
Restart hadoop-yarn-resourcemanager for the change to take effect.
(4) CLI settings
Edit hive-site.xml and add:
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>

<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
  <description>Whether to include the current database in the Hive prompt.</description>
</property>
The result looks like this:
hive (lesson1)> select * from goods_price;
OK
goods_price.id   goods_price.name   goods_price.price   goods_price.supplier   goods_price.class_name
20151001   mengniu   2.0   yihaodian   niunai
20151002   yili      2.5   yihaodian   niunai
20151003   UHT       3.0   yihaodian   niunai
Time taken: 0.829 seconds, Fetched: 3 row(s)