Installing and configuring the metastore:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_hive_metastore_configure.html
Installing and configuring HiveServer2:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_hiveserver2_configure.html
Configuring Hive heap memory:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_hive_install.html
Prerequisite environment references:
YARN: http://bananalighter.com/cdh-yarn-installation/
ZooKeeper: http://bananalighter.com/cdh-install-zookeeper
Physical environment: three machines, hadoop01, hadoop02 and hadoop03.
1. Installing the Hive metastore server
(1) Install the Hive packages
Choose hadoop01 as the machine that will host the metadata service and install hive-metastore:
yum install hive-metastore hive-server2
(2) Install the MySQL database used by the metastore on hadoop02
yum install mysql-server
service mysqld start
chkconfig mysqld on
yum install mysql-connector-java
ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar
Initialize the MySQL installation:
mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it? [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
(3) Edit the hive-site.xml parameters
Set the parameters as shown in the table below.
Note: fill in the following values according to your own environment, and do not mix up the host, account and password; the JDBC URL must point at the host where MySQL was installed (hadoop02 in this walkthrough, from step (2) above).
javax.jdo.option.ConnectionURL | jdbc:mysql://hadoop02/metastore |
javax.jdo.option.ConnectionUserName | hive |
javax.jdo.option.ConnectionPassword | yourpassword |
hive.metastore.uris | thrift://hadoop01:9083 |
(4) Create the MySQL database and account used by the metastore
The sample schema script for the database is located at /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.13.0.mysql.sql.
Note that the Hive schema version must match the installed Hive/metastore version; otherwise the metastore will report a schema error.
$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.13.0.mysql.sql;
mysql> CREATE USER 'hive'@'hadoop01' IDENTIFIED BY 'mypassword';
...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'hadoop01';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'hadoop01';
mysql> FLUSH PRIVILEGES;
mysql> quit;
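A quick sanity check of both the new account and the schema version can be run from hadoop01. This is only a sketch; it assumes the mysql client is installed on hadoop01 and that MySQL runs on hadoop02 as set up in step (2):

# Run on hadoop01; the password is the one chosen in the CREATE USER statement above.
# The VERSION table is part of the metastore schema and should report 0.13.0,
# matching the schema script that was sourced.
mysql -h hadoop02 -u hive -p metastore -e "SELECT SCHEMA_VERSION, VERSION_COMMENT FROM VERSION;"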
2. Installing HiveServer2
(1) Install the Hive packages
Install HiveServer2 on the three test machines hadoop01 through hadoop03.
yum install hive-metastore hive-server2 |
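If passwordless ssh from an admin shell to the three nodes is already set up (an assumption, not part of this guide), the same installation can be driven from one loop; on nodes where the packages are already present, yum simply reports them as installed:

# Install the Hive packages on all three test machines.
for h in hadoop01 hadoop02 hadoop03; do
  ssh "$h" 'yum -y install hive-metastore hive-server2'
done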
(2) Configure the connection parameters for ZooKeeper
Configure hadoop01 according to the following table.
Configuration related to HiveServer2:
hive.support.concurrency | true | Enable Hive’s Table Lock Manager Service |
hive.zookeeper.quorum | hadoop01,hadoop02,hadoop03 | Zookeeper quorum used by Hive’s Table Lock Manager |
hive.zookeeper.client.port | 2181 | The port at which the clients will connect. |
Then copy the hive-site.xml file to the corresponding location on hadoop02 and hadoop03.
If this step is not configured correctly, HiveServer2 will fail with errors about being unable to acquire locks.
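Before relying on the Table Lock Manager, it can save debugging time to confirm that every node listed in hive.zookeeper.quorum is reachable on the client port. A minimal sketch, assuming nc (netcat) is installed; a healthy ZooKeeper server answers the standard "ruok" four-letter command with "imok":

# Check each quorum member on port 2181.
for h in hadoop01 hadoop02 hadoop03; do
  echo -n "$h: "
  echo ruok | nc "$h" 2181
  echo
done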
3. Create Hive's working directory on HDFS
Create the /user/hive/warehouse directory on HDFS and change its permissions to 1777.
This is Hive's default working directory. To use a different location, set the hive.metastore.warehouse.dir parameter in hive-site.xml.
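For reference, the commands below are one way to do this (a sketch; it assumes superuser actions on HDFS are performed via sudo -u hdfs, as on a default CDH package install):

# Create the default warehouse directory and set the 1777 (sticky bit) permissions.
sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse
sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
sudo -u hdfs hadoop fs -ls /user/hive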
4. Start the Hive metastore and HiveServer2
(1) Start the metastore service
Run on hadoop01:
service hive-metastore start |
(This step requires a correctly deployed ZooKeeper; see the ZooKeeper link at the beginning of this article.)
(2) Start HiveServer2
Run on hadoop01 through hadoop03:
[root@hadoop01 ~]# service hive-server2 start
Started Hive Server2 (hive-server2):               [  OK  ]
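As an optional smoke test (a sketch; it assumes the default HiveServer2 port 10000 and no authentication configured), beeline can be pointed at one of the freshly started servers:

# Connect to HiveServer2 on hadoop01 with its JDBC URL.
beeline -u jdbc:hive2://hadoop01:10000
# At the beeline prompt, a simple statement such as "show databases;" should
# return without errors if HiveServer2 can reach the metastore.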
(3) Check the logs
Check the *.log files under /var/log/hive for exceptions and errors. If you find any, a search engine is your friend.
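A quick first pass over the logs can be done with grep (a sketch; adjust the patterns to taste):

# Show recent errors and exceptions from the Hive service logs.
grep -iE 'error|exception' /var/log/hive/*.log | tail -n 50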
5. Runtime configuration
(1) Reducer settings
Parameter: hive.exec.reducers.bytes.per.reducer
Description: the number of bytes processed by each reducer. For example, if the input is 10 GB and this property is set to 1 GB, the system will allocate 10 reducers.
How to change it:
Edit hive-site.xml and add:
<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>256000000</value>
  <description>Size per reducer, in bytes.</description>
</property>
Restart HiveServer2 for the change to take effect.
Parameter: mapred.reduce.tasks
Description: sets the number of reducer tasks. With -1, the system chooses the number automatically.
How to change it:
Edit hive-site.xml and add:
<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
</property>
Restart HiveServer2 for the change to take effect.
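Both reducer parameters can also be overridden per session with set, without editing hive-site.xml or restarting HiveServer2. For example, in the Hive CLI (the values simply mirror the 1 GB example above):

hive> set hive.exec.reducers.bytes.per.reducer=1000000000;
hive> set mapred.reduce.tasks=-1;

The new values apply only to queries run in that session.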
(2) HiveServer2 memory
Copy /etc/hive/conf/hive-env.sh.template to /etc/hive/conf/hive-env.sh.
The following example sets the memory of HiveServer2 and the metastore to 2 GB, and the total memory of the Hive client to 2 GB.
# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
if [ "$SERVICE" = "cli" ]; then
  if [ -z "$DEBUG" ]; then
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx2048m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
  else
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx2048m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
  fi
fi

# The heap size of the jvm stared by hive shell script can be controlled via:
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server (hwi etc).
Restart HiveServer2 and the metastore for the changes to take effect.
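Per the template comment shown above, the heap of the JVMs started by the hive shell script is controlled through HADOOP_HEAPSIZE; to actually reach the 2 GB described for HiveServer2 and the metastore, that would mean an uncommented export HADOOP_HEAPSIZE=2048 in hive-env.sh (an assumption; the listing above leaves the line commented out). After the restart, the heap the daemons really picked up can be checked by looking for the -Xmx flag on their JVM command lines, as in this sketch:

# HiveServer2 runs the main class HiveServer2 and the metastore runs HiveMetaStore;
# print the first -Xmx value found for each.
for p in HiveServer2 HiveMetaStore; do
  echo -n "$p: "
  ps -ef | grep "$p" | grep -v grep | grep -oE -- '-Xmx[0-9]+[kKmMgG]' | head -n 1
done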
(3) Disable speculative task execution
Edit hive-site.xml and add:
<property>
  <name>hive.mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
  <description>Whether speculative execution for reducers should be turned on.</description>
</property>
Restart HiveServer2 for the change to take effect.
Edit mapred-site.xml and add:
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
Restart hadoop-yarn-resourcemanager for the change to take effect.
(4) CLI settings
Edit hive-site.xml and add:
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>

<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
  <description>Whether to include the current database in the Hive prompt.</description>
</property>
The result looks like this:
hive (lesson1)> select * from goods_price;
OK
goods_price.id   goods_price.name   goods_price.price   goods_price.supplier   goods_price.class_name
20151001   mengniu   2.0   yihaodian   niunai
20151002   yili      2.5   yihaodian   niunai
20151003   UHT       3.0   yihaodian   niunai
Time taken: 0.829 seconds, Fetched: 3 row(s)