Monthly Archive: December 2014

Basic HDFS Shell Commands

Official documentation: http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/FileSystemShell.html

hadoop fs -ls /    List the subdirectories and files under the given path.

hadoop fs -mkdir /test    Create a test directory in the Hadoop file system.

hadoop fs -get /filename    Fetch a file from the Hadoop file system to the local file system.

hadoop fs -put srcfile /desfile    Upload a file from the local file system to the Hadoop file system.

[Translation] Apache Hadoop MapReduce

Original: http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Overview

Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

A MapReduce job usually splits the input data set into chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Typically both the input and the output of the job are stored on a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.

Typically the compute nodes and the storage nodes are the same, that is, the MapReduce framework and HDFS run on the same set of nodes. This configuration allows the framework to schedule tasks effectively on the nodes where the data is already present, resulting in very high aggregate bandwidth across the cluster.

The MapReduce framework consists of a single ResourceManager, one NodeManager per cluster node, and one MRAppMaster per application.

At a minimum, applications specify the input/output locations and supply the map and reduce functions via implementations of the appropriate interfaces and/or abstract classes.

The Hadoop job client then submits the job (a jar, or some other executable) and its configuration to the ResourceManager, which distributes the software and configuration to the slaves, schedules and monitors the tasks, and provides status and diagnostic information to the job client.

Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in Java.

Hadoop Streaming lets users create and run jobs with any executables as the mapper and/or the reducer.

Hadoop Pipes is a SWIG-compatible C++ API for implementing MapReduce applications.
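Hadoop Streaming's contract can be sketched in plain shell: the mapper and reducer are arbitrary executables that read lines from stdin and write key<TAB>value lines to stdout, with a sort on the keys between the two phases. A minimal local simulation (no cluster needed; the function names are just for illustration):

```shell
# Streaming-style word count, simulated locally.
# mapper: emit "<token>\t1" for every whitespace-separated token.
mapper() { tr -s ' ' '\n' | awk 'NF { print $1 "\t" 1 }'; }
# reducer: sum the counts per key (Streaming delivers keys to the reducer sorted).
reducer() { awk -F'\t' '{ sum[$1] += $2 } END { for (k in sum) print k "\t" sum[k] }'; }

printf 'Hello World Bye World\nHello Hadoop Goodbye Hadoop\n' \
  | mapper | sort | reducer | sort
# -> Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2 (tab-separated, one pair per line)
```

On a real cluster the same executables would be passed to the hadoop-streaming jar via its -mapper and -reducer options; the local pipeline above only illustrates the stdin/stdout contract.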

Inputs and Outputs

The MapReduce framework operates exclusively on key-value pairs (<key, value>): the framework views the input to a job as a set of key-value pairs and produces a new set of key-value pairs as the output of the job. The key and value classes have to be serializable by the framework and hence must implement Hadoop's Writable interface (org.apache.hadoop.io). Additionally, the key classes must implement the WritableComparable interface (org.apache.hadoop.io) to facilitate sorting by the framework.

The input and output types of a MapReduce job:

(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)

Now let's play with an example:

A small example of how MapReduce works: word counting (WordCount)

Java source code

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Copy the code above into a file:

Note! The directory (and the jar built later) must give the hdfs user full (7, i.e. rwx) permissions, otherwise the later steps will fail.

[root@test1 ~]# mkdir -p /tmp/Class4_1/
[root@test1 ~]# vim /tmp/Class4_1/WordCount.java

Usage

Assume the environment variables are set as follows (the last two lines are the key additions; without them you get a ClassNotFound error):

export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar

export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar:$HADOOP_HOME

Compile the WordCount.java file above and build a jar.

The default CDH Hadoop lib directory:

/var/lib/

[root@test1 ~]# hadoop com.sun.tools.javac.Main /tmp/Class4_1/WordCount.java
[root@test1 ~]# cd /tmp/Class4_1
[root@test1 Class4_1]# ls
WordCount.class WordCount.java
WordCount$IntSumReducer.class WordCount$TokenizerMapper.class
[root@test1 Class4_1]# jar cf wc.jar WordCount*.class
[root@test1 Class4_1]# ls
wc.jar WordCount$IntSumReducer.class WordCount$TokenizerMapper.class
WordCount.class WordCount.java

Assume the following input and output directories:

/user/class_example/4_1/wordcount/input

/user/class_example/4_1/wordcount/output

[root@test1 ~]# hadoop fs -mkdir /user/class_example/
mkdir: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
[root@test1 ~]# su hdfs
bash-4.1$ hadoop fs -mkdir /user/class_example/
bash-4.1$ hadoop fs -mkdir /user/class_example/4_1/
bash-4.1$ hadoop fs -mkdir /user/class_example/4_1/wordcount/
bash-4.1$ hadoop fs -mkdir /user/class_example/4_1/wordcount/input/
bash-4.1$ hadoop fs -mkdir /user/class_example/4_1/wordcount/output/


Generate the input files locally

[root@test1 ~]# vim /class_example/4_1/wordcount/input/file01
Hello World Bye World
[root@test1 ~]# vim /class_example/4_1/wordcount/input/file02
Hello Hadoop Goodbye Hadoop

Put them into HDFS

bash-4.1$ hdfs dfs -put /class_example/4_1/wordcount/input/file01 /user/class_example/4_1/wordcount/input/file01 
bash-4.1$ hdfs dfs -put /class_example/4_1/wordcount/input/file02 /user/class_example/4_1/wordcount/input/file02

(From the directory containing the jar) run the MapReduce jar we built:

bash-4.1$ hadoop jar wc.jar WordCount /user/class_example/4_1/wordcount/input/ /user/class_example/4_1/wordcount/output/

15/01/15 19:45:07 INFO client.RMProxy: Connecting to ResourceManager at hdp01/172.19.17.231:8032
15/01/15 19:45:08 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/01/15 19:45:08 INFO input.FileInputFormat: Total input paths to process : 2
15/01/15 19:45:09 INFO mapreduce.JobSubmitter: number of splits:2
15/01/15 19:45:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1419153136605_0005
15/01/15 19:45:09 INFO impl.YarnClientImpl: Submitted application application_1419153136605_0005
15/01/15 19:45:09 INFO mapreduce.Job: The url to track the job: http://hdp01:8088/proxy/application_1419153136605_0005/
15/01/15 19:45:09 INFO mapreduce.Job: Running job: job_1419153136605_0005
15/01/15 19:45:23 INFO mapreduce.Job: Job job_1419153136605_0005 running in uber mode : false
15/01/15 19:45:23 INFO mapreduce.Job: map 0% reduce 0%
15/01/15 19:45:31 INFO mapreduce.Job: map 100% reduce 0%
15/01/15 19:45:42 INFO mapreduce.Job: map 100% reduce 15%
15/01/15 19:45:48 INFO mapreduce.Job: map 100% reduce 28%
15/01/15 19:45:49 INFO mapreduce.Job: map 100% reduce 31%
15/01/15 19:45:54 INFO mapreduce.Job: map 100% reduce 38%
15/01/15 19:45:55 INFO mapreduce.Job: map 100% reduce 43%
15/01/15 19:45:56 INFO mapreduce.Job: map 100% reduce 46%
15/01/15 19:46:01 INFO mapreduce.Job: map 100% reduce 53%
15/01/15 19:46:02 INFO mapreduce.Job: map 100% reduce 61%
15/01/15 19:46:07 INFO mapreduce.Job: map 100% reduce 64%
15/01/15 19:46:08 INFO mapreduce.Job: map 100% reduce 72%
15/01/15 19:46:09 INFO mapreduce.Job: map 100% reduce 75%
15/01/15 19:46:10 INFO mapreduce.Job: map 100% reduce 76%
15/01/15 19:46:13 INFO mapreduce.Job: map 100% reduce 78%
15/01/15 19:46:14 INFO mapreduce.Job: map 100% reduce 79%
15/01/15 19:46:15 INFO mapreduce.Job: map 100% reduce 88%
15/01/15 19:46:16 INFO mapreduce.Job: map 100% reduce 90%
15/01/15 19:46:17 INFO mapreduce.Job: map 100% reduce 92%
15/01/15 19:46:20 INFO mapreduce.Job: map 100% reduce 93%
15/01/15 19:46:21 INFO mapreduce.Job: map 100% reduce 96%
15/01/15 19:46:22 INFO mapreduce.Job: map 100% reduce 100%
15/01/15 19:46:24 INFO mapreduce.Job: Job job_1419153136605_0005 completed successfully
15/01/15 19:46:25 INFO mapreduce.Job: Counters: 49
 File System Counters
 FILE: Number of bytes read=1513
 FILE: Number of bytes written=7837484
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=306
 HDFS: Number of bytes written=41
 HDFS: Number of read operations=222
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=144
 Job Counters 
 Launched map tasks=2
 Launched reduce tasks=72
 Data-local map tasks=2
 Total time spent by all maps in occupied slots (ms)=10984
 Total time spent by all reduces in occupied slots (ms)=388216
 Total time spent by all map tasks (ms)=10984
 Total time spent by all reduce tasks (ms)=388216
 Total vcore-seconds taken by all map tasks=10984
 Total vcore-seconds taken by all reduce tasks=388216
 Total megabyte-seconds taken by all map tasks=11247616
 Total megabyte-seconds taken by all reduce tasks=397533184
 Map-Reduce Framework
 Map input records=2
 Map output records=8
 Map output bytes=82
 Map output materialized bytes=2377
 Input split bytes=256
 Combine input records=8
 Combine output records=6
 Reduce input groups=5
 Reduce shuffle bytes=2377
 Reduce input records=6
 Reduce output records=5
 Spilled Records=12
 Shuffled Maps =144
 Failed Shuffles=0
 Merged Map outputs=144
 GC time elapsed (ms)=7568
 CPU time spent (ms)=113830
 Physical memory (bytes) snapshot=24281419776
 Virtual memory (bytes) snapshot=123326947328
 Total committed heap usage (bytes)=58622738432
 Shuffle Errors
 BAD_ID=0
 CONNECTION=0
 IO_ERROR=0
 WRONG_LENGTH=0
 WRONG_MAP=0
 WRONG_REDUCE=0
 File Input Format Counters 
 Bytes Read=50
 File Output Format Counters 
 Bytes Written=41

Take a look at the output:

bash-4.1$ hdfs dfs -ls /user/class_example/4_1/wordcount/output
Found 73 items
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/_SUCCESS
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00000
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00001
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00002
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00003
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00004
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00005
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00006
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00007
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00008
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00009
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00010
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00011
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00012
-rw-r--r--   3 hdfs supergroup          6 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00013
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00014
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00015
-rw-r--r--   3 hdfs supergroup         10 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00016
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00017
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00018
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00019
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00020
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00021
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00022
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00023
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00024
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00025
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00026
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00027
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00028
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00029
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00030
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00031
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00032
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00033
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00034
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00035
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00036
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:45 /user/class_example/4_1/wordcount/output/part-r-00037
-rw-r--r--   3 hdfs supergroup          9 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00038
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00039
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00040
-rw-r--r--   3 hdfs supergroup          8 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00041
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00042
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00043
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00044
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00045
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00046
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00047
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00048
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00049
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00050
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00051
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00052
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00053
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00054
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00055
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00056
-rw-r--r--   3 hdfs supergroup          8 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00057
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00058
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00059
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00060
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00061
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00062
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00063
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00064
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00065
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00066
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00067
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00068
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00069
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00070
-rw-r--r--   3 hdfs supergroup          0 2015-01-15 19:46 /user/class_example/4_1/wordcount/output/part-r-00071
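Most part files are empty because the job ran with the cluster's default of 72 reducers (Launched reduce tasks=72 in the counters) while there are only five distinct words. The actual counts sit in the few non-empty part files (41 bytes total, matching "HDFS: Number of bytes written=41"); `hdfs dfs -cat .../output/part-r-*` would print them. The expected result can be double-checked locally with a plain shell pipeline over the same two input lines:

```shell
# Word count over the sample inputs, done locally to verify the job's output.
printf 'Hello World Bye World\nHello Hadoop Goodbye Hadoop\n' \
  | tr ' ' '\n' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
# -> Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2 (tab-separated, one pair per line)
```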


[Translation] Apache Hadoop NextGen MapReduce (YARN)

Official link: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

Before studying YARN, first get to know MapReduce: http://bananalighter.com/apache-hadoop-mapreduce/

MapReduce underwent a complete overhaul in hadoop-0.23, and what we now have is called MapReduce 2.0 (MRv2), or YARN.

The fundamental idea of MRv2 is to split the JobTracker's resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager and a per-application ApplicationMaster. An application is either a single MapReduce job in the classical sense, or a DAG (directed acyclic graph) of jobs.

The ResourceManager and the per-node NodeManager form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system.

The per-application ApplicationMaster is a framework-specific library tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.

(Figure: YARN architecture)

The ResourceManager has two main components: the Scheduler and the ApplicationsManager.

The Scheduler is responsible for allocating resources to the various applications, subject to capacities, queues, and so on. The Scheduler is a pure scheduler: it performs no monitoring or status tracking.

Halfway through the translation I discovered that Dong Xicheng's blog already has all of this...

So I'll just link there: http://dongxicheng.org/mapreduce-nextgen/nextgen-mapreduce-introduction/

Removing Cloudera Manager and CDH

Translating the official guide while working through it: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v5-0-0/Cloudera-Manager-Installation-Guide/cm5ig_uninstall_cm.html

Setup

One CM server (using MySQL)

Four CDH hosts

Default directories

The following directories are used by the various tools in a default installation:

/var/lib/flume-ng /var/lib/hadoop* /var/lib/hue /var/lib/navigator /var/lib/oozie /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper /dfs /mapred /yarn

If you changed any of these during installation, remember to delete your custom directories when reinstalling.

Stop all CDH and CM services

This can be done from the web UI.

Remove the services and installation on the CM server

Check the CM server status

[root@cdh_manager ~]# /opt/cm-5.1.1/etc/init.d/cloudera-scm-server status
cloudera-scm-server dead but pid file exists
[root@cdh_manager ~]# /opt/cm-5.1.1/etc/init.d/cloudera-scm-server stop
Stopping cloudera-scm-server: [FAILED]
[root@cdh_manager ~]# ps -ef | grep cm
root      8697  8603  0 11:09 pts/0    00:00:00 grep cm
root     21078     1  1 Dec09 ?        01:19:48 /usr/java/jdk1.7.0_67/bin/java -cp .:lib/*:/usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar -Dlog4j.configuration=file:/opt/cm-5.1.1/etc/cloudera-scm-server/log4j.properties -Dcmf.root.logger=INFO,LOGFILE -Dcmf.log.dir=/opt/cm-5.1.1/log/cloudera-scm-server -Dcmf.log.file=cloudera-scm-server.log -Dcmf.jetty.threshhold=WARN -Dcmf.schema.dir=/opt/cm-5.1.1/share/cmf/schema -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dpython.home=/opt/cm-5.1.1/share/cmf/python -Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp com.cloudera.server.cmf.Main
[root@cdh_manager ~]# kill -9 21078
[root@cdh_manager ~]# ps -ef | grep cm
root      8704  8603  0 11:09 pts/0    00:00:00 grep cm
[root@cdh_manager ~]# /opt/cm-5.1.1/etc/init.d/cloudera-scm-server status
cloudera-scm-server is stopped

[root@cdh_manager ~]# /opt/cm-5.1.1/etc/init.d/cloudera-scm-agent status
cloudera-scm-agent dead but pid file exists
[root@cdh_manager ~]# ps -ef | grep cm
root      8737  8603  0 11:12 pts/0    00:00:00 grep cm
[root@cdh_manager ~]# ps -ef | grep agent
root      8739  8603  0 11:12 pts/0    00:00:00 grep agent
[root@cdh_manager ~]# /opt/cm-5.1.1/etc/init.d/cloudera-scm-agent hard_stop
cloudera-scm-agent is already stopped
supervisord is already stopped
[root@cdh_manager ~]# /opt/cm-5.1.1/etc/init.d/cloudera-scm-agent status
cloudera-scm-agent is stopped

Once the server and agent are confirmed stopped, delete the installation:

[root@cdh_manager ~]# rm -rf /opt/cm-5.1.1/

Clean up the MySQL databases

 

Remove the CM agent and the CDH software on the cluster hosts

Stop the CM agent

[root@hdp01 ~]# service cloudera-scm-agent status
cloudera-scm-agent (pid  3969) is running...
[root@hdp01 ~]# service cloudera-scm-agent hard_stop
Stopping cloudera-scm-agent: [  OK  ]
supervisord is already stopped

Remove the agent packages

[root@hdp04 ~]# yum remove cloudera-manager-*
Loaded plugins: product-id, refresh-packagekit, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Setting up Remove Process
Resolving Dependencies
--> Running transaction check
---> Package cloudera-manager-agent.x86_64 0:5.1.1-1.cm511.p0.82.el6 will be erased
---> Package cloudera-manager-daemons.x86_64 0:5.1.1-1.cm511.p0.82.el6 will be erased
--> Finished Dependency Resolution
Repository cloudera-cdh5 is listed more than once in the configuration
Repository cloudera-manager is listed more than once in the configuration
cloudera-cdh5                                            | 2.9 kB     00:00     
cloudera-manager                                         | 2.9 kB     00:00     
rhel-iso                                                 | 3.9 kB     00:00     

Dependencies Resolved

================================================================================
 Package                 Arch   Version                 Repository         Size
================================================================================
Removing:
 cloudera-manager-agent  x86_64 5.1.1-1.cm511.p0.82.el6 @cloudera-manager  27 M
 cloudera-manager-daemons
                         x86_64 5.1.1-1.cm511.p0.82.el6 @cloudera-manager 500 M

Transaction Summary
================================================================================
Remove        2 Package(s)

Installed size: 527 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Erasing    : cloudera-manager-daemons-5.1.1-1.cm511.p0.82.el6.x86_64      1/2 
  Erasing    : cloudera-manager-agent-5.1.1-1.cm511.p0.82.el6.x86_64        2/2 
warning: /etc/cloudera-scm-agent/config.ini saved as /etc/cloudera-scm-agent/config.ini.rpmsave
  Verifying  : cloudera-manager-agent-5.1.1-1.cm511.p0.82.el6.x86_64        1/2 
  Verifying  : cloudera-manager-daemons-5.1.1-1.cm511.p0.82.el6.x86_64      2/2 

Removed:
  cloudera-manager-agent.x86_64 0:5.1.1-1.cm511.p0.82.el6                       
  cloudera-manager-daemons.x86_64 0:5.1.1-1.cm511.p0.82.el6                     

Complete!

Clean the yum cache

[root@hdp04 ~]# yum clean all
Loaded plugins: product-id, refresh-packagekit, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Repository cloudera-cdh5 is listed more than once in the configuration
Repository cloudera-manager is listed more than once in the configuration
Cleaning repos: cloudera-cdh5 cloudera-manager rhel-iso
Cleaning up Everything

Remove the CDH components

Delete the Cloudera Manager data

[root@hdp02 ~]# rm -Rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera* /var/log/cloudera* /var/run/cloudera*
rm: cannot remove `/var/run/cloudera-scm-agent/process': Device or resource busy

I'll try again after a reboot. Rebooting a physical machine is painfully slow...

After the reboot the directory deleted fine.

Delete the Cloudera Manager lock file

[root@hdp04 ~]#  rm /tmp/.scm_prepare_node.lock
rm: cannot remove `/tmp/.scm_prepare_node.lock': No such file or directory

It wasn't there. Probably because the services were shut down cleanly through the init scripts, no lock file was ever created.

Then delete the remaining Hadoop component directories

[root@hdp04 ~]# rm -Rf /var/lib/flume-ng /var/lib/hadoop* /var/lib/hue /var/lib/navigator /var/lib/oozie /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper
[root@hdp04 ~]#
rm -Rf /dfs /mapred /yarn

*I did not delete /dfs, because the original disk partitioning gave /dfs its own partition.

done

Changing the Remote Desktop port on Windows

Changing the Remote Desktop port takes two steps:
1. Open the registry key [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\Wds\rdpwd\Tds\tcp] and change the PortNumber value from its default of 3389 to the port you want, e.g. 3309.
2. Then open [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp] and likewise change PortNumber from its default of 3389 to the desired port, e.g. 3309.
A reboot is required for the change to take effect. Watch out for the firewall!
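The two registry edits can also be captured in a .reg file (a sketch; dword values are hex, and 0x00000ced = 3309):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\Wds\rdpwd\Tds\tcp]
"PortNumber"=dword:00000ced

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp]
"PortNumber"=dword:00000ced
```

Importing the file and rebooting has the same effect as editing the two values by hand.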

Reference: http://zhidao.baidu.com/link?url=sbMt1YylQyd6gHrTqaP0-Wj4LZ74SG_X_UhVLloFypdCZ96IUygLIWa-qpzLBC9dhNB1VAYx-oXjI4j0kBnhtK


Automated CDH 5 installation with CDH Manager (and a local repo)

Part 1: Preparation

Common preparation (all hosts)

1. NTP server

ntpdate time-server-ip
# write the time to the BIOS (hardware clock)
hwclock --systohc

2. Disable iptables and SELinux

3. Configure the hosts file (add an entry for the Manager host and entries for all slave hosts)
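For example, the /etc/hosts entries on every machine might look like this (the addresses and most hostnames below are illustrative; hdp01's address matches the job log earlier, and cdh_manager.ctbj matches the repo baseurl):

```
# Manager host (also serves the local yum repos)
172.19.17.230   cdh_manager.ctbj   cdh_manager
# slave hosts
172.19.17.231   hdp01
172.19.17.232   hdp02
172.19.17.233   hdp03
172.19.17.234   hdp04
```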

Slave host preparation

1. Configure the yum repo files on the slave hosts (add the CM and CDH repos, plus the RHEL DVD repo)

vim /etc/yum.repos.d/cloudera-cdh5-local.repo 

# contents:
[cloudera-cdh5]
# Packages for Cloudera’s Distribution for Hadoop, Version 5, on RedHat or CentOS 6 x86_64
name=Cloudera’s Distribution for Hadoop, Version 5
baseurl=http://cdh_manager.ctbj/cdh5/
gpgcheck = 0
enabled = 1

[cloudera-manager]
name = Cloudera Manager, Version 5.1.1
baseurl = http://cdh_manager.ctbj/cm/
gpgcheck = 0
enabled=1

(Adding the DVD repo is not covered here.)

Manager host preparation

1. Install httpd and serve the local repos over HTTP

yum install httpd
service httpd start
chkconfig httpd on
# copy the local CM, CDH, and RedHat repos to /var/www/html
unzip -d /var/www/html cm.zip
……

2.http://bananalighter.blog.51cto.com/6386339/1546624

Part 2: Installing CDH with CM

Log in at http://ip:7180

Add hosts and follow the prompts (choose the local repo).

Reference: http://debugo.com/cm5-install/