
Building a Hadoop cluster with Docker (Ubuntu-based)

by 잇서니 2020. 4. 10.

1. Install an Ubuntu container

docker run -i -t --name hadoop-base ubuntu 

Press Ctrl+P, then Ctrl+Q to leave the shell without stopping the container.
Note that this only works when the container was started with the docker run -it options.
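
To get back into a detached container later, attach to it by name (hadoop-base is the name given above):

docker attach hadoop-base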


2. Install OpenJDK (inside the container)

apt-get update 
apt-get install -y software-properties-common   # provides add-apt-repository 
add-apt-repository ppa:openjdk-r/ppa 
apt-get update  
apt-get install -y openjdk-8-jdk 
java -version 


3. Install Hadoop (inside the container)

apt-get install wget 
cd ~ 
mkdir soft 
cd soft/ 
mkdir apache 
cd apache/ 
mkdir hadoop 
cd hadoop/ 
wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz 
tar xvzf hadoop-2.7.7.tar.gz 


4. Edit .bashrc (inside the container)

apt-get install vim -y 
vi ~/.bashrc   # add the lines below 
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 
export HADOOP_HOME=/root/soft/apache/hadoop/hadoop-2.7.7 
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop 
export PATH=$PATH:$HADOOP_HOME/bin 
export PATH=$PATH:$HADOOP_HOME/sbin 
source ~/.bashrc 
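
A quick sanity check (not in the original post) to confirm the variables took effect:

echo $HADOOP_HOME   # should print /root/soft/apache/hadoop/hadoop-2.7.7
hadoop version      # should report Hadoop 2.7.7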


5. Create directories and prepare the Hadoop config files (inside the container)

cd $HADOOP_HOME/ 
mkdir tmp 
mkdir namenode 
mkdir datanode 
cd $HADOOP_CONFIG_HOME/ 
cp mapred-site.xml.template mapred-site.xml 


6. Edit the core, hdfs, mapred, and yarn config files (inside the container)

 

hadoop-env.sh

# add the following line
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

core-site.xml

<configuration>
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/root/soft/apache/hadoop/hadoop-2.7.7/tmp</value>
            <description>A base for other temporary directories.</description>
    </property>

    <property>
            <name>fs.default.name</name>
            <value>hdfs://master:9000</value>
            <final>true</final>
            <description>The name of the default file system.  A URI whose
            scheme and authority determine the FileSystem implementation.  The
            uri's scheme determines the config property (fs.SCHEME.impl) naming
            the FileSystem implementation class.  The uri's authority is used to
            determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <final>true</final>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/root/soft/apache/hadoop/hadoop-2.7.7/namenode</value>
        <final>true</final>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/root/soft/apache/hadoop/hadoop-2.7.7/datanode</value>
        <final>true</final>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>

        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>

        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs
        at.  If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>
</configuration>

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>

 



7. Format the NameNode (inside the container)

hadoop namenode -format 


8. Configure SSH (inside the container)

apt-get install ssh 


Add the following to ~/.bashrc so sshd starts automatically:

#autorun  
/usr/sbin/sshd
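
start-all.sh later starts the daemons on every node over SSH, so the nodes need passwordless key-based login, which the steps above do not set up. A minimal sketch, run once in this container before committing the image so every container derived from it shares the same key pair (the /run/sshd line works around a missing privilege-separation directory on some Ubuntu versions):

mkdir -p /run/sshd ~/.ssh 
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa          # passwordless key pair 
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
echo "StrictHostKeyChecking no" >> ~/.ssh/config  # skip the interactive host-key prompt between nodes 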

 


9. Commit the container image (on the host)

docker start <container id>
docker commit -m "hadoop install in ubuntu" <container id> ubuntu:hadoop 
docker image ls 


10. Create the master and slave containers (on the host)

docker run -i -t -h master --name master -p 50070:50070 -p 8088:8088 ubuntu:hadoop 
docker run -i -t -h slave1 --name slave1 --link master:master ubuntu:hadoop 
docker run -i -t -h slave2 --name slave2 --link master:master ubuntu:hadoop 


11. Check the slave containers' IP addresses (on the host)

docker inspect slave1   # 172.17.0.4 
docker inspect slave2   # 172.17.0.5 
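
docker inspect prints a large JSON document; a format string pulls out just the IP on the default bridge network:

docker inspect -f '{{ .NetworkSettings.IPAddress }}' slave1 
docker inspect -f '{{ .NetworkSettings.IPAddress }}' slave2 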


12. Configure and start Hadoop (inside the container)

docker attach master   # attach; the container is already running 

vim /etc/hosts 
172.17.0.3		master
172.17.0.4      slave1 
172.17.0.5      slave2 
vim $HADOOP_CONFIG_HOME/slaves 

# list of containers that will run as DataNodes
slave1 
slave2 
master 

start-all.sh 
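
To check that the daemons actually came up (not part of the original post), jps from the JDK lists the running Java processes, and an HDFS report should show three live DataNodes once the slaves have registered:

jps                     # expected on master: NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager 
hdfs dfsadmin -report   # should list slave1, slave2, and master as live datanodes 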

The entries added to /etc/hosts disappear whenever the container is restarted. Either re-enter them after every restart, or write a small shell script that fills them in when the container starts, as sketched below.
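
A minimal sketch of such a script, using the IPs observed in step 11 (they may differ on your machine); run it inside each container after a restart:

#!/bin/bash
# restore-hosts.sh - re-add the cluster entries that /etc/hosts loses when the container restarts
cat >> /etc/hosts <<EOF
172.17.0.3      master
172.17.0.4      slave1
172.17.0.5      slave2
EOF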


13. WordCount test (inside the container)

cd ~ 
cd soft/apache/hadoop/hadoop-2.7.7 
hadoop fs -mkdir /input 
hadoop fs -put LICENSE.txt /input 
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input /output 
hadoop fs -cat /output/*

 

14. Check the ResourceManager web UI

Open http://<host ip>:8088 in a browser. The NameNode web UI is also available on port 50070, which was mapped when the master container was created in step 10.
