1. Create the Ubuntu container (host)
docker run -i -t --name hadoop-base ubuntu
Ctrl+P, Ctrl+Q detaches from the shell without stopping the container.
This only works when the container was started with the -it options, as above.
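For example, to get back into the container after detaching:
docker attach hadoop-base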
2. Install OpenJDK (container)
apt-get update
apt-get install software-properties-common   # provides add-apt-repository on a bare Ubuntu image
add-apt-repository ppa:openjdk-r/ppa
apt-get update
apt-get install openjdk-8-jdk
java -version
3. Install Hadoop (container)
apt-get install wget
mkdir -p ~/soft/apache/hadoop
cd ~/soft/apache/hadoop
wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar xvzf hadoop-2.7.7.tar.gz
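If the sonic.net mirror no longer carries 2.7.7 (mirrors rotate old releases out), the same tarball is kept in the Apache archive:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz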
4. Edit .bashrc (container)
apt-get install vim -y
vi ~/.bashrc   # append the following lines
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/root/soft/apache/hadoop/hadoop-2.7.7
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
source ~/.bashrc
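A quick sanity check (not in the original post) confirms the variables took effect:
echo $HADOOP_HOME
hadoop version   # should print Hadoop 2.7.7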
5. Create directories and prepare the Hadoop config files (container)
cd $HADOOP_HOME/
mkdir tmp
mkdir namenode
mkdir datanode
cd $HADOOP_CONFIG_HOME/
cp mapred-site.xml.template mapred-site.xml
6. Edit the hadoop-env, core, hdfs, mapred, and yarn config files (container)
hadoop-env.sh
# add:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/soft/apache/hadoop/hadoop-2.7.7/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
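  <!-- NOTE (added): fs.default.name below is the deprecated key; fs.defaultFS is the
       2.x replacement, but Hadoop 2.7.7 still honors the old name. -->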
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    <final>true</final>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <final>true</final>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/root/soft/apache/hadoop/hadoop-2.7.7/namenode</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/root/soft/apache/hadoop/hadoop-2.7.7/datanode</value>
    <final>true</final>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
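  <!-- NOTE (added): mapred.job.tracker is a legacy MR1 (JobTracker) setting; it is
       effectively unused once mapreduce.framework.name is set to yarn. Kept here
       because the original post includes it. -->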
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
7. Format the namenode (container)
hadoop namenode -format   # deprecated form in 2.x; hdfs namenode -format is the current equivalent
8. Set up SSH (container)
apt-get install ssh
Append to ~/.bashrc:
#autorun
/usr/sbin/sshd
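start-all.sh starts the daemons on each node over SSH, so the master needs passwordless SSH to the slaves. The original post skips this; a minimal sketch, run in the base container before committing (step 9) so every node shares the same key pair:
mkdir -p /var/run/sshd                     # on Ubuntu, sshd will not start without this directory
mkdir -p ~/.ssh
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
echo "StrictHostKeyChecking no" >> /etc/ssh/ssh_config   # skip the interactive host-key prompt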
9. Commit the container as an image (host)
docker start <container id>
docker commit -m "hadoop install in ubuntu" <container id> ubuntu:hadoop
docker image ls
10. Create the master and slave containers (host)
docker run -i -t -h master --name master -p 50070:50070 -p 8088:8088 ubuntu:hadoop
docker run -i -t -h slave1 --name slave1 --link master:master ubuntu:hadoop
docker run -i -t -h slave2 --name slave2 --link master:master ubuntu:hadoop
11. Check the slave containers' IPs (host)
docker inspect slave1 (172.17.0.4)
docker inspect slave2 (172.17.0.5)
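If you only need the address, the --format flag of docker inspect trims the output:
docker inspect -f '{{ .NetworkSettings.IPAddress }}' slave1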
12. Configure and start Hadoop (container)
docker attach master   # attach; the container is already running
vim /etc/hosts
172.17.0.3 master
172.17.0.4 slave1
172.17.0.5 slave2
vim $HADOOP_CONFIG_HOME/slaves
# list of the containers that will act as datanodes:
slave1
slave2
master
start-all.sh   # deprecated wrapper in 2.x; start-dfs.sh followed by start-yarn.sh is the preferred form
Note that the /etc/hosts entries added above disappear whenever the container restarts. Either re-enter them after every restart, or write a shell script that re-adds them when the container starts, as sketched below.
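A minimal sketch of such a script (hypothetical name bootstrap.sh; the IPs assume the assignments seen in step 11):
#!/bin/bash
# re-add the cluster hostnames that are lost on restart
cat >> /etc/hosts <<EOF
172.17.0.3 master
172.17.0.4 slave1
172.17.0.5 slave2
EOF
# make sure sshd is up so start-all.sh can reach this node
/usr/sbin/sshd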
13. Wordcount test (container)
cd $HADOOP_HOME
hadoop fs -mkdir /input
hadoop fs -put LICENSE.txt /input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input /output
hadoop fs -cat /output/*
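Two extra commands (not in the original post) help confirm the cluster and the job are healthy:
hdfs dfsadmin -report   # lists live datanodes; expect 3 here (master, slave1, slave2)
hadoop fs -ls /output   # a _SUCCESS marker plus part-r-* files means the job completed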
14. Check the ResourceManager web UI
Open http://<host ip>:8088 in a browser.
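On any node, jps (bundled with the JDK) shows which Hadoop daemons are running; on the master you should see roughly NameNode, SecondaryNameNode, and ResourceManager, plus DataNode and NodeManager since master is also listed in the slaves file:
jps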