Continuing from the Standalone mode installation, set up the environment.
Because the nodes in a cluster need to communicate with each other, ssh must be installed.
> yum install openssh-server -y
> yum install openssh-clients -y
> yum install openssh-askpass -y
Start the ssh service. In a CentOS container created with Docker, service and systemctl are hard to use, so check the running processes with ps -ef instead.
> ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:15 pts/0 00:00:00 /bin/bash
root 72 0 0 10:34 pts/1 00:00:00 /bin/bash
root 144 72 0 10:42 pts/1 00:00:00 ps -ef
Start sshd manually.
# The first run fails because no host keys exist yet.
> /usr/sbin/sshd
Could not load host key: /etc/ssh/ssh_host_rsa_key
Could not load host key: /etc/ssh/ssh_host_ecdsa_key
Could not load host key: /etc/ssh/ssh_host_ed25519_key
sshd: no hostkeys available -- exiting.
> /usr/sbin/sshd-keygen -A
> /usr/sbin/sshd
To use ssh without a password, generate a public/private key pair.
> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
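With its default StrictModes setting, sshd rejects key authentication when ~/.ssh or authorized_keys is writable by other users, so it can help to tighten the permissions after creating the keys. A minimal sketch, where fix_ssh_perms is a hypothetical helper name and the directory is passed as an argument:

```shell
# Hypothetical helper: restrict an ssh directory and its authorized_keys
# file to the owner only.
fix_ssh_perms() {
  dir="$1"
  mkdir -p "$dir"
  touch "$dir/authorized_keys"
  chmod 700 "$dir"
  chmod 600 "$dir/authorized_keys"
}
```

For the root account in this container it would be called as fix_ssh_perms ~/.ssh.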
Connect to localhost to test it.
> ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:CLFWxhxZVJwQSisSxUPZlEbBkkh+s/tMhpkDOMeQj/g.
ECDSA key fingerprint is MD5:36:67:25:f7:52:40:2e:ef:dd:3c:20:e1:62:29:a3:39.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
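The yes/no prompt above appears the first time a host is contacted, and Hadoop's start scripts also connect over ssh. For a throwaway local container, one option is to disable the confirmation for local addresses in the ssh client config. The sketch below assumes that trade-off is acceptable; add_local_ssh_config is a hypothetical helper, and the host list is an assumption.

```shell
# Hypothetical helper: append a Host block to an ssh client config so
# the first connection to local addresses does not ask for confirmation.
# Assumption: acceptable only for a disposable local container.
add_local_ssh_config() {
  cfg="$1"
  mkdir -p "$(dirname "$cfg")"
  cat >> "$cfg" <<'EOF'
Host localhost 127.0.0.1
    StrictHostKeyChecking no
EOF
  chmod 600 "$cfg"
}
```

Called as add_local_ssh_config ~/.ssh/config, it leaves key-based authentication unchanged and only skips the host key confirmation.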
Generate the host keys that sshd needs, then have ~/.bashrc start sshd every time you log in to the container.
> ssh-keygen -f /etc/ssh/ssh_host_rsa_key -t rsa -N ""
> ssh-keygen -f /etc/ssh/ssh_host_ecdsa_key -t ecdsa -N ""
> ssh-keygen -f /etc/ssh/ssh_host_ed25519_key -t ed25519 -N ""
> vim ~/.bashrc
# contents of ~/.bashrc
...
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64
export PATH=$PATH:$JAVA_HOME/bin
export JAVA_OPTS="-Dfile.encoding=UTF-8"
export CLASSPATH="."
export HADOOP_HOME=/hadoop_home/hadoop-3.3.1
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
/usr/sbin/sshd
> source ~/.bashrc
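Because ~/.bashrc runs on every login (and again on source ~/.bashrc), a bare sshd line tries to start a second daemon while the first one still holds the port. A guarded variant could look like the sketch below; start_once is a hypothetical helper, and it assumes pgrep is available (it ships in procps-ng, the same package that provides ps).

```shell
# Hypothetical helper: run a command only if no process with the given
# name is running yet. Usage: start_once NAME COMMAND [ARGS...]
start_once() {
  name="$1"
  shift
  if pgrep -x "$name" >/dev/null 2>&1; then
    echo "$name already running"
  else
    "$@"
  fi
}
```

In ~/.bashrc the last line would then become start_once sshd /usr/sbin/sshd.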
Next, configure Hadoop.
# move to the directory that holds the Hadoop configuration files
> cd $HADOOP_CONFIG_HOME
# open hadoop-env.sh
> vim hadoop-env.sh
# hadoop-env.sh
...
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.x86_64
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
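JAVA_HOME in hadoop-env.sh has to match the JDK actually installed in the container, and the exact directory name changes with every OpenJDK package update. One way to avoid typing it by hand is to derive it from the resolved java binary. A sketch, assuming java is on the PATH and the usual OpenJDK layout where the binary sits under bin/ or jre/bin/; java_home_from is a hypothetical helper:

```shell
# Hypothetical helper: turn the resolved path of the java binary into
# a JDK home directory by stripping the trailing /bin/java and, for
# JDK 8 layouts, the /jre component.
java_home_from() {
  p="$1"
  p="${p%/bin/java}"
  p="${p%/jre}"
  printf '%s\n' "$p"
}

# Print the value to put into hadoop-env.sh, if java is installed.
if command -v java >/dev/null 2>&1; then
  java_home_from "$(readlink -f "$(command -v java)")"
fi
```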
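Create the directories the daemons will use as their home. The names must match the paths set in core-site.xml and hdfs-site.xml.
> mkdir -p /opt/hadoop_home/temp
> mkdir -p /opt/hadoop_home/namenode_home
> mkdir -p /opt/hadoop_home/datanode_home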
Edit the following files in the HADOOP_CONFIG_HOME directory.
core-site.xml: settings shared by HDFS and MapReduce
hdfs-site.xml: settings used by HDFS
mapred-site.xml: settings used by MapReduce
Each file should look like this.
<!-- core-site.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop_home/temp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<final>true</final>
</property>
</configuration>
<!-- hdfs-site.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop_home/namenode_home</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop_home/datanode_home</value>
<final>true</final>
</property>
</configuration>
<!-- mapred-site.xml -->
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
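With three files edited by hand, a quick check that each property holds the intended value can catch typos before the daemons start. The sketch below is a plain text scan, not an XML parser: it assumes every name and value element sits on its own line, as in the files above, and get_prop is a hypothetical helper.

```shell
# Hypothetical helper: print the value of one property from a Hadoop
# *-site.xml file. Usage: get_prop FILE PROPERTY_NAME
get_prop() {
  awk -v want="$2" '
    /<name>/ {
      # remember the most recent property name
      cur = $0
      gsub(/.*<name>|<\/name>.*/, "", cur)
    }
    /<value>/ && cur == want {
      # first value line after the matching name
      val = $0
      gsub(/.*<value>|<\/value>.*/, "", val)
      print val
      exit
    }
  ' "$1"
}

# Example against the files edited above, if they exist.
conf="${HADOOP_CONFIG_HOME:-/hadoop_home/hadoop-3.3.1/etc/hadoop}"
if [ -f "$conf/hdfs-site.xml" ]; then
  get_prop "$conf/hdfs-site.xml" dfs.replication
fi
```

On a node with Hadoop installed, hdfs getconf -confKey dfs.replication reports the value Hadoop itself resolves, including defaults.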
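Before starting Hadoop, format the NameNode, then leave the container and commit it as an image.
> hdfs namenode -format
# press Ctrl+D to leave the container, then run the following on the host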
> sudo docker commit hadoop-base centos:hadoop
> sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
centos hadoop 5dff566a550a 8 minutes ago 2.69GB
ubuntu latest d2e4e1f51132 2 weeks ago 77.8MB
centos centos7 eeb6ee3f44bd 8 months ago 204MB
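Formatting is a one-time step: running it again on a directory that already holds metadata discards the existing HDFS namespace. If the step ends up in a script, a guard like the sketch below can help. It assumes the NameNode directory from hdfs-site.xml above and relies on the current/VERSION file that a formatted NameNode directory contains; needs_format is a hypothetical helper.

```shell
# Hypothetical helper: succeed only when the directory has not been
# formatted yet, i.e. it has no current/VERSION file.
needs_format() {
  [ ! -f "$1/current/VERSION" ]
}

name_dir=/opt/hadoop_home/namenode_home   # dfs.namenode.name.dir from hdfs-site.xml
if needs_format "$name_dir"; then
  echo "not formatted yet: run 'hdfs namenode -format'"
else
  echo "already formatted: skipping"
fi
```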
Reattach to the container and start the cluster with start-all.sh.
> start-all.sh
> jps  # lists the daemons (nodes) running on the JVM
752 DataNode
977 SecondaryNameNode
2281 Jps
1436 NodeManager
1293 ResourceManager
622 NameNode
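All five daemons should appear in the jps listing; if one is missing, its log under $HADOOP_HOME/logs is the place to look. The check can be scripted as in the sketch below, where missing_daemons is a hypothetical helper that reads jps output on standard input and prints the names it did not find.

```shell
# Hypothetical helper: read jps output on stdin and print every
# expected Hadoop daemon that is absent from it.
missing_daemons() {
  listing=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # the second column of jps output is the process name; match it exactly
    printf '%s\n' "$listing" | awk '{print $2}' | grep -qx "$d" || echo "$d"
  done
}

# Run the check only where jps is installed.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

An empty result means every daemon is up.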
As a test, run WordCount on the NOTICE.txt file in HADOOP_HOME.
> cd $HADOOP_HOME
> hadoop fs -mkdir /test
> hadoop fs -put NOTICE.txt /test
> hadoop fs -ls /test
Found 1 items
-rw-r--r-- 1 root supergroup 1541 2022-05-16 11:31 /test/NOTICE.txt
# run the MapReduce wordcount example
> hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /test /test_out
> hadoop fs -ls /test_out
Found 2 items
-rw-r--r-- 1 root supergroup 0 2022-05-16 11:32 /test_out/_SUCCESS
-rw-r--r-- 1 root supergroup 1402 2022-05-16 11:32 /test_out/part-r-00000
> hadoop fs -cat /test_out/part-r-00000
(BIS), 1
(ECCN) 1
(TSU) 1
...
written 2
you 1
your 1
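For a small input the result can be cross-checked without MapReduce. The sketch below assumes the example job splits on whitespace only, which would explain why tokens such as (BIS), keep their punctuation in the output above; local_wordcount is a hypothetical helper.

```shell
# Hypothetical helper: count whitespace-separated tokens from stdin and
# print "token<TAB>count" lines in byte order.
local_wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{print $2 "\t" $1}'
}

# Compare the first lines with part-r-00000, if Hadoop is available.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -cat /test/NOTICE.txt | local_wordcount | head -n 3
fi
```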
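Running the new image with the publish option (-p) maps a container port to the host, so the cluster can be monitored through the Web UI. In Hadoop 3.x the NameNode Web UI listens on port 9870.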
> sudo docker run -it --name hadoop-base -p 9870:9870 centos:hadoop
> start-all.sh
Then open localhost:9870 in a browser on the host.