Manually Installing a Hadoop Cluster with Docker
Installing a Hadoop cluster is generally considered difficult, so integrated distributions such as CDH are often used instead. But those distributions are bulky and bring installation headaches of their own. Here we install the cluster with Docker, starting from scratch, so it can be customized to business requirements.
As long as you work carefully, installing a Hadoop cluster is actually not that hard.
Preparing the Docker environment
In this Dockerfile we install JDK 1.8, which the Hadoop installation will need later. We also generate the SSH host keys that sshd requires to run; the per-user keys for passwordless access between machines are created later.
# Base the new image on the official centos image
FROM centos
# Author information
MAINTAINER by Rudolfyan
# Install openssh-server and generate the host keys sshd requires
RUN yum -y install openssh-server
RUN mkdir /var/run/sshd
RUN ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsa_key
RUN ssh-keygen -t dsa -N '' -f /etc/ssh/ssh_host_dsa_key
# Set the root password
RUN /bin/echo 'root:123456'|chpasswd
RUN /bin/sed -i 's/.*session.*required.*pam_loginuid.so.*/session optional pam_loginuid.so/g' /etc/pam.d/sshd
RUN /bin/echo -e "LANG=\"en_US.UTF-8\"" > /etc/default/local
# Install JDK 1.8
RUN yum -y install java-1.8.0-openjdk.x86_64
EXPOSE 22
CMD /usr/sbin/sshd -D
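Build this base image and tag it so the next stage can reference it; the tag below matches the FROM line of the second Dockerfile later in this article:

```shell
# Build the sshd + JDK base image from the Dockerfile in the current directory
docker build -t rudolfyan/centosssh:1.0 .
```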
Preparing the Hadoop environment
We download Hadoop 3.2.1 directly by URL and place the tarball in the same directory as the Dockerfile.
[root@ora-mssql hadoop]# curl https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz -o hadoop-3.2.1.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 342M 100 342M 0 0 2583k 0 0:02:15 0:02:15 --:--:-- 4076k
[root@ora-mssql hadoop]# ls
Dockerfile hadoop-3.2.1.tar.gz
Build a second image, using the image from the previous section as its base.
mkdir hadoopqun
cat <<EOF >hadoopqun/Dockerfile
FROM rudolfyan/centosssh:1.0
MAINTAINER will
ENV REFRESHED_AT 2021
ADD hadoop-3.2.1.tar.gz /usr/local/nlp/
ENV HADOOP_HOME /usr/local/nlp/hadoop-3.2.1
ENV PATH $HADOOP_HOME/bin:$PATH
RUN yum install -y which sudo
EOF
docker build -t rudolfyan/hadoopqun:1.0 . -f hadoopqun/Dockerfile
Start three Docker containers for the Hadoop cluster
docker run --name dkhmaster -p 10022:22 -d rudolfyan/hadoopqun:1.0
docker run --name dkhslave1 -p 10023:22 -d rudolfyan/hadoopqun:1.0
docker run --name dkhslave2 -p 10024:22 -d rudolfyan/hadoopqun:1.0
[root@ora-mssql ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f92ba0ee1a4 rudolfyan/hadoopqun:1.0 "/bin/sh -c '/usr/..." About an hour ago Up About an hour 0.0.0.0:10024->22/tcp dkhslave2
bfd5a858efb1 rudolfyan/hadoopqun:1.0 "/bin/sh -c '/usr/..." About an hour ago Up About an hour 0.0.0.0:10023->22/tcp dkhslave1
5e35e92b76f0 rudolfyan/hadoopqun:1.0 "/bin/sh -c '/usr/..." About an hour ago Up About an hour 0.0.0.0:10022->22/tcp dkhmaster
Generate a key pair on the master machine so that it can access the slave machines without a password and control them directly; this step is essential.
[root@5e35e92b76f0 /]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:EBkfY93CuuzdHLl6T50vFPTkS0SReirUWg/f+S0kpUA root@5e35e92b76f0
The key's randomart image is:
+---[RSA 3072]----+
| oo+o . .oo|
| .+ oE . .o.|
| . .o ...o+ |
| .. .. =ooo|
| .S...o+B.+|
| o o=.o*o|
| . . o.*..+|
| . . =.o +|
| .o ..o.|
+----[SHA256]-----+
Run the same command on the other two machines, then merge the three public keys into one file and copy it back to all three machines to enable passwordless login. The exact steps:
[root@ora-mssql ~]# docker cp dkhmaster:/root/.ssh/id_rsa.pub master1.key
[root@ora-mssql ~]# docker cp dkhslave1:/root/.ssh/id_rsa.pub slave1.key
[root@ora-mssql ~]# docker cp dkhslave2:/root/.ssh/id_rsa.pub slave2.key
[root@ora-mssql ~]# cat master1.key slave1.key slave2.key > authorized_keys
[root@ora-mssql ~]# docker cp authorized_keys dkhmaster:/root/.ssh/authorized_keys
[root@ora-mssql ~]# docker cp authorized_keys dkhslave1:/root/.ssh/authorized_keys
[root@ora-mssql ~]# docker cp authorized_keys dkhslave2:/root/.ssh/authorized_keys
Obtain the IP addresses of the three containers and write them into the /etc/hosts file of each container.
[root@ora-mssql ~]# docker inspect dkhmaster|grep IPA
"SecondaryIPAddresses": null,
"IPAddress": "172.17.0.3",
"IPAMConfig": null,
"IPAddress": "172.17.0.3",
[root@ora-mssql ~]# docker inspect dkhslave1|grep IPA
"SecondaryIPAddresses": null,
"IPAddress": "172.17.0.4",
"IPAMConfig": null,
"IPAddress": "172.17.0.4",
[root@ora-mssql ~]# docker inspect dkhslave2|grep IPA
"SecondaryIPAddresses": null,
"IPAddress": "172.17.0.5",
"IPAMConfig": null,
"IPAddress": "172.17.0.5",
Write the entries into /etc/hosts inside each container; the result looks like this:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 5e35e92b76f0
172.17.0.3 master
172.17.0.4 slave1
172.17.0.5 slave2
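A sketch of distributing these entries to all three containers from the host with docker exec, assuming the container names and IPs shown above (an automation of the manual edit, not in the original):

```shell
# Append the cluster name entries to /etc/hosts in every container
for c in dkhmaster dkhslave1 dkhslave2; do
  docker exec $c sh -c 'cat >> /etc/hosts <<EOF
172.17.0.3 master
172.17.0.4 slave1
172.17.0.5 slave2
EOF'
done
```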
Install openssh-clients in every container so that ssh logins between them are possible.
[root@9f92ba0ee1a4 /]# yum -y install openssh-clients
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 0:03:45 ago on Mon Nov 15 05:54:45 2021.
Dependencies resolved.
===================================================================================================================================================================
Package Architecture Version Repository Size
===================================================================================================================================================================
Installing:
openssh-clients x86_64 8.0p1-6.el8_4.2 baseos 667 k
Installing dependencies:
libedit x86_64 3.1-23.20170329cvs.el8 baseos 102 k
Transaction Summary
===================================================================================================================================================================
Install 2 Packages
Total download size: 769 k
Installed size: 2.7 M
Downloading Packages:
[MIRROR] libedit-3.1-23.20170329cvs.el8.x86_64.rpm: Status code: 403 for http://mirrors.tuna.tsinghua.edu.cn/centos/8.4.2105/BaseOS/x86_64/os/Packages/libedit-3.1-23.20170329cvs.el8.x86_64.rpm (IP: 101.6.15.130)
(1/2): libedit-3.1-23.20170329cvs.el8.x86_64.rpm 253 kB/s | 102 kB 00:00
[MIRROR] openssh-clients-8.0p1-6.el8_4.2.x86_64.rpm: Status code: 403 for http://mirrors.tuna.tsinghua.edu.cn/centos/8.4.2105/BaseOS/x86_64/os/Packages/openssh-clients-8.0p1-6.el8_4.2.x86_64.rpm (IP: 101.6.15.130)
(2/2): openssh-clients-8.0p1-6.el8_4.2.x86_64.rpm 537 kB/s | 667 kB 00:01
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 448 kB/s | 769 kB 00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : libedit-3.1-23.20170329cvs.el8.x86_64 1/2
Installing : openssh-clients-8.0p1-6.el8_4.2.x86_64 2/2
Running scriptlet: openssh-clients-8.0p1-6.el8_4.2.x86_64 2/2
Verifying : libedit-3.1-23.20170329cvs.el8.x86_64 1/2
Verifying : openssh-clients-8.0p1-6.el8_4.2.x86_64 2/2
Installed:
libedit-3.1-23.20170329cvs.el8.x86_64 openssh-clients-8.0p1-6.el8_4.2.x86_64
Complete!
For convenience later, install the ansible deployment tool on the master machine; it makes synchronization tasks much easier. The ansible package lives in the EPEL repository, so install epel-release first.
# yum -y install epel-release
# yum -y install ansible
# cat<<EOF >> /etc/ansible/hosts
[hadoop]
172.17.0.3
172.17.0.4
172.17.0.5
[slave]
172.17.0.4
172.17.0.5
EOF
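A quick way to confirm the inventory and the passwordless SSH setup both work (a hypothetical check, not in the original) is ansible's ping module, which connects to every host in a group:

```shell
# Each host should answer with "ping": "pong"
ansible hadoop -m ping
```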
Enter the master and edit core-site.xml, located in /usr/local/hadoop-3.2.1/etc/hadoop, adding this section:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-3.2.1/data/tmp</value>
</property>
</configuration>
Edit yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>slave1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
Edit hdfs-site.xml as follows:
[root@5e35e92b76f0 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave2:50090</value>
</property>
</configuration>
Create the workers file to list the worker nodes (in Hadoop 2.x this file was called slaves; Hadoop 3.x renamed it to workers):
[root@5e35e92b76f0 hadoop]# cat workers
master
slave1
slave2
Next, edit mapred-site.xml, the MapReduce configuration file:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
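On Hadoop 3.x, MapReduce jobs also need to know where the framework is installed; if jobs later fail with a classpath error, the following properties (an addition not part of the original setup, using the install path assumed throughout this article) can be added inside the configuration element of mapred-site.xml:

```xml
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.2.1</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.2.1</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.2.1</value>
</property>
```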
Set JAVA_HOME and the run-as users for the HDFS and YARN daemons. In Hadoop 3.x these exports belong in etc/hadoop/hadoop-env.sh (mapred-env.sh and yarn-env.sh also exist for component-specific overrides):
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el8_4.x86_64/jre
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Copy the configuration files to each slave machine:
[root@5e35e92b76f0 hadoop]# ansible slave -m copy -a "src=/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml dest=/usr/local/hadoop-3.2.1/etc/hadoop"
172.17.0.4 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/libexec/platform-python"
},
"changed": false,
"checksum": "e241df63cbf84b8384a7f6fc7e9162bee80b5422",
"dest": "/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml",
"gid": 1001,
"group": "1001",
"mode": "0644",
"owner": "1001",
"path": "/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml",
"size": 1116,
"state": "file",
"uid": 1001
}
172.17.0.5 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/libexec/platform-python"
},
"changed": false,
"checksum": "e241df63cbf84b8384a7f6fc7e9162bee80b5422",
"dest": "/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml",
"gid": 1001,
"group": "1001",
"mode": "0644",
"owner": "1001",
"path": "/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml",
"size": 1116,
"state": "file",
"uid": 1001
}
HDFS must be initialized before first use: format the NameNode on the master (only the NameNode is formatted; DataNodes initialize themselves on first start).
[root@5e35e92b76f0 hadoop]# hdfs namenode -format
2021-11-15 08:24:14,758 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = 5e35e92b76f0/172.17.0.3
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.1
......
399 bytes saved in 0 seconds .
2021-11-15 08:24:15,900 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-11-15 08:24:15,905 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2021-11-15 08:24:15,905 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 5e35e92b76f0/172.17.0.3
************************************************************/
Grant the correct permissions
The tarball extracts with a default owner and group of 1001; change ownership to root:root. Note that recurse=yes is needed to apply the change to the whole tree (and directories must keep execute permission, so do not force mode=0644 on them):
ansible hadoop -m file -a "path=/usr/local/hadoop-3.2.1 state=directory recurse=yes owner=root group=root"
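With the configuration distributed and the NameNode formatted, the daemons can be started from the master. This is a sketch; the sbin directory is not on PATH in the image, and HADOOP_HOME is assumed to point at /usr/local/hadoop-3.2.1 as set in the Dockerfile:

```shell
$HADOOP_HOME/sbin/start-dfs.sh    # starts NameNode, DataNodes, SecondaryNameNode
$HADOOP_HOME/sbin/start-yarn.sh   # starts ResourceManager, NodeManagers
jps                               # list the Java daemons running on this node
hdfs dfsadmin -report             # confirm all DataNodes have registered
```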
Because the containers were started without mapping the master's port 9000 to the outside, the mapping has to be added afterwards; it would have been simpler to pass the 9000 mapping at docker run time. For now we patch the container's configuration files directly (the Docker daemon must be restarted for the change to take effect):
[root@ora-mssql ~]# vi /var/lib/docker/containers/$containerid/config.v2.json
[root@ora-mssql ~]# vi /var/lib/docker/containers/$containerid/hostconfig.json
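An alternative that avoids editing Docker's internal files (a sketch, not from the original): commit the running container to an image and re-create it with the extra port mapping. The snapshot tag name here is made up.

```shell
docker stop dkhmaster
docker commit dkhmaster rudolfyan/hadoopqun:master-snap   # hypothetical snapshot tag
docker rm dkhmaster
docker run --name dkhmaster -p 10022:22 -p 9000:9000 -d rudolfyan/hadoopqun:master-snap
```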
Note: from the host you can then reach the SecondaryNameNode web UI directly at http://172.17.0.5:50090.
Finally, write helper scripts: one that automatically appends the name-resolution entries to /etc/hosts (container IPs can change across restarts), plus start and stop scripts for the three containers.
# cat /opt/addhosts
cat <<EOF >>/etc/hosts
172.17.0.2 master
172.17.0.3 slave1
172.17.0.4 slave2
EOF
[root@ora-mssql ~]# cat start-hadoop
docker start dkhmaster
docker start dkhslave1
docker start dkhslave2
sleep 2
docker exec -it -d dkhmaster sh /opt/addhosts
docker exec -it -d dkhslave1 sh /opt/addhosts
docker exec -it -d dkhslave2 sh /opt/addhosts
[root@ora-mssql ~]# cat stop-hadoop
docker stop dkhmaster
docker stop dkhslave1
docker stop dkhslave2
- Published: 2023-06-11 13:07:55
- Original link: https://www.growedu.cn/cms/jinrongyewu/123.html