Install dependencies
ssh
sudo apt-get install ssh
java
sudo apt install default-jdk
java --version
Download Hadoop
https://dlcdn.apache.org/hadoop/common/
After downloading, unpack the archive and cd into the directory, for example:
cd hadoop-3.3.4
Configure Java
nano etc/hadoop/hadoop-env.sh
# set to the root of your Java installation
export JAVA_HOME=/usr/lib/jvm/default-java
Test that the Hadoop environment and class libraries are OK:
bin/hadoop
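Running bin/hadoop without arguments should print the usage text. Optionally, run one of the bundled example jobs in standalone mode to confirm the libraries work (a sketch; the jar name assumes the hadoop-3.3.4 release used above):
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar grep input output 'dfs[a-z.]+'
cat output/*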
Start DFS
Configure parameters
vi etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
vi etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
Passwordless ssh login
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
sudo service ssh restart
If, after the steps above, you still get "user@localhost: Permission denied (publickey).", make sure your home directory permissions are not too open:
chmod 750 $HOME
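Verify that passwordless login works before starting the daemons (it should not prompt for a password):
ssh localhost
exit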
Start
bin/hdfs namenode -format
# only the NameNode needs to be formatted; DataNode storage is initialized automatically on first start
sbin/start-dfs.sh
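Once the daemons are up, the NameNode web interface is available at http://localhost:9870/ by default. It also helps to create the HDFS user directory that the upload example below writes to (the 'data' user is just this guide's example):
bin/hdfs dfs -mkdir -p /user/data
bin/hdfs dfs -ls /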
Verify
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]op=..."
curl -i 'http://10.60.2.114:9870/webhdfs/v1/?op=LISTSTATUS'
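Uploading through WebHDFS takes two requests: the NameNode answers the CREATE call with a 307 redirect, and the file content is then PUT to the DataNode address it returns (a sketch reusing the example host and the /user/data/model.json path used below):
# step 1: the NameNode replies with a 307 redirect and a Location header
curl -i -X PUT "http://10.60.2.114:9870/webhdfs/v1/user/data/model.json?op=CREATE&user.name=data"
# step 2: send the file content to the address from the Location header
curl -i -X PUT -T model.json "<Location header from step 1>"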
Remote access
On the client machine, map the server's hostname to its IP in /etc/hosts, since HDFS/WebHDFS redirects clients to DataNodes by hostname:
sudo nano /etc/hosts
10.60.2.114 server-precision-3630-tower
For example, upload a file to: http://10.60.2.114:9870/user/data/model.json
from hdfs import InsecureClient

# connect to the NameNode's WebHDFS endpoint as the 'data' user
client = InsecureClient('http://10.60.2.114:9870', user='data')

# list the HDFS root directory
files = client.list('/')
print(files)

# write a file; relative paths resolve to /user/data
with client.write('model.json', encoding='utf-8') as writer:
    writer.write('111')
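To check the upload, the same client can read the file back (client.read is likewise a context manager in the hdfs package):
with client.read('model.json', encoding='utf-8') as reader:
    print(reader.read())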
Run the ResourceManager/NodeManager
Configure parameters
vi etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
vi etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
Start
sbin/start-yarn.sh
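The ResourceManager web interface is available at http://localhost:8088/ by default. To confirm that jobs actually run on YARN, submit one of the bundled examples (the jar name assumes the hadoop-3.3.4 release used above):
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 2 5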
Verify
jps
Example output:
1685161 NodeManager
1684825 ResourceManager
1686011 SecondaryNameNode
1685795 DataNode
1685618 NameNode
1686165 Jps
Differences between the start/stop commands
start-all.sh & stop-all.sh
Used to start and stop all the Hadoop daemons at once. Issued on the master machine, it will start/stop the daemons on all the nodes of the cluster. Deprecated.
start-dfs.sh, stop-dfs.sh and start-yarn.sh, stop-yarn.sh
Same as above, but these start/stop the HDFS and YARN daemons separately on all the nodes from the master machine. It is advisable to use these commands instead of start-all.sh & stop-all.sh.
hadoop-daemon.sh namenode/datanode and yarn-daemon.sh resourcemanager
To start individual daemons on an individual machine manually. You need to go to a particular node and issue these commands.
Use case
Suppose you have added a new DataNode to your cluster and you need to start the DataNode daemon only on that machine:
sbin/hadoop-daemon.sh start datanode
Note: ssh must be enabled if you want to start all the daemons on all the nodes from one machine.
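On Hadoop 3 the per-daemon shell scripts are deprecated in favour of the --daemon option of the hdfs/yarn commands; the equivalent per-node commands are (paths relative to the hadoop-3.3.4 directory used above):
bin/hdfs --daemon start datanode
bin/yarn --daemon start nodemanager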