Building a Hadoop Pseudo-Distributed Cluster from Scratch on a MacBook (Complete Setup for JDK 1.8, Hadoop 3.2.2, and Spark 3.1.1)

张开发
2026/4/18 1:11:20 · 15 min read


When I switched to an M2 MacBook Pro last year, it took me three full days to get a Hadoop pseudo-distributed environment working. The pitfalls I hit along the way included Homebrew install-path conflicts, ARM architecture compatibility problems, and environment-variable mistakes. This article shares a verified end-to-end workflow covering JDK 1.8, Hadoop 3.2.2, and Spark 3.1.1, with tweaks specifically for macOS Ventura/Sonoma.

## 1. Environment Preparation and Basic Configuration

### 1.1 Handling ARM Architecture Compatibility

Macs with M1/M2 chips need extra attention to software compatibility. Although most tools now support the ARM64 architecture, a few points still matter:

- **Prefer Homebrew installs**: the ARM build of Homebrew (under `/opt/homebrew`) handles architecture selection automatically.
- **Pick the ARM build when installing manually**: download pages usually label these packages `aarch64` or `ARM64`.
- **Rosetta compatibility mode**: for software that has not been ported yet, open an x86_64 terminal session with:

```bash
arch -x86_64 zsh
```

### 1.2 Installing the Required Tools

Install the basic toolchain through Homebrew first:

```bash
brew install openssl openssh rsync curl
```

Configure passwordless SSH login (required for communication between Hadoop daemons):

```bash
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```

Test that it works:

```bash
ssh localhost
```

## 2. Installing and Configuring JDK 1.8

### 2.1 Comparing the Installation Options

| Method | Pros | Cons | Recommended for |
| --- | --- | --- | --- |
| Homebrew | Environment variables configured automatically | Version may be newer than expected | Quick development setups |
| Official download | Full control over the version | Manual configuration required | Production compatibility |
| Azul Zulu | Native ARM support | Requires account registration | M1/M2 chips |

For M1/M2 machines I recommend the Azul Zulu ARM build:

```bash
brew tap homebrew/cask-versions
brew install --cask zulu8
```

### 2.2 Configuring Environment Variables

Edit `~/.zshrc` (zsh is the default shell on macOS):

```bash
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
export PATH=$JAVA_HOME/bin:$PATH
```

Verify the installation:

```bash
java -version
# should print something like: openjdk version "1.8.0_382"
```

## 3. Hadoop 3.2.2 Pseudo-Distributed Deployment

### 3.1 Choosing an Installation Method

On M1/M2 machines I recommend downloading the binary tarball manually to avoid architecture problems:

```bash
cd ~/Downloads
curl -O https://archive.apache.org/dist/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
tar -xzf hadoop-3.2.2.tar.gz
sudo mv hadoop-3.2.2 /usr/local/hadoop
```

### 3.2 Editing the Core Configuration Files

All configuration files live under `/usr/local/hadoop/etc/hadoop/`.

`core-site.xml`:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
```

`hdfs-site.xml`:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
  </property>
</configuration>
```

### 3.3 Configuring Environment Variables

Append to `~/.zshrc`:

```bash
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
```

### 3.4 Initializing and Starting the Cluster

The first run requires formatting the NameNode:

```bash
hdfs namenode -format
```

Start HDFS:

```bash
start-dfs.sh
```
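One pitfall worth flagging at this step: the start scripts launch daemons over SSH, and a non-interactive SSH shell does not source `~/.zshrc`, so they can abort with a "JAVA_HOME is not set" error even though `java -version` works in your terminal. A minimal fix (my own addition, assuming the install paths used above) is to hard-code `JAVA_HOME` in Hadoop's own environment file:

```shell
# $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# Hard-code JAVA_HOME so daemons launched over SSH see it even when
# ~/.zshrc is not sourced. /usr/libexec/java_home is macOS-specific;
# substitute your actual JDK path on other systems.
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
```

Since `hadoop-env.sh` is itself a shell script, the command substitution is re-evaluated each time a daemon starts.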
Start YARN:

```bash
start-yarn.sh
```

Verify the services:

```bash
jps
# should list at least: NameNode, DataNode, ResourceManager, NodeManager
```

## 4. Integrating Spark 3.1.1

### 4.1 Installation Notes

The Spark build must match your Hadoop version. For Hadoop 3.2.2, choose the package labeled "Pre-built for Apache Hadoop 3.2 and later":

```bash
curl -O https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
tar -xzf spark-3.1.1-bin-hadoop3.2.tgz
sudo mv spark-3.1.1-bin-hadoop3.2 /usr/local/spark
```

### 4.2 Key Configuration

Edit `/usr/local/spark/conf/spark-env.sh`:

```bash
cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> /usr/local/spark/conf/spark-env.sh
```

Environment variables (append to `~/.zshrc`):

```bash
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
```

### 4.3 Starting and Verifying

Start the Spark standalone cluster:

```bash
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://$(hostname):7077
```

Submit a test job:

```bash
spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://$(hostname):7077 \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.1.1.jar 10
```

## 5. Troubleshooting Common Problems

### 5.1 Port Conflicts

Hadoop and Spark use the following ports by default; make sure they are not already taken:

| Service | Port | Check command |
| --- | --- | --- |
| HDFS NameNode | 9000 | `lsof -i :9000` |
| YARN | 8088 | `netstat -an \| grep 8088` |
| Spark Master | 7077 | `lsof -i :7077` |

### 5.2 Adjusting Memory Settings

For a MacBook with 8 GB of RAM, I suggest the following changes.

`$HADOOP_HOME/etc/hadoop/yarn-site.xml`:

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
```

`$SPARK_HOME/conf/spark-defaults.conf`:

```
spark.executor.memory 2g
spark.driver.memory 1g
```

### 5.3 File Permission Problems

Hadoop is sensitive to file permissions. If you hit a permission error, you can try:

```bash
hdfs dfs -chmod -R 777 /tmp
```

## 6. Development Environment Tips

### 6.1 Simplifying Deployment with Docker

If you reset your environment frequently, consider a Docker-based setup:

```bash
docker run -it --name hadoop -p 9000:9000 -p 8088:8088 sequenceiq/hadoop-docker:2.7.1
```

### 6.2 IDE Integration

To work with the cluster from IntelliJ IDEA, add the dependency:

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.2.2</version>
</dependency>
```

And set the VM options:

```
-Djava.library.path=/usr/local/hadoop/lib/native
```

### 6.3 Performance Monitoring

Install Ganglia for cluster monitoring:

```bash
brew install ganglia
```

Configure `/usr/local/hadoop/etc/hadoop/hadoop-metrics2.properties`:

```
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
namenode.sink.ganglia.servers=localhost:8649
```
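The port checks above can be automated before you start any daemons. Below is a small Bash sketch (my own helper, not part of the original setup) that probes each default port using Bash's built-in `/dev/tcp` pseudo-device, so it works the same on macOS and Linux regardless of which of `lsof`, `netstat`, or `ss` is installed:

```shell
#!/usr/bin/env bash
# Return success (0) if something is listening on localhost:$1.
# The subshell keeps the temporary fd 3 from leaking into the caller.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Probe the default HDFS, YARN, and Spark Master ports.
for p in 9000 8088 7077; do
  if port_in_use "$p"; then
    echo "port $p: in use"
  else
    echo "port $p: free"
  fi
done
```

Note that `/dev/tcp` is a Bash feature, not a real device file, so the script must run under `bash` rather than plain `sh`.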
