While debugging Spark's S3 DataSource support I needed to modify the hadoop-aws module, so I had to compile Hadoop from source. The detailed steps can be found in an article by a Korean author, which covers both CentOS and Ubuntu. Note that building Hadoop requires cmake, make, protoc, libssl, and so on. My OS is Ubuntu, so the commands below can be used as-is (compared with that article, I dropped the -Drequire.snappy flag).

# sudo apt-get install maven libssl-dev build-essential pkgconf cmake libprotobuf8 protobuf-compiler
# tar xvf hadoop-2.7.2-src.tar.gz
# cd hadoop-2.7.2-src
# mvn package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true -Drequire.openssl

The built hadoop-2.7.2.tar.gz ends up in {hadoopBaseDir}/hadoop-dist/target; unpack it, configure it, and continue from there. If this is an upgrade, you can copy just the rebuilt packages into the working directory.

However, I ran into a rather unusual problem while compiling Hadoop, so I'm writing it down here.

The Maven error output looks like this:

org.apache.hadoop.util.NativeCrc32 org.apache.hadoop.net.unix.DomainSocket org.apache.hadoop.net.unix.DomainSocketWatcher
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-common ---
[INFO] Executing tasks

main:
     [exec] -- The C compiler identification is GNU 4.8.4
     [exec] -- The CXX compiler identification is GNU 4.8.4
     [exec] -- Check for working C compiler: /usr/bin/cc
     [exec] -- Check for working C compiler: /usr/bin/cc -- works
     [exec] -- Detecting C compiler ABI info
     [exec] -- Detecting C compiler ABI info - done
     [exec] -- Check for working CXX compiler: /usr/bin/c++
     [exec] -- Check for working CXX compiler: /usr/bin/c++ -- works
     [exec] -- Detecting CXX compiler ABI info
     [exec] JAVA_HOME=, JAVA_JVM_LIBRARY=JAVA_JVM_LIBRARY-NOTFOUND
     [exec] JAVA_INCLUDE_PATH=JAVA_INCLUDE_PATH-NOTFOUND, JAVA_INCLUDE_PATH2=JAVA_INCLUDE_PATH2-NOTFOUND
     [exec] CMake Error at JNIFlags.cmake:120 (MESSAGE):
     [exec]   Failed to find a viable JVM installation under JAVA_HOME.
     [exec] Call Stack (most recent call first):
     [exec]   CMakeLists.txt:24 (include)
     [exec] 
     [exec] 
     [exec] -- Detecting CXX compiler ABI info - done
     [exec] -- Configuring incomplete, errors occurred!
     [exec] See also "/home/ieevee/hadoop/hadoop-2.7.2-src/hadoop-common-project/hadoop-common/target/native/CMakeFiles/CMakeOutput.log".
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:

After a lot of Googling, most answers confidently claim this is a missing-package problem (libssl-dev, zlib1g-dev, and the like), but I clearly had those installed. Eventually I found a post describing the same problem: the cause is a JAVA_HOME setting that doesn't match what Hadoop's build expects.

My system has three JDKs installed (two 1.7 and one 1.8), so JAVA_HOME was set to /usr, and Hadoop/Spark/HBase/Hive all ran fine that way. The Hadoop build, however, needs JAVA_HOME to point at a concrete JDK; on my machine that means export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64. So why does the build care?
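Since the build needs a concrete JDK path, a small helper like the following can enumerate the full JDKs on a machine. This is a sketch under the assumption that JDK packages live under /usr/lib/jvm (the Debian/Ubuntu packaging default); `list_jdks` is a hypothetical helper name, not part of any tool.

```shell
# List candidate JDK homes under a base directory (default: /usr/lib/jvm,
# the Debian/Ubuntu layout). Only directories that ship include/jni.h --
# i.e. full JDKs usable for a native Hadoop build -- are printed.
list_jdks() {
    base="${1:-/usr/lib/jvm}"
    for d in "$base"/*/; do
        [ -e "${d}include/jni.h" ] && echo "${d%/}"
    done
    return 0
}

# Pick one of the printed paths and export it before running mvn, e.g.:
# export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
```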

Open the file that raised the error, {hadoopBaseDir}/hadoop-common-project/hadoop-common/src/JNIFlags.cmake. The error says the JAVA_JVM_LIBRARY variable is empty. Reading through the cmake file, JAVA_JVM_LIBRARY, JAVA_INCLUDE_PATH, and JAVA_INCLUDE_PATH2 are all resolved from the _JDK_DIRS variable, which is built by appending suffixes such as /jre/lib directly to the JAVA_HOME environment variable. Those subdirectories don't exist under /usr; they exist only under a concrete JDK path.

    FILE(TO_CMAKE_PATH "$ENV{JAVA_HOME}" _JAVA_HOME)
..
    SET(_JDK_DIRS "${_JAVA_HOME}/jre/lib/${_java_libarch}/*"
                  "${_JAVA_HOME}/jre/lib/${_java_libarch}"
                  "${_JAVA_HOME}/jre/lib/*"
                  "${_JAVA_HOME}/jre/lib"
                  "${_JAVA_HOME}/lib/*"
                  "${_JAVA_HOME}/lib"
                  "${_JAVA_HOME}/include/*"
                  "${_JAVA_HOME}/include"
                  "${_JAVA_HOME}"
    )
    FIND_PATH(JAVA_INCLUDE_PATH
        NAMES jni.h
        PATHS ${_JDK_DIRS}
        NO_DEFAULT_PATH)
    #In IBM java, it's jniport.h instead of jni_md.h
    FIND_PATH(JAVA_INCLUDE_PATH2
        NAMES jni_md.h jniport.h
        PATHS ${_JDK_DIRS}
        NO_DEFAULT_PATH)
    SET(JNI_INCLUDE_DIRS ${JAVA_INCLUDE_PATH} ${JAVA_INCLUDE_PATH2})
    FIND_LIBRARY(JAVA_JVM_LIBRARY
        NAMES jvm JavaVM
        PATHS ${_JDK_DIRS}
        NO_DEFAULT_PATH)
    SET(JNI_LIBRARIES ${JAVA_JVM_LIBRARY})
    MESSAGE("JAVA_HOME=${JAVA_HOME}, JAVA_JVM_LIBRARY=${JAVA_JVM_LIBRARY}")
    MESSAGE("JAVA_INCLUDE_PATH=${JAVA_INCLUDE_PATH}, JAVA_INCLUDE_PATH2=${JAVA_INCLUDE_PATH2}")
    IF(JAVA_JVM_LIBRARY AND JAVA_INCLUDE_PATH AND JAVA_INCLUDE_PATH2)
        MESSAGE("Located all JNI components successfully.")
    ELSE()
        MESSAGE(FATAL_ERROR "Failed to find a viable JVM installation under JAVA_HOME.")
    ENDIF()
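The probe above can be approximated in shell to check a candidate JAVA_HOME before kicking off a long Maven build. This is a rough sketch, not the exact CMake logic (the real file also derives an arch-specific subdirectory), and `check_java_home` is a name I made up for illustration.

```shell
# Rough shell approximation of the JNIFlags.cmake probe: a viable JAVA_HOME
# must provide jni.h under include/ and a libjvm library under one of the
# jre/lib or lib subtrees that _JDK_DIRS enumerates.
check_java_home() {
    jh="$1"
    if [ -e "$jh/include/jni.h" ] && \
       { ls "$jh"/jre/lib/*/libjvm.* >/dev/null 2>&1 || \
         ls "$jh"/jre/lib/libjvm.*   >/dev/null 2>&1 || \
         ls "$jh"/lib/*/libjvm.*     >/dev/null 2>&1; }; then
        echo "ok: $jh looks like a full JDK"
        return 0
    fi
    echo "bad: $jh would fail Hadoop's CMake JNI probe"
    return 1
}

# In my situation: check_java_home /usr reports "bad", while
# check_java_home /usr/lib/jvm/java-1.7.0-openjdk-amd64 reports "ok".
```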

Reportedly, HBase compiles successfully without running into this.

Note 1: here is the command for switching JDKs on an Ubuntu system.

sudo update-alternatives --config javac

Pick option 1/2/3 at the prompt.

Note 2: Ubuntu 14.04 server cannot install OpenJDK 8 out of the box; see that article's instructions for how to install it.