Setup preparation:
1. Four virtual machines running CentOS 6.4 (master, secondary, node1, node2).
2. Installation packages:
hadoop-cdh4.4.0, hive-cdh4.4.0, presto, discovery-server, hbase, 64-bit JDK 7.0+, Python 2.4+, postgresql
Note on SSH permission configuration:
The home directory must be 755 or 700 (never 77x). The .ssh directory must be 755. id_rsa.pub and authorized_keys must be 644, and id_rsa must be 600. Finally, test from master with ssh master date, ssh secondary date, ssh node1 date and ssh node2 date; if no password is requested, the setup succeeded. If ssh secondary, ssh node1 or ssh node2 is slow to connect, set GSSAPIAuthentication no in /etc/ssh/ssh_config. To allow root logins over SSH, edit /etc/ssh/sshd_config and change PermitRootLogin no to yes.
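A minimal sketch of the corresponding fixes (assuming the key pair was already generated with ssh-keygen and the public keys collected into authorized_keys):

chmod 755 ~                                          # home directory: 755 or 700, never 77x
chmod 755 ~/.ssh
chmod 644 ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
chmod 600 ~/.ssh/id_rsa
# test passwordless login from master to every node:
for h in master secondary node1 node2; do ssh $h date; done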
Restart the sshd service: /etc/init.d/sshd restart

5. Configure the environment variables
[root@master ~]# gedit .bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
export JAVA_HOME=/usr/java/jdk1.7.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASS_PATH=./:$JAVA_HOME/lib:$JRE_HOME/lib:$JRE_HOME/lib/tools.jar:/usr/presto/server/lib:/usr/discovery-server/lib
export HADOOP_HOME=/usr/hadoop
export HIVE_HOME=/usr/hive
export HBASE_HOME=/usr/hbase
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin
After the environment variables are configured on master, configure secondary, node1 and node2 the same way; the file can be synced to them with scp, as sketched below.
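For example (assuming passwordless root SSH from the previous step):

for h in secondary node1 node2; do
    scp ~/.bash_profile root@$h:~/
done
# then reload it in the current shell on each node:
source ~/.bash_profile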
6. Configure Hadoop

b. core-site.xml

<property><name>fs.defaultFS</name><value>hdfs://master:8020</value></property>
<property><name>fs.trash.interval</name><value>10080</value></property>
<property><name>fs.trash.checkpoint.interval</name><value>10080</value></property>
c. hdfs-site.xml

<property><name>dfs.replication</name><value>3</value></property>
<property><name>hadoop.tmp.dir</name><value>/opt/data/hadoop-${user.name}</value></property>
<property><name>dfs.namenode.http-address</name><value>master:50070</value></property>
<property><name>dfs.namenode.secondary.http-address</name><value>secondary:50090</value></property>
<property><name>dfs.webhdfs.enabled</name><value>true</value></property>
d. masters (create this file if it does not exist)
f. mapred-site.xml

<property><name>mapreduce.framework.name</name><value>yarn</value></property>
<property><name>mapreduce.jobhistory.address</name><value>master:10020</value></property>
<property><name>mapreduce.jobhistory.webapp.address</name><value>master:19888</value></property>
g. yarn-site.xml

<property><name>yarn.resourcemanager.resource-tracker.address</name><value>master:8031</value></property>
<property><name>yarn.resourcemanager.address</name><value>master:8032</value></property>
<property><name>yarn.resourcemanager.scheduler.address</name><value>master:8030</value></property>
<property><name>yarn.resourcemanager.admin.address</name><value>master:8033</value></property>
<property><name>yarn.resourcemanager.webapp.address</name><value>master:8088</value></property>
<property>
  <description>Classpath for typical applications.</description>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $YARN_HOME/share/hadoop/yarn/*,$YARN_HOME/share/hadoop/yarn/lib/*,
    $YARN_HOME/share/hadoop/mapreduce/*,$YARN_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
<property><name>yarn.nodemanager.aux-services</name><value>mapreduce.shuffle</value></property>
<property><name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
<property><name>yarn.nodemanager.local-dirs</name><value>/opt/data/yarn/local</value></property>
<property><name>yarn.nodemanager.log-dirs</name><value>/opt/data/yarn/logs</value></property>
<property>
  <description>Where to aggregate logs</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/opt/data/yarn/logs</value>
</property>
<property><name>yarn.app.mapreduce.am.staging-dir</name><value>/user</value></property>
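Before distributing the configuration it can be sanity-checked on master; hdfs getconf prints the value Hadoop actually resolves for a key:

hdfs getconf -confKey fs.defaultFS      # expect hdfs://master:8020
hdfs getconf -confKey dfs.replication   # expect 3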
h. Copy hadoop to secondary, node1 and node2, as in the sketch below.
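One way to do the copy, assuming /usr/hadoop is the install root on every node:

for h in secondary node1 node2; do
    scp -r /usr/hadoop root@$h:/usr/
done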
i. Before the first run, format the namenode:
[root@master hadoop]# hadoop namenode -format
j. Leave Hadoop safe mode: hdfs dfsadmin -safemode leave
k. Start Hadoop:
[root@master ~]# start-all.sh
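A quick way to confirm the daemons came up:

jps                      # NameNode and ResourceManager should be listed on master
hdfs dfsadmin -report    # the datanodes should show up as live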
7. Install HBase

b. regionservers (one hostname per line):
master
secondary
node1

c. hbase-site.xml

<property><name>hbase.rootdir</name><value>hdfs://master/hbase-${user.name}</value></property>
<property><name>hbase.cluster.distributed</name><value>true</value></property>
<property><name>hbase.tmp.dir</name><value>/opt/data/hbase-${user.name}</value></property>
<property><name>hbase.zookeeper.quorum</name><value>master,secondary,node1,node2</value></property>
d. Sync hbase to secondary, node1 and node2.
e. Start HBase:
[root@master ~]# start-hbase.sh
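A quick check that HBase is up (runs the status command non-interactively):

echo "status" | hbase shell    # should report the live region servers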
8. Install Hive
a. Download the Hive tarball and extract it to /usr, i.e. /usr/hive.
b. hive-site.xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://master/testdb</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>redhat</value>
  <description>password to use against metastore database</description>
</property>
<property><name>mapred.job.tracker</name><value>master:8031</value></property>
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/hive/lib/zookeeper-3.4.5-cdh4.4.0.jar,file:///usr/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar,file:///usr/hive/lib/hbase-0.94.2-cdh4.4.0.jar,file:///usr/hive/lib/guava-11.0.2.jar</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/opt/data/warehouse-${user.name}</value>
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/opt/data/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/opt/data/querylog-${user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
  <description>Enable Hive's Table Lock Manager Service</description>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>node1</value>
  <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
</property>
<property>
  <name>hive.hwi.listen.host</name>
  <value>desktop1</value>
  <description>This is the host address the Hive Web Interface will listen on</description>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.10.0-cdh4.2.0.war</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
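With hive-site.xml in place and the PostgreSQL metastore from step 9 running, a minimal smoke test:

hive -e "show databases;"      # should print at least 'default' if the metastore connection works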
9. Install PostgreSQL (used as the metastore database)
a. Download and install PostgreSQL.
b. Use pgAdmin to create a user named sa.
c. Use pgAdmin to create a database named testdb owned by the role sa. (Equivalent psql commands are sketched below.)
d. In pg_hba.conf, configure the access addresses to allow host connections.
e. In postgresql.conf, set standard_conforming_strings = off.
f. Copy the PostgreSQL JDBC driver to /usr/hive-cdh4.4.0/lib.
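If you prefer the command line over pgAdmin, steps b and c can be done with psql, roughly as follows (the password is an assumption; match whatever hive-site.xml is configured with):

sudo -u postgres psql -c "CREATE USER sa WITH PASSWORD 'redhat';"   # password is illustrative
sudo -u postgres psql -c "CREATE DATABASE testdb OWNER sa;"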
10. Install Presto

1) node.properties

node.environment=production
node.id=F25B16CB-5D5B-50FD-A30D-B2221D71C882
node.data-dir=/var/presto/data

Note: node.id must be unique on every server.

2) jvm.config

-server
-Xmx16G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:PermSize=150M
-XX:MaxPermSize=150M
-XX:ReservedCodeCacheSize=150M
-Xbootclasspath/p:/var/presto/installation/lib/floatingdecimal-0.1.jar

Download floatingdecimal-0.1.jar and place it in /var/presto/installation/lib/.

3) config.properties

coordinator=true
datasources=jmx
http-server.http.port=8080
presto-metastore.db.type=h2
presto-metastore.db.filename=var/db/MetaStore
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://master:8411

The above is the master configuration. On secondary, node1 and node2, change coordinator=true to coordinator=false and delete the discovery-server.enabled=true line, as sketched below.

4) log.properties

com.facebook.presto=DEBUG

5) Create a catalog directory under /usr/presto/etc with the following files:

jmx.properties:
connector.name=jmx

hive.properties:
connector.name=hive-cdh4
hive.metastore.uri=thrift://master:9083
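After syncing /usr/presto to the other nodes, the per-worker edits to config.properties can be scripted; a sketch (paths assumed to match this install):

for h in secondary node1 node2; do
    ssh root@$h "sed -i 's/^coordinator=true/coordinator=false/; /^discovery-server\.enabled=true/d' /usr/presto/etc/config.properties"
done
# node.id in node.properties must still be edited to a unique value on each host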
11. Install discovery-server

1) node.properties

node.environment=production
node.id=D28C24CF-78A1-CD09-C693-7BDE66A51EFD
node.data-dir=/var/discovery/data

2) jvm.config

-server
-Xmx1G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p

3) config.properties

http-server.http.port=8411

Running:
Run the following on master:
start-all.sh (starts Hadoop on every machine)
start-hbase.sh (starts HBase on every machine)
Then change into /usr/discovery-server/bin and start discovery-server with launcher start (see item 4 of the command summary below).
Command summary:

1. Hadoop startup commands:
hadoop namenode -format
hadoop datanode -format
start-all.sh
hadoop dfsadmin -safemode leave
hdfs dfsadmin -safemode leave

2. Hive startup commands:
./hive
./hive --service hiveserver -p 9083   // thrift mode

3. HBase command:
./start-hbase.sh

4. discovery-server commands:
launcher start   // start
launcher run     // run in the foreground
launcher stop    // stop

5. Presto commands:
launcher start   // start
launcher run     // run in the foreground
launcher stop    // stop

6. Presto client startup:
./presto --server localhost:8080 --catalog hive --schema default

Benchmark results on the 4-node cluster:
Nodes  Query                                                                                 Time
4      select count(*) from mytable;                                                         10s
4      select count(*), num from mytable group by num;                                       10s
4      select num from mytable group by num having count(*) > 1000;                          10s
4      select min(num) from mytable group by num;                                            9s
4      select min(num) from mytable;                                                         9s
4      select max(num) from mytable;                                                         9s
4      select row_number() over (partition by name order by num) as row_index from mytable;  16s