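Hadoop Cluster Troubleshooting Notes
1. bigdata is not allowed to impersonate xxx
Cause: the proxy-user (impersonation) settings have not taken effect. Check that core-site.xml contains the following configuration.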
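<property>
  <name>hadoop.proxyuser.bigdata.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.bigdata.groups</name>
  <value>*</value>
</property>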
Note: in hadoop.proxyuser.XXX.hosts and hadoop.proxyuser.XXX.groups, XXX is the user name that follows "User:" in the exception message.
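<property>
  <name>hadoop.proxyuser.bigdata.hosts</name>
  <value>*</value>
  <description>Hosts from which the superuser bigdata may connect to impersonate a user. * allows any host; a comma-separated list such as host1,host2 restricts it to those hosts.</description>
</property>
<property>
  <name>hadoop.proxyuser.bigdata.groups</name>
  <value>*</value>
  <description>Groups whose members the superuser bigdata may impersonate. * allows any group; a comma-separated list such as group1,group2 restricts it to those groups.</description>
</property>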
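As a small illustration of the note about XXX, the sketch below pulls the user name out of an impersonation error and prints the two property names that need to be set. The function name proxyuser_props is hypothetical, and the message format ("User: NAME is not allowed to impersonate ...") is assumed to match the one shown in the heading of this item.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): derive the proxy-user property
# names from an impersonation error message.
proxyuser_props() {
  local msg="$1" user
  # Capture the word between "User: " and " is not allowed to impersonate".
  user=$(printf '%s\n' "$msg" |
    sed -n 's/.*User: \([^ ]*\) is not allowed to impersonate.*/\1/p')
  if [ -z "$user" ]; then
    echo "no impersonation error found in message" >&2
    return 1
  fi
  printf 'hadoop.proxyuser.%s.hosts\nhadoop.proxyuser.%s.groups\n' "$user" "$user"
}

# Example with the message from this item:
proxyuser_props "User: bigdata is not allowed to impersonate xxx"
```

The two printed names are the ones to add to core-site.xml for that user.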
After adding the configuration above, there is no need to restart the cluster. On the NameNode, reload the two properties with an administrator account:
$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
Refresh super user groups configuration successful
$ yarn rmadmin -refreshSuperUserGroupsConfiguration
19/01/16 15:02:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8033
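If the cluster is configured for HA, run the following command so that every NameNode reloads the settings (as the warning in the output says, hdfs dfsadmin is the preferred form of this command):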
# hadoop dfsadmin -fs hdfs://ns -refreshSuperUserGroupsConfiguration
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Refresh super user groups configuration successful for master/192.168.99.219:9000
Refresh super user groups configuration successful for node01/192.168.99.173:9000
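2. org.apache.hadoop.hbase.exceptions.ConnectionClosingException
Symptom: when HiveServer2 is accessed through beeline, JDBC, or Python, HBase-backed tables cannot be queried or created. The configuration used here sets hive.server2.enable.doAs to false in hive-site.xml, so that HiveServer2 runs operations as its own service user rather than as the calling user: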
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
  <description>Setting this property to true will have HiveServer2 execute Hive operations as the user making the calls to it.</description>
</property>
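Creating an HBase-backed table in Hive
-- Table name in Hive: test_tb
CREATE TABLE test_tb(key int, value string)
-- Specify the storage handler
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
-- Map the Hive columns to the HBase row key and to column family cf1, qualifier val
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
-- hbase.table.name gives the HBase table name; it is optional and defaults to the Hive table name
-- hbase.mapred.output.outputtable gives the table that inserts write to; set it if data will be inserted into this table later
TBLPROPERTIES ("hbase.table.name" = "test_tb", "hbase.mapred.output.outputtable" = "test_tb");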
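Periodic cleanup of the Spark work directory
When jobs run in Spark standalone mode, every submission creates a folder named app-xxxxxxx-xxxx under the work directory on each node. The folder holds the resource files the program needs, which each node downloads from the master node at submission time. These directories are created on every run and are not cleaned up automatically, so a large number of jobs will eventually fill the disk.
- Each application directory holds the dependency packages that the Spark job needs at run time. To enable automatic cleanup, set the following in spark-env.sh: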
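# spark.worker.cleanup.enabled: whether to enable automatic cleanup
# spark.worker.cleanup.interval: how often cleanup runs, in seconds
# spark.worker.cleanup.appDataTtl: how long application data is kept, in seconds
# Only the directories of stopped applications are cleaned up.
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=3600"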
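Before the worker cleanup settings take effect, or to check what they would remove, stale application directories can be listed by hand. The following is a minimal sketch, assuming a find that supports -mmin (GNU and BSD find both do); the function name list_stale_app_dirs is hypothetical, and the demo runs against a throwaway temporary directory rather than a real work directory.

```shell
#!/usr/bin/env bash
# list_stale_app_dirs WORK_DIR MINUTES
# Print app-* directories directly under WORK_DIR that were last modified
# more than MINUTES minutes ago. Nothing is deleted.
list_stale_app_dirs() {
  local work_dir="$1" minutes="$2"
  find "$work_dir" -mindepth 1 -maxdepth 1 -type d -name 'app-*' \
    -mmin "+$minutes" | sort
}

# Demo on a temporary directory standing in for a Spark work directory.
demo=$(mktemp -d)
mkdir "$demo/app-20190101000000-0001" "$demo/app-20190125000000-0002" "$demo/other"
touch -t 201901010000 "$demo/app-20190101000000-0001"  # make this one look old
list_stale_app_dirs "$demo" 60   # lists only the directory touched above
rm -rf "$demo"
```

On a real node the first argument would be the worker's work directory, and 60 minutes corresponds to the appDataTtl of 3600 seconds used above. Review the list before piping it to any delete command.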
Too many ZooKeeper connections prevent HBase and Hive from connecting
2019-01-25 03:26:41,627 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /172.17.0.1 - max is 60
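To see which client address is responsible for a warning like the one above, the open connections can be counted per IP. The sketch below parses the output of ZooKeeper's cons four-letter command, assuming each connection line has the form " /ip:port[...](...)" and that the address is IPv4; the function name count_conns_by_ip is hypothetical. On newer ZooKeeper versions the cons command may first have to be allowed through 4lw.commands.whitelist.

```shell
#!/usr/bin/env bash
# count_conns_by_ip: read `cons` output on stdin and print "COUNT IP" lines,
# highest count first.
count_conns_by_ip() {
  awk -F'[/:]' '
    # Connection lines look like " /172.17.0.1:51234[1](queued=0,...)".
    /^ *\/[0-9.]+:[0-9]+\[/ { n[$2]++ }
    END { for (ip in n) print n[ip], ip }
  ' | sort -rn
}

# Demo with sample lines; the values are made up for illustration.
count_conns_by_ip <<'EOF'
 /172.17.0.1:51234[1](queued=0,recved=10,sent=10)
 /172.17.0.1:51240[1](queued=0,recved=4,sent=4)
 /10.0.0.5:40022[1](queued=0,recved=7,sent=7)
EOF
```

Against a live server the input would come from something like `echo cons | nc localhost 2181`, where the host and port are placeholders for the actual ZooKeeper address.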
Adjust the ZooKeeper connection settings for HBase and Hive to suit the production environment. The relevant properties are:
hbase-site.xml:
  hbase.zookeeper.property.maxClientCnxns
hive-site.xml:
  hive.server2.thrift.min.worker.threads
  hive.server2.thrift.max.worker.threads
  hive.zookeeper.session.timeout
zoo.cfg
# Limits the number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble
maxClientCnxns=200
# The minimum session timeout in milliseconds that the server will allow the client to negotiate
minSessionTimeout=1000
# The maximum session timeout in milliseconds that the server will allow the client to negotiate
maxSessionTimeout=60000
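The two session timeout limits interact with client-side settings such as hive.zookeeper.session.timeout: the server adjusts whatever timeout a client requests so that it falls within [minSessionTimeout, maxSessionTimeout]. The sketch below only illustrates that clamping rule; negotiated_timeout is a hypothetical helper, not a ZooKeeper command.

```shell
#!/usr/bin/env bash
# negotiated_timeout REQUESTED MIN MAX  (all in milliseconds)
# Print the session timeout that results from clamping REQUESTED to [MIN, MAX].
negotiated_timeout() {
  local requested="$1" min="$2" max="$3"
  if [ "$requested" -lt "$min" ]; then
    echo "$min"
  elif [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}

# With the zoo.cfg values above, a client asking for 120000 ms gets 60000 ms.
negotiated_timeout 120000 1000 60000
```

A client-side timeout above maxSessionTimeout therefore has no effect unless the server limit is raised as well.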
To be updated...