Hadoop cluster bottleneck detection algorithm based on resource information gain

Abstract

The invention discloses a Hadoop cluster bottleneck detection algorithm based on resource information gain. The algorithm has three parts. First, the response satisfaction (RS) of each node in the cluster is monitored, and a node is judged to have a bottleneck when its RS falls to a certain threshold. Second, samples are collected from the bottlenecked node and discretized. Third, the information gain of each resource is calculated from the samples, and the resources with the larger information gain are taken as the bottleneck resources. The method gives a clear view of how each component is running, optimizes resource utilization, and improves the scalability of the Hadoop system.
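
The three parts described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how RS is computed, how samples are discretized, or what the class label for the information gain is, so everything below is an assumption made for illustration: RS is taken as a given number per sample, the threshold `RS_THRESHOLD`, the equal-width cut points in `discretize`, the resource names, and the choice of "RS at or below threshold" as the class label are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter

RS_THRESHOLD = 0.6  # hypothetical threshold; the abstract gives no value


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def discretize(value, cut_points=(33.0, 66.0)):
    """Map a utilization percentage to a bin index (assumed equal-width bins)."""
    return sum(value >= c for c in cut_points)


def information_gain(levels, labels):
    """Entropy of the labels minus their entropy conditioned on the levels."""
    total = len(labels)
    groups = {}
    for level, label in zip(levels, labels):
        groups.setdefault(level, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_resources(samples, labels):
    """Return (resource, gain) pairs sorted by descending information gain."""
    gains = {}
    for name in samples[0]:
        levels = [discretize(s[name]) for s in samples]
        gains[name] = information_gain(levels, labels)
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Made-up samples from one node: utilization percentages plus the RS observed
# at the same time. These numbers are illustrative, not measured data.
samples = [
    {"cpu": 90, "mem": 40, "disk": 20},
    {"cpu": 85, "mem": 70, "disk": 50},
    {"cpu": 30, "mem": 45, "disk": 25},
    {"cpu": 20, "mem": 75, "disk": 55},
]
rs_values = [0.4, 0.5, 0.9, 0.8]

# Assumed labeling: a sample counts as "bottlenecked" when RS <= threshold.
labels = [rs <= RS_THRESHOLD for rs in rs_values]

ranked = rank_resources(samples, labels)
print(ranked)
```

In this toy data the CPU level separates the low-RS samples from the rest perfectly, so it receives the largest gain and would be reported as the bottleneck resource under these assumptions.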
