TOP Feature

What is K2HDKC?

K2HDKC (K2Hash based Distributed Kvs Cluster) is a free and open-source distributed KVS (Key Value Store) clustering system created at Yahoo! JAPAN.

Background

After years of using various distributed KVS products at Yahoo! JAPAN, we needed to improve performance and enhance availability, scalability and ease of use. We therefore developed K2HDKC, which reduces operational cost while maintaining the performance of K2HASH and the scalability and availability of CHMPX. K2HASH and CHMPX, which are the heart of K2HDKC, were open-sourced in 2016.

K2HASH

A key-value store library that handles large numbers of keys and large data sizes, with high performance and many original functions.

CHMPX

A data exchange system that communicates with each node over the network, using POSIX Message Queue (MQ) and consistent hashing.

Overview

Multiple server nodes form one K2HDKC cluster. A client application communicates with the server nodes in the cluster through a driver library that implements the K2HDKC API.

Figure 1 Overview

Components for server nodes

A server node is based on the following three components.

CHMPX

CHMPX processes accept the communication commands that the client driver library sends to K2HDKC.

K2HDKC

K2HDKC processes receive client requests via CHMPX, handle them, and manage the K2HASH database files.

K2HASH

K2HASH database files store data.

Components for client application

A client application is based on the following two components.

CHMPX

A CHMPX process provides durable network communications in case of cluster node failure.

K2HDKC driver library

A client process uses the driver library, which implements the K2HDKC API, to send communication commands to a K2HDKC server process.

Availability

Every node in the cluster has the same role, and data is evenly distributed among the nodes in the K2HDKC cluster. By default:

The value of the CHMPX DELIVERMODE configuration parameter is hash.

The CHMPX REPLICA setting is 1, and the data is stored on the main server node and a replica node.

Even if CHMPX detects an unreachable node in the cluster, it can still reach another node that holds the data of the unreachable node.
Therefore, K2HDKC maintains high availability.
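
The default behavior described above (hash delivery, one replica, rerouting around an unreachable node) can be sketched as follows. This is a minimal illustration of the general consistent-hashing idea, not CHMPX's actual implementation: the `Ring` class, its methods, the node names and the use of MD5 are all assumptions made for this example.

```python
import hashlib
from bisect import bisect_right


def _hash(value: str) -> int:
    # Stable hash so the example is reproducible; NOT the hash CHMPX uses.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring (hypothetical, for illustration only)."""

    def __init__(self, nodes, replica=1):
        self.replica = replica
        # Each node sits at one point on the ring, sorted by hash value.
        self.ring = sorted((_hash(n), n) for n in nodes)

    def owners(self, key):
        """Return the main node followed by `replica` replica nodes."""
        points = [h for h, _ in self.ring]
        start = bisect_right(points, _hash(key)) % len(self.ring)
        count = min(self.replica + 1, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def route(self, key, unreachable=()):
        """Pick the first reachable owner, or None if every owner is down."""
        for node in self.owners(key):
            if node not in unreachable:
                return node
        return None


ring = Ring(["node-a", "node-b", "node-c"], replica=1)
main, backup = ring.owners("user:42")
healthy_target = ring.route("user:42")
failover_target = ring.route("user:42", unreachable={main})
```

With a replica count of 1, every key has two owners, so a request for a key whose main node is down is routed to the replica node instead.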

Consistency

To ensure data consistency when the main server node recovers from a transient failure, the node automatically restores from a replica node only the data that it does not have. We call this function Automatic Merge.
Note: You can set REPLICA to a value greater than 1. If it is 1, there is one replica node; if it is 2, there are two replica nodes.
When server nodes are added to or removed from the cluster, the data held by each server node is automatically relocated.
This function is called Automatic Scaling. It is implemented using the same function as Automatic Merge.
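
The idea of restoring only the missing data can be sketched as below. This is a simplified model under stated assumptions, not the K2HDKC merge algorithm: each node is represented as a plain dictionary, "data that it doesn't have" is taken to mean keys that are absent, and the replica is assumed to hold only keys that belong to the recovering node. The function name `merge_missing` is hypothetical.

```python
def merge_missing(recovered: dict, replica: dict) -> list:
    """Copy into `recovered` only the keys it lacks; return the copied keys."""
    missing = [key for key in replica if key not in recovered]
    for key in missing:
        recovered[key] = replica[key]
    return missing


# The main node was down while "c" and "d" were written to the replica.
main_node = {"a": "1", "b": "2"}
replica_node = {"a": "1", "b": "2", "c": "3", "d": "4"}

copied = merge_missing(main_node, replica_node)
```

Because only absent keys are transferred, the cost of recovery in this model grows with the amount of data written during the outage rather than with the total size of the store.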

Storage

K2HASH is the storage implementation of the K2HDKC clustering system.
Many functions of K2HDKC depend closely on K2HASH.

K2HASH has three types of storage.

in memory

mmap an entire file

mmap just part of a file
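
The difference between these three modes can be illustrated with Python's standard `mmap` module. This sketch shows only the underlying operating-system mechanism; it does not use the K2HASH API, and the file layout (two blocks filled with `A` and `B`) is made up for the example.

```python
import mmap
import os
import tempfile

# Offsets for partial mappings must be a multiple of this value.
block = mmap.ALLOCATIONGRANULARITY

# 1. In memory: an anonymous mapping with no backing file.
mem = mmap.mmap(-1, block)
mem[:5] = b"hello"
mem_head = mem[:5]

# Create a temporary file that is two blocks long.
fd, path = tempfile.mkstemp()
os.write(fd, b"A" * block + b"B" * block)

# 2. Map the entire file (a length of 0 means "the whole file").
whole = mmap.mmap(fd, 0)
whole_len = len(whole)
whole_first = whole[:1]

# 3. Map just part of the file: only the second block.
part = mmap.mmap(fd, block, offset=block)
part_len = len(part)
part_first = part[:1]

# Release the mappings and remove the temporary file.
for mapping in (mem, whole, part):
    mapping.close()
os.close(fd)
os.remove(path)
```

Mapping only part of a file keeps the mapped address range small when the file is large, at the cost of having to choose which region to map.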
