SPLAD: scatter and place data

While working on RelaxDHT, we noticed that a significant part of its gains came from better parallelization of data transfers. We therefore examined how the layout of data on storage nodes affects long-term durability.
We proposed SPLAD, which makes it possible to control how data is scattered and placed on a set of storage nodes. The way the copies of the data are arranged has a very strong impact: the faster the repairs, the less data is lost. Each node hosts many copies of different data blocks, so we control the parallelism of data transfers by scattering each node's data more or less widely. We also study precisely how the data is scattered and how the storage load is distributed across the nodes. This study goes beyond the peer-to-peer domain: we only assume that the nodes can communicate with each other. Our approach is independent of how the nodes and data blocks are indexed and located.

How SPLAD works:


The system is composed of n nodes storing m blocks of data, each replicated k times, so m × k copies of data blocks are distributed over the n nodes. Two copies of the same block are never placed on the same node. We consider that the nodes have unique identifiers and form a logical ring. We do not take into account the arrival of new data blocks: all the data is added to the system at the beginning, and we then observe its behavior in the presence of faults.
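The model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `place_blocks` and `storage_load` are hypothetical, node identifiers are taken to be the integers 0 to n-1 in ring order, and the uniform random choice of nodes is a placeholder for whatever placement policy is actually used.

```python
import random

def place_blocks(n, m, k, seed=0):
    """Return placement[b] = list of k distinct node ids holding block b.

    Assumption: node ids are 0..n-1, ordered along the logical ring.
    """
    if k > n:
        raise ValueError("need at least k nodes to keep copies on distinct nodes")
    rng = random.Random(seed)
    ring = list(range(n))
    # random.sample draws without replacement, so two copies of the
    # same block never land on the same node.
    return [rng.sample(ring, k) for _ in range(m)]

def storage_load(placement, n):
    """Count how many block copies each node stores."""
    load = [0] * n
    for nodes in placement:
        for node in nodes:
            load[node] += 1
    return load

placement = place_blocks(n=20, m=100, k=3)
load = storage_load(placement, n=20)
print("total copies:", sum(load))  # m * k copies in total
```

All m blocks are placed up front, matching the assumption that no new data arrives after the system starts.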
We assume that the nodes are homogeneous: they have the same network characteristics, in terms of both latency and bandwidth. We study networks with symmetric and with asymmetric bandwidth. Node faults follow a Poisson distribution, and every node is equally likely to fail. Faults are crashes: a failed node never comes back, and the copies of data it was storing are lost. Each failed node is immediately replaced by a new, empty node with a new identifier, so the number of nodes in the system remains constant.

Finally, the way the data is scattered, i.e. the choice of storage nodes within the selection ranges, is also important. For small selection ranges, the least loaded strategy behaves well: it offers good results in terms of both durability and load distribution. For larger selection ranges, the power of choice strategy is a good option. It allows rapid repairs by distributing the new copies to many destinations while keeping the storage load reasonably balanced, without requiring significant additional maintenance costs. As already shown, random placement of the copies of data blocks does not usually give good results.
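To make the three strategies concrete, here is a minimal sketch that compares them on storage load. Several points are assumptions rather than details taken from SPLAD itself: a selection range is modeled as a window of consecutive ring positions starting at a randomly chosen root node, "power of choice" is interpreted as the usual "power of two choices" rule (probe two candidates, keep the less loaded one), and all function names are hypothetical.

```python
import random

def selection_range(root, size, n):
    """Assumed model: `size` consecutive ring positions starting at `root`."""
    return [(root + i) % n for i in range(size)]

def random_choice(candidates, load, k, rng):
    # Pick k distinct nodes uniformly at random.
    return rng.sample(candidates, k)

def least_loaded(candidates, load, k, rng):
    # Shuffle first so ties are broken randomly, then take the k lightest nodes.
    shuffled = rng.sample(candidates, len(candidates))
    return sorted(shuffled, key=lambda node: load[node])[:k]

def power_of_choice(candidates, load, k, rng):
    # For each copy, probe two remaining candidates and keep the lighter one.
    chosen, remaining = [], list(candidates)
    for _ in range(k):
        probes = rng.sample(remaining, min(2, len(remaining)))
        best = min(probes, key=lambda node: load[node])
        chosen.append(best)
        remaining.remove(best)
    return chosen

def place(n, m, k, range_size, strategy, seed=0):
    """Place m blocks, k copies each, and return (placement, load)."""
    if not k <= range_size <= n:
        raise ValueError("require k <= range_size <= n")
    rng = random.Random(seed)
    load = [0] * n
    placement = []
    for _ in range(m):
        root = rng.randrange(n)
        candidates = selection_range(root, range_size, n)
        nodes = strategy(candidates, load, k, rng)
        for node in nodes:
            load[node] += 1
        placement.append(nodes)
    return placement, load

for strategy in (random_choice, least_loaded, power_of_choice):
    _, load = place(n=50, m=1000, k=3, range_size=10, strategy=strategy)
    print(strategy.__name__, "max load:", max(load), "min load:", min(load))
```

The sketch only measures load balance at placement time. It does not simulate failures or repairs, so it says nothing by itself about durability; it also hints at the cost difference, since least loaded inspects the whole selection range while power of choice looks at two nodes per copy.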
