cestoliv, 2 years ago - Fri, Jul 15, 2022

Using IPFS for data replication

1. What is IPFS?

IPFS stands for InterPlanetary File System. As its documentation says:

IPFS is a distributed system for storing and accessing files, websites, applications and data.

ipfs.io

Decentralized

In practice, it is a system for storing files peer-to-peer, like torrents. Instead of requesting a file from a central server, you request it from the peers that already have it. There is no central server, so the content stays accessible as long as at least one peer has it.

Content addressing

Most content we know is addressed by a path: files on your computer (/home/cestoliv/Documents/file.txt) or on the web (cestoliv.com/file.txt).

With IPFS, content is not addressed by its path, but by its content. So the address of a file on IPFS is the hash of its content, for example: /ipfs/QmXoypizjW3WknFiJnKowHCnL72vedxjQkDDP1lXWo6uco
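To make the idea concrete, here is a small sketch that uses sha256sum as a stand-in for IPFS hashing. This is an assumption made for illustration only: real IPFS addresses are CIDs computed by IPFS itself, not a bare SHA-256 hex digest, and the helper name content_address is hypothetical.

```shell
# Illustration only: sha256sum stands in for IPFS's real CID computation.

# Hypothetical helper: print an "address" derived only from a file's content.
content_address() {
    sha256sum "$1" | cut -d' ' -f1
}

workdir=$(mktemp -d)
printf 'Hello world!\n' > "$workdir/a.txt"
printf 'Hello world!\n' > "$workdir/copy-of-a.txt"
printf 'Hello IPFS!\n'  > "$workdir/b.txt"

addr_a=$(content_address "$workdir/a.txt")
addr_copy=$(content_address "$workdir/copy-of-a.txt")
addr_b=$(content_address "$workdir/b.txt")

# Same content under a different name: same address.
[ "$addr_a" = "$addr_copy" ] && echo "a.txt and copy-of-a.txt share one address"
# Different content: different address.
[ "$addr_a" != "$addr_b" ] && echo "b.txt has a different address"
```

The file name and location play no role in the result, which is why two people adding the same content end up with the same address.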

2. Create an IPFS private network with IPFS cluster

IPFS private networks

Although IPFS, like torrents, lets everyone access content, it is also possible to create a private network where only the peers you allow can access the content. This is what we will do.

IPFS Cluster

Furthermore, we want to replicate data, which means several peers must download it. When we access a file on IPFS, it is cached on our machine for a while and may later be deleted by garbage collection to free up space. For replication, this is not what we want: we want all our peers to download and keep all our files. IPFS solves this problem with the concept of pinning. If you pin a file, it is downloaded and never garbage-collected.

We will then use the IPFS Cluster tool, which lets us tell all peers to pin a file. We just have to pin a file in our cluster and all the peers will pin it automatically.

3. Create the main node

Go to your main node (the one that will first receive the data to duplicate) and create a working folder.

mkdir -p ipfs-cluster && cd ipfs-cluster

Then create a docker-compose.yml file with the following content:

Before launching the containers, we need to generate the swarm key to create a private IPFS network and a CLUSTER_SECRET to authenticate the other nodes in our private network.

Generate the swarm key

mkdir -p ./data/ipfs/data
echo -e "/key/swarm/psk/1.0.0/\n/base16/\n`tr -dc 'a-f0-9' < /dev/urandom | head -c64`" > ./data/ipfs/data/swarm.key
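Since a malformed swarm key is easy to miss, a quick format check can help. The sketch below is only an assumption about what is worth verifying (three lines, the two header lines shown above, then 64 lowercase hex characters). The helper name check_swarm_key is hypothetical, and the demo generates its own throwaway key in a temporary file, using od for the random part, rather than touching your real one.

```shell
# Hypothetical helper: succeed only if the file looks like a swarm key
# in the format used in this article.
check_swarm_key() {
    key_path=$1
    [ "$(wc -l < "$key_path")" -eq 3 ] || return 1
    [ "$(sed -n 1p "$key_path")" = "/key/swarm/psk/1.0.0/" ] || return 1
    [ "$(sed -n 2p "$key_path")" = "/base16/" ] || return 1
    sed -n 3p "$key_path" | grep -Eq '^[0-9a-f]{64}$'
}

# Demo on a throwaway key written to a temporary file.
demo_key=$(mktemp)
printf '/key/swarm/psk/1.0.0/\n/base16/\n%s\n' \
    "$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')" > "$demo_key"

if check_swarm_key "$demo_key"; then
    echo "swarm key looks valid"
else
    echo "swarm key is malformed" >&2
fi
```

On the main node you would call it as check_swarm_key ./data/ipfs/data/swarm.key.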

Create the CLUSTER_SECRET

echo -e "CLUSTER_SECRET=`od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n'`" >> .env
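Note that the command above appends to .env, so running it twice leaves two CLUSTER_SECRET lines. A possible guard, sketched here with a hypothetical helper named ensure_cluster_secret and a temporary file standing in for .env, writes the secret only when none is present yet.

```shell
# Hypothetical helper: add CLUSTER_SECRET to an env file only if it is missing.
ensure_cluster_secret() {
    env_file=$1
    if grep -q '^CLUSTER_SECRET=' "$env_file" 2>/dev/null; then
        echo "CLUSTER_SECRET already set in $env_file"
    else
        echo "CLUSTER_SECRET=$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')" >> "$env_file"
    fi
}

# Demo on a temporary file standing in for .env.
demo_env=$(mktemp)
ensure_cluster_secret "$demo_env"
ensure_cluster_secret "$demo_env"   # second call leaves the file unchanged
```

In the working folder, the call would be ensure_cluster_secret .env.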

Finally! You can start the containers!

docker compose up -d

Check that everything went well in the logs: docker compose logs (You may see spurious errors at first, because the cluster starts faster than the IPFS node and cannot connect to it right away.)
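If you script the setup, one way to cope with this startup race is to retry a command until it succeeds instead of reading the logs by hand. The helper below is a generic sketch, not something IPFS Cluster provides: the name retry and the one-second delay are assumptions, and the demo uses true as a stand-in command.

```shell
# Hypothetical helper: run a command up to N times, one second apart,
# and stop at the first success.
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i/$attempts failed, retrying..." >&2
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Demo with a command that always succeeds.
retry 3 true && echo "ready"

# Against the real cluster, something like this could be used instead:
# retry 30 docker compose exec cluster ipfs-cluster-ctl id
```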

Because this is a private network, we will remove all connections with public nodes:

docker compose exec ipfs ipfs bootstrap rm --all

(Optional) Install ipfs-cluster-ctl on the host

ipfs-cluster-ctl is the tool that allows us to interact with our cluster (in particular to say which files should be pinned). This is done through port 9094 that we have opened on the container.

wget https://dist.ipfs.io/ipfs-cluster-ctl/v1.0.2/ipfs-cluster-ctl_v1.0.2_linux-amd64.tar.gz
tar -xvf ipfs-cluster-ctl_v1.0.2_linux-amd64.tar.gz
sudo cp ipfs-cluster-ctl/ipfs-cluster-ctl /usr/local/bin
# Check that ipfs-cluster-ctl has been successfully installed
ipfs-cluster-ctl --version

It is not mandatory to install the tool on the host, because it is already installed in the container. If you don't want to install it, just replace the ipfs-cluster-ctl command with docker compose exec cluster ipfs-cluster-ctl.

Test that the main node is working

To test that our network is private, we will add a file to the cluster and check that we cannot open this file from a public node.

To do this, we need to add a file that is unique. If we just added a file containing "Hello world!" and someone had already added the same content on the public network, the hash would be the same, so our file would be indirectly accessible via public nodes.

echo "Hello $USER (`od -vN 6 -An -tx1 /dev/urandom | tr -d ' \n'`)!" > hello.txt
ipfs-cluster-ctl add hello.txt
# output: added <your file hash> hello.txt

Let's test that our IPFS network is private. To do this, try to see the contents of the file from the container (it should work):

docker compose exec ipfs ipfs cat <your file hash>
# output: Hello <you> (<a random number>)!

But if you try to view the contents of the file from a node that is not part of your private network, it should not work:

# Try opening `https://ipfs.io/ipfs/<your file hash>` in your browser.

4. Adding replication node(s)

This part must be repeated for each replication node you want to add.

Go to your replication node and create a working folder.

mkdir -p ipfs-cluster && cd ipfs-cluster

Create a docker-compose.yml file with the following content:

Before launching the containers, we need to create the .env file with the same CLUSTER_SECRET as the main node's .env, plus the address of the main node.

CLUSTER_SECRET=<your cluster secret>
MAIN_NODE=/ip4/<main-node-ip>/tcp/9096/ipfs/<main-node-id>
# e.g. MAIN_NODE=/ip4/192.168.1.1/tcp/9096/ipfs/13D3KooWFN75ytQMC94a6gVLDQ999zxADpFpi7qAir9ajGrNHn8d

The ID of the main node can be found by running the following command on the main node:

cat ./data/cluster/identity.json
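To avoid copying the ID by hand, the MAIN_NODE value can be assembled from that file. The sketch below assumes identity.json is a JSON object with a string field named "id". The helper name cluster_peer_id is hypothetical, and the demo reads a stand-in file that reuses the example ID and IP from this article.

```shell
# Hypothetical helper: print the "id" field of a cluster identity.json.
# Assumes the file holds a JSON object with a string field named "id".
cluster_peer_id() {
    sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Demo with a stand-in file; on the main node you would pass
# ./data/cluster/identity.json instead.
demo_identity=$(mktemp)
cat > "$demo_identity" <<'EOF'
{
    "id": "13D3KooWFN75ytQMC94a6gVLDQ999zxADpFpi7qAir9ajGrNHn8d",
    "private_key": "<redacted>"
}
EOF

main_node_ip="192.168.1.1"   # example IP from this article
main_node="/ip4/$main_node_ip/tcp/9096/ipfs/$(cluster_peer_id "$demo_identity")"
echo "MAIN_NODE=$main_node"
```

The printed line can then be pasted into the .env file of each replication node.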

We also need to copy the swarm.key we generated on the main node to our replication node (./data/ipfs/data/swarm.key on the main node)

# On the Main node
# Copy the result of:
cat ./data/ipfs/data/swarm.key
# On the Replication node
# Paste the swarm key
mkdir -p ./data/ipfs/data
echo "/key/swarm/psk/1.0.0/
/base16/
<your swarm key>" > ./data/ipfs/data/swarm.key

You can now start the containers!

docker compose up -d

Because this is a private network, we will remove all connections with public nodes:

docker compose exec ipfs ipfs bootstrap rm --all

Connect to main node

We now need to add our nodes as peers so that they can communicate. The main node must know the replication node(s) and the replication node(s) must know the main node.

On the main node:

docker compose exec ipfs ipfs bootstrap add /ip4/<replication_node_ip>/tcp/4001/p2p/<replication_node_id>
# e.g. /ip4/192.168.1.2/tcp/4001/p2p/12D3KopWK6rfR6SKpmxDwKCjtnJWoK1VRYc7BDMZfJxnopljv68u

The ID of the replication node can be found by running the following command on the replication node:

docker compose exec ipfs ipfs config show | grep "PeerID"

On the replication node:

docker compose exec ipfs ipfs bootstrap add /ip4/<main_node_ip>/tcp/4001/p2p/<main_node_id>
# e.g. /ip4/192.168.1.1/tcp/4001/p2p/92D3KopWK6rfR6SKpmxDwKCjtnJWoK1VRYc7BDMZfJxnopljv69l

The ID of the main node can be found by running the following command on the main node:

docker compose exec ipfs ipfs config show | grep "PeerID"

Testing our installation

echo "Hello $USER (`od -vN 6 -An -tx1 /dev/urandom | tr -d ' \n'`)!" > hello.txt
ipfs-cluster-ctl add hello.txt
# output: added <your file hash> hello.txt

You can now see the replication in action with the command ipfs-cluster-ctl status <your file hash>.

The output of the command on a cluster of three nodes

The PINNED status means that the peer has pinned the file, so it has downloaded it and will keep it.
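When scripting this test, the hash can be captured from the add output rather than copied by hand. The sketch below assumes the output line has the shape shown above (added, then the hash, then the file name). The helper name hash_from_add_output is hypothetical, and the sample line with its placeholder hash stands in for real ipfs-cluster-ctl output.

```shell
# Hypothetical helper: read "added <hash> <name>" lines on stdin
# and print the hash field.
hash_from_add_output() {
    awk '$1 == "added" { print $2 }'
}

# Sample line standing in for real output (placeholder hash, not a real CID).
sample_output="added QmExampleHashForIllustrationOnly hello.txt"
file_hash=$(echo "$sample_output" | hash_from_add_output)

# The command you would then run against the cluster:
echo "ipfs-cluster-ctl status $file_hash"
```

With a real cluster, the sample line would be replaced by piping ipfs-cluster-ctl add hello.txt into the helper.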

5. Going further

Now I'll show you a more specific use case for our cluster: replicating a folder across all our nodes.

By adding the -r option, you can synchronize a folder and all its contents.

ipfs-cluster-ctl add -r folder/

As you can see, IPFS stores both the hash of the folder itself, and the hash of each file.

The output of the command that pins a folder recursively on a cluster

But what happens if the content of the folder changes? How do you replicate these changes?

To replicate changes to a folder, you must add it again. The hash of the folder will change, and so will the hashes of the modified files. If a file has not changed, its hash stays the same.

Here is the procedure to follow:

Unpin the current version
Pin the new version
Delete the files that no longer exist, along with old versions of modified files

Here is the Bash version:

#!/bin/bash

# unpin everything
if [[ -n $(ipfs-cluster-ctl pin ls) ]]; then
    ipfs-cluster-ctl pin ls | cut -d' ' -f1 | xargs -n1 ipfs-cluster-ctl pin rm
fi

# pin the new version of the folder
ipfs-cluster-ctl add -r folder/

# run the garbage collector to remove the files that are no longer pinned
# (deleted files, or old versions of modified files)
ipfs-cluster-ctl ipfs gc

Thanks for reading!

Illustration from ipfs.io, licensed under CC-BY 3.0