*Update 29/12/2013* – I’ve written a new post on this topic with updated instructions here. I’m leaving this page as is for posterity, and I recommend you use the new post since it has (I think) better handling of routing.

I run two servers in two different locations which are connected via a permanent OpenSSH VPN, complete with routing between the two. The servers both sit on independent internet connections and I’ve implemented it so that if one side or the other drops, the connection will be automatically re-established. This means I can have some devices which route through the secure connection and access the internet through the remote end point, which can be quite useful in some situations.

Last week, one of the servers suffered severe disk corruption when the power failed (no, these boxes aren’t important enough to run on a UPS), and I had to rebuild it and reconfigure the permanent VPN from scratch. This was a bit of a pain the first time, and just as much the second, so I have decided to document it for posterity and (if nothing else) future reference.

We will call the two servers Machine A and Machine B. Machine A is sitting on the internet connection through which I want to route some of the connections from the network Machine B sits on.

The basic principle is that one side of the link will be configured to make an SSH connection to the other, causing a tunnel to be established, and routing will then be configured to use the tunnel. The system which makes the initial SSH connection will monitor it, and if it drops, re-establish it, including on system start. For the purposes of this post, we’ll use the default SSH port of 22, although I recommend you change this. The commands below should work for a different port if you substitute the port number where appropriate.

Before we get started, this is the network configuration we’ll be using:

Machine A
Local network: 192.168.0.0/24
Local eth0 IP address: 192.168.0.100
Netmask: 255.255.255.0
Gateway: 192.168.0.1
Local tun1 endpoint: 192.168.0.101
Public (globally routable) external IP address: 1.2.3.4

Machine B
Local network: 192.168.1.0/24
Local eth0 IP address: 192.168.1.200
Netmask: 255.255.255.0
Gateway: 192.168.1.1
Local tun1 endpoint: 192.168.1.201

 

Part 1 – SSH with no passwords

We need to have SSH reconnect automatically without prompting for user input, so we need to configure SSH to use a key pair for authentication rather than a password. We’re going to do this for the ‘root’ account because when setting up the tunnel we’re going to be required to bring network interfaces up and down, and adjust routing tables. The more experienced among us could probably make this work without having to use root, but that is outside the scope of this post.

In this configuration, Machine A will be the SSH server, with Machine B being the client. Thus Machine B will establish the connection, monitor it, and if it drops, re-connect. So we need to configure the following.

Machine A

1. Configure a root password if it does not already have one.

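2. Check the following lines are in your /etc/ssh/sshd_config file, and are uncommented:

PermitRootLogin yes
PubkeyAuthentication yes
RSAAuthentication yes
PermitTunnel yes
ClientAliveInterval 30
ClientAliveCountMax 2

PermitTunnel yes is required for the tunnel devices that ssh -w creates in Part 2. RSAAuthentication only applies to SSH protocol 1 and newer OpenSSH releases have dropped it, so you can leave that line out there. I suggest adding the ClientAlive settings since they might help recovery in the event of an unexpected connection drop. The settings here say that if a client has not responded to a ‘ping’ within 1 minute (2 missed checks at 30-second intervals), the connection will be dropped and cleaned up.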
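To make the timeout arithmetic concrete, here is a rough sketch that reads the two ClientAlive values from a config file and multiplies them. The function name client_alive_timeout and the throwaway demo file are made up for illustration; the sketch assumes both settings are present and uncommented, and it takes the first occurrence of each.

```shell
#!/bin/bash
# Hypothetical helper, not part of OpenSSH: print ClientAliveInterval multiplied
# by ClientAliveCountMax for the sshd_config-style file given as $1.
# Commented-out lines are ignored because their first field starts with '#'.
client_alive_timeout() {
  awk '
    tolower($1) == "clientaliveinterval" && !have_i { interval = $2; have_i = 1 }
    tolower($1) == "clientalivecountmax" && !have_c { count = $2; have_c = 1 }
    END { print interval * count }
  ' "$1"
}

# Demo against a throwaway file holding the values from step 2
demo_conf=$(mktemp)
printf 'ClientAliveInterval 30\nClientAliveCountMax 2\n' > "$demo_conf"
echo "Unresponsive clients are dropped after about $(client_alive_timeout "$demo_conf") seconds"
rm -f "$demo_conf"
```

On a real server you would point the function at /etc/ssh/sshd_config instead of the demo file.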

3. If you made changes to the sshd_config file, restart sshd using

sudo service ssh restart

4. Confirm you can login as root from a remote machine using the password.

Machine B

5. As root, create new keys to be used for authentication. More details here https://help.ubuntu.com/community/SSH/OpenSSH/Keys

mkdir ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t rsa

For our purposes the defaults will suffice, so just hit enter when asked for the file, and passphrase. Do NOT specify a passphrase.

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
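If you would rather not answer the prompts, the same kind of key can be generated non-interactively. This is a sketch rather than the exact steps above: -N "" sets an empty passphrase, -f names the output file and -q suppresses the banner. KEY_DIR and the id_rsa_demo file name are placeholders so the demo cannot overwrite a real key; for the tunnel you would point -f at /root/.ssh/id_rsa.

```shell
#!/bin/bash
# Generate a passphrase-less RSA key pair without any prompts.
# KEY_DIR and the file name are placeholders used for this demo only.
KEY_DIR=$(mktemp -d)
KEY_FILE="$KEY_DIR/id_rsa_demo"

if command -v ssh-keygen >/dev/null 2>&1; then
  ssh-keygen -q -t rsa -N "" -f "$KEY_FILE"
  ls "$KEY_DIR"          # lists id_rsa_demo and id_rsa_demo.pub
else
  echo "ssh-keygen not found; install the OpenSSH client first"
fi
```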

6. Copy the public key to Machine A using the following command:

ssh-copy-id "root@machineA -p 22"

The quotes " " are important if you need to specify a port, and you will be prompted for the root password for Machine A which you configured in step 1. Newer versions of ssh-copy-id also accept the port as a separate option (ssh-copy-id -p 22 root@machineA).

7. Test this worked by attempting to connect:

ssh root@machineA

If it all worked, you should be logged in without being prompted for the root password. If it didn’t, see the page linked in step 5 for troubleshooting tips.
Do not continue until you have this working.

Machine A

8. Disable being able to SSH into Machine A as root using the password (i.e. it will only work if the system you are connecting from has the key generated in step 5 installed). Edit the /etc/ssh/sshd_config file and change the PermitRootLogin yes line to:

PermitRootLogin without-password

Just to be clear, this does NOT allow you to login as root with no password (confusing huh?). It means that you cannot use a password to ssh in as root; you must use another authentication method, such as the key we just installed.

9. If possible, from a system other than Machine B, try to connect to Machine A using ssh and log in as root. You should receive ‘Access denied’ even if you use the right password.

login as: root
root@machineA’s password:
Access denied

 

Part 2 – Configuring the Tunnel

In this section we will configure the interfaces used to define the tunnel, and create a script that will configure the appropriate routing so that you can ping the remote endpoint using the internal IP address. We’ll also do some required steps to make the routing and packet forwarding work.

Machine A

10. Enable IPv4 Forwarding. While you can do this dynamically from the command line using ‘echo 1 > /proc/sys/net/ipv4/ip_forward’, this doesn’t persist through a reboot so I suggest updating /etc/sysctl.conf which then persists the option. Add, or uncomment the following line:

net.ipv4.ip_forward = 1

To pick up this change without restarting, run the command:

sysctl -p /etc/sysctl.conf
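A quick way to confirm the setting took effect is to read the kernel flag back. The sketch below wraps that in a small function; forwarding_enabled is a name invented here, and the optional argument exists only so the check can be pointed at a different file for testing.

```shell
#!/bin/bash
# Return success if the given flag file contains 1.
# Defaults to the kernel's IPv4 forwarding switch.
forwarding_enabled() {
  local flag_file=${1:-/proc/sys/net/ipv4/ip_forward}
  [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
  echo "IPv4 forwarding is enabled"
else
  echo "IPv4 forwarding is NOT enabled"
fi
```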

11. Configure IPTables to allow forwarding from the Tunnel to the local network. This will let devices on the local network of Machine B access and go out to the internet via the local network of Machine A. Add the following lines to /etc/rc.local so they get set when the system starts:

/sbin/iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
/sbin/iptables -A FORWARD -i eth0 -o tun1 -m state --state RELATED,ESTABLISHED -j ACCEPT
/sbin/iptables -A FORWARD -i tun1 -o eth0 -j ACCEPT

Thinking about it, the first line might not be necessary – it depends on how the router for the local network will handle requests from the 192.168.1.0 network. I’ll have to test it.

12. Configure the tunnel interface in the /etc/network/interfaces file. Machine B will be actually making the connection, so the Machine A configuration is quite simple. Add the following lines to /etc/network/interfaces

iface tun1 inet static
         pre-up sleep 5
         address 192.168.0.101
         pointopoint 192.168.1.201
         netmask 255.255.255.0
         up arp -sD 192.168.1.201 eth0 pub
         up ip route add 192.168.1.0/24 via 192.168.0.101

This configures the tunnel interface with the correct local IP address as per our table at the top, pointing at the remote tunnel endpoint. The ‘arp’ command allows packets destined for 192.168.1.201 to be routed back to Machine A, so it can forward them to Machine B. I also added a route to the 192.168.1.0/24 network so that Machine A can reach other machines on the other side of the tunnel. I had some trouble with getting ping responses back during testing and adding this seemed sensible.

The sleep option is to allow time for the SSH connection to be established.

Machine B

13. Repeat step 10 for Machine B

14. Configure the tunnel interface in the /etc/network/interfaces file. Since this machine will actually be making the connection, it launches the SSH command. Add the following lines to /etc/network/interfaces:

iface tun1 inet static
        pre-up ssh -f -w 1:1 1.2.3.4 -p22 'ifdown tun1; ifup tun1'
        pre-up sleep 5
        address 192.168.1.201
        pointopoint 192.168.0.101
        netmask 255.255.255.0

So before the tunnel device is brought up, it runs the ssh command with -f (run in background) and -w (create tunnel) to create the tunnel connection. Once connected, it restarts the remote tunnel device for good measure.
You’ll notice there are no routes configured when the device comes up – this is because I handle that in a separate script. I had issues getting everything to work right when trying to configure the routes when the interface goes up and down, and since we need a script to monitor and reconnect if the connection drops, I decided to handle the routing updates in that script also.

15. Create a script which will check if the tunnel is up, and if not, start it, and configure the routes. I created this in /opt/scripts so the full path for my script is /opt/scripts/tunnel-check and the content is as follows:

#!/bin/bash
# Set PATH explicitly so ifup, ifdown and ip are found when run from cron
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/X11R6/bin

# Ping the remote server's local IP to see if we can see it through the tunnel
IP_ADDRESS=192.168.0.100
ping -c1 "$IP_ADDRESS" >/dev/null 2>&1
RESULT=$?

# If ping returned 0 it worked, so if it didn't, reconnect
if [ "$RESULT" -ne 0 ]
then
  echo "SSH Tunnel not running - restarting"
  # Bring the tunnel interface down so we know it is gone at this end
  ifdown tun1
  # Check for an SSH process used by the tunnel, and kill it if it's still lurking
  PID=$(ps -eo pid,args | grep -v grep | grep "ssh -f -w 1:1 1.2.3.4" | awk '{print $1}')
  echo $PID
  if [ -n "$PID" ]
  then
    echo "Killing SSH Tunnel process $PID"
    kill -9 $PID
    sleep 5s
  fi
  ifdown tun1
  # At this point the tunnel should be down and the SSH session dead.

  # Bring the interface back up and re-establish the routes.
  # Ensure the direct route to the remote endpoint is in place
  ip route add 1.2.3.4/32 via 192.168.1.1
  ifup tun1
  ip route add 192.168.0.1/32 dev tun1
  ip route add 192.168.0.0/24 via 192.168.0.101
  ip route add default via 192.168.0.1
fi

The script tries to ping the internal IP address of Machine A, which will go through the tunnel. If there is no response it means the tunnel must be down, so we need to clean up and connect again. As part of that, we check for leftover SSH processes and kill them. If we don’t do that, it can cause problems trying to reconnect. Once that clean up is complete, we bring the tunnel device up on this end, which will trigger the SSH connection using the info we defined in the /etc/network/interfaces file.
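Because the check sends a single ping, one lost packet is enough to tear down a tunnel that is actually fine. One possible refinement, which is not part of the script as written, is to retry the probe a few times before declaring the tunnel dead. The helper name probe_with_retries is invented for this sketch, and the stand-in commands true and false are used so the demo does not touch the network.

```shell
#!/bin/bash
# probe_with_retries ATTEMPTS CMD [ARGS...]
# Hypothetical helper: run CMD up to ATTEMPTS times and succeed as soon as
# one run succeeds; fail only if every attempt fails.
probe_with_retries() {
  local attempts=$1 i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    i=$((i + 1))
  done
  return 1
}

# In tunnel-check the probe would be the ping, for example:
#   probe_with_retries 3 ping -c1 -W2 192.168.0.100
# Here stand-in commands show the two outcomes.
probe_with_retries 3 true  && echo "probe succeeded: leave the tunnel alone"
probe_with_retries 3 false || echo "probe failed 3 times: reconnect"
```

The trade-off is that a genuinely dead tunnel takes a few seconds longer to detect, which matters little when cron only runs the check once a minute.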

Important: You must define the route to the external (globally routable) IP address of Machine A to go out through the local router, and not through the tunnel. This is to ensure the packets used to create the tunnel go out via the local network router, since they obviously can’t use the tunnel (which isn’t there yet). This is most important for situations where you have created a default route (as the example above does) to push all packets through the tunnel, and if the tunnel drops, you need to make sure it can be recreated.
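To check that this host route is doing its job, you can ask the kernel which interface it would use to reach Machine A's public address. The sketch below parses the output of ip route get; the helper name route_dev is made up, and the sample line is illustrative rather than captured from a real system.

```shell
#!/bin/bash
# Hypothetical helper: read `ip route get` output on stdin and print the
# interface name that follows the word "dev".
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# On Machine B you would run:  ip route get 1.2.3.4 | route_dev
# The line below is an illustrative sample of that output.
sample="1.2.3.4 via 192.168.1.1 dev eth0 src 192.168.1.200"
dev=$(echo "$sample" | route_dev)

if [ "$dev" = "tun1" ]; then
  echo "WARNING: traffic to Machine A would go through the tunnel itself"
else
  echo "OK: traffic to Machine A leaves via $dev"
fi
```

If the real command reports tun1, the host route is missing and the tunnel will not be able to re-establish itself once it drops.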

16. You can now test the tunnel by running /opt/scripts/tunnel-check from the command line (as root). You’ll see some output from the script above, and it should return you to the command prompt. If all has gone to plan, you can now ping 192.168.0.1 from Machine B, and 192.168.1.1 from Machine A.

Part 3 – Automating Tunnel Connection and Reconnection

This is the easy bit.

17. Now you have everything working above, you can just add the script to your crontab on Machine B. As root, run crontab -e and add the following line:

*/1  *  *   *   *   /opt/scripts/tunnel-check

So every minute the script will run, attempt to ping the remote side, and if it’s no longer there will try to reconnect. You should now be able to test that this happens automatically by restarting Machine A or Machine B. When the machine comes back up, the connection should be re-established with no user action required.
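Cron starts a new copy of the script every minute whether or not the previous one has finished, so a run that hangs (for example, ssh waiting on an unreachable host) could overlap with the next. A minimal sketch of a guard follows; it is not something the setup above uses, and LOCK_DIR, acquire_lock and release_lock are names invented here.

```shell
#!/bin/bash
# mkdir fails if the directory already exists, so only one caller can
# hold the lock at a time. The functions read LOCK_DIR when they are called.
acquire_lock() { mkdir "$LOCK_DIR" 2>/dev/null; }
release_lock() { rmdir "$LOCK_DIR" 2>/dev/null; }

# Intended use at the top of /opt/scripts/tunnel-check (placeholder path):
#   LOCK_DIR=/tmp/tunnel-check.lock
#   acquire_lock || exit 0        # a previous run is still busy
#   trap release_lock EXIT        # clean up when this run ends

# Demo with a throwaway lock path
LOCK_DIR=$(mktemp -u)
acquire_lock && echo "first caller got the lock"
acquire_lock || echo "second caller was refused"
release_lock
```

If the script is killed hard, the lock directory is left behind and has to be removed by hand before the check will run again.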

18. (Optional) I configured the router on the Machine B network to route requests for the 192.168.0.0/24 network to the local network address of Machine B, 192.168.1.200. This was so that other machines on the Machine B network can access the Machine A network when specifically required, but otherwise continue to use the local network and connection to the internet. So for example, a Machine C on the same network as Machine B could specify a proxy server in a web browser configuration which actually sits on the Machine A network. Requests from that browser on the Machine B network would then exit to the internet via the Machine A network, which can be useful, say, for ensuring consistent geo-location identification.

 

Part 4 – Troubleshooting

This is mainly a few thoughts on issues I have encountered while doing this.

channel 0: open failed: administratively prohibited: open failed

This was the most irritating issue I hit when trying to bring up the tunnel. I have no idea what this actually means, and neither, it seems, does most of the rest of the internet. You can get it when you haven’t added the PermitTunnel yes line to your sshd_config file on Machine A, but I know for sure you can also get it when you have. I have come to the conclusion that it’s related to SSH sessions not being cleaned up correctly. If you see this, try killing any relevant SSH processes on both Machines A and B. The ps -ef | grep ssh command is your friend here. On Machine B it’s much easier to identify the process to kill, since the command line is displayed in the ps output. On Machine A it’s harder, since you don’t get a command line, and if you have multiple inbound connections using root there isn’t much to tell them apart. For example, this is all I get:
root      2204  2148  0 10:38 ?        00:00:00 sshd: root

All I can advise here is to use your best guess and maybe look at the time the process was started and see if that matches the time the session on Machine B was started, remembering that there might be differences in time between the systems.
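To make the start times easier to compare, ps can print the full start timestamp of each process via its lstart column. The sketch below filters that output down to root's sshd sessions; the function name root_sshd_sessions is invented here, and the sample lines are made-up data for illustration only.

```shell
#!/bin/bash
# Hypothetical filter: keep only sshd session processes belonging to root
# from `ps -eo pid,lstart,args` output supplied on stdin.
root_sshd_sessions() {
  grep -E 'sshd: root([@ ]|$)'
}

# On Machine A you would run:  ps -eo pid,lstart,args | root_sshd_sessions
# The lines below are made-up sample data.
printf '%s\n' \
  ' 2148 Mon Jan  6 09:12:44 2014 /usr/sbin/sshd -D' \
  ' 2204 Mon Jan  6 10:38:02 2014 sshd: root' \
  ' 2310 Mon Jan  6 10:41:17 2014 sshd: alice@pts/0' \
  | root_sshd_sessions
```

Only the line for PID 2204 survives the filter, and its timestamp is what you would compare against the time the session was started on Machine B.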

Anyway, once you have cleaned up the SSH sessions, make sure the tunnel devices are down on both systems using ifdown tun1 and then try again.

 

Using tun0 didn’t work.

The guide uses tun1 throughout because, for some reason, I couldn’t get the tunnel to work when using the tun0 devices. This could have been because I hadn’t cleaned up some SSH session as per the above, but in the end I kept getting the old channel 0: open failed: administratively prohibited: open failed error. I eventually gave up and used tun1, which seemed to work. So this could be down to me, and if you hit the same problem it might be worth trying a different tunnel device.

 

Routing Issues

Ironing out the routing issues was a major headache since I’m not a network specialist and know just enough to be dangerous. The instructions above have been changed so they don’t reflect my real IP addresses, and I hope I got all the changes right. It was largely a case of common sense and testing. I’m sure someone more familiar with the intricacies of network routing could improve or clarify my suggestions, but they seem to work so far.

 

Don’t forget the Forwarding

If you forget the IPv4 forwarding, or the NAT rules on machine A, you will hit issues. Double check it.

 

Useful Pages

Here are some of the pages I used to get this working. Thanks to those who made them available.

https://help.ubuntu.com/community/SSH/OpenSSH/Keys
http://www.debian-administration.org/articles/152

https://help.ubuntu.com/community/SSH_VPN/
http://www.revsys.com/writings/quicktips/nat.html
http://www.ducea.com/2006/08/01/how-to-enable-ip-forwarding-in-linux/