Edit Feb 2014 – after running with the new routing I discovered that after the connection dropped, the routing didn’t work properly. A quick fix is to add the local routing info to the tunnel-check script after bringing the interface up. I haven’t investigated further yet, but I have updated the instructions to reflect this.
A while back I set up a permanent SSH tunnel between two systems on different sites to allow me to route specific internet traffic through a different external internet connection so that the traffic appears to originate from the remote site. This is useful when the source IP is used for authentication or authorisation in some way, such as for a geo-fenced application. I wrote a previous blog post on how to do that, which is here, and there were a couple of issues with managing the routing that I was never happy with. Yesterday I had cause to configure the tunnel again (using my previous post as a guide of course) and this time I was determined to fix the routing.
The first issue was that all internet traffic which originated from the routing system itself went out via the tunnel. For example, any OS updates would be retrieved via the tunnel. This is ‘OK’ but not the behaviour I wanted. In my case, the routing system itself doesn’t need to use the tunnel, it’s just providing a routing service to various systems on its local network. It makes sense to use the local connection as it is faster and clearly incurs less overhead since no SSH tunnel is involved.
The second problem was that one of my scripts used a hard coded IP address in a route for the remote system (to the external tunnel endpoint), which is fine if that address is static, and not so good if it’s dynamic. In my case, while the external tunnel endpoint IP address has been fixed for some time, technically it is dynamic and could change. I use a dynamic DNS provider so that I can always reach the endpoint via a known hostname. Therefore in theory this was easily fixable – I could extract the current IP for the remote endpoint using the dynamically updated hostname, and use that in the route – I just hadn’t done it yet.
Since I was setting up a new tunnel anyway, it made sense to revisit the routing and see if I could find a solution to the above issues. This updated post uses what I believe is a more elegant routing solution which seems to resolve both the points above, and doesn’t require me to find the IP of the tunnel endpoint to get the routing right. Read on for how.
The basic principle is that one side of the link will be configured to make an SSH connection to the other, causing a tunnel to be established. Once the tunnel is established, routing will be configured to allow the required traffic to use the tunnel. The system which makes the initial SSH connection will monitor it, and if it drops, re-establish it, including on system start. For the purposes of this post, we’ll use the default SSH port of 22 although I recommend you change this. The commands below should work for a different port if you substitute the port number where appropriate.
We will call the two servers Machine A and Machine B. Machine A is sitting on the internet connection through which I want to route some of the connections from the network Machine B sits on. This is the network configuration we’ll be using:
|Machine A – Remote Site|
|Local eth0 IP Address||192.168.0.100|
|Local tun1 endpoint||192.168.0.101|
|Public Globally Routable (External) IP Address||126.96.36.199|
|Machine B – Local Site|
|Local eth0 IP Address||192.168.1.200|
|Local tun1 endpoint||192.168.1.201|
Part 1 – SSH with no passwords
The local system (B) needs to be able to connect automatically to the remote system (A) without prompting for user input, so the first step is to configure SSH to use public key authentication between the two, rather than a password. It’s easiest to do this for the ‘root’ account because setting up the tunnel requires bringing network interfaces up and down, and adjusting routing tables. It would be better to use a specific account with appropriate privileges to do this, but that is outside the scope of this post.
In this configuration, Machine A will be the SSH server, with Machine B being the client. Thus Machine B will establish the connection, monitor it, and if it drops, re-connect.
1. Configure a root password on Machine A if it does not already have one.
2. Check the following lines are in your /etc/ssh/sshd_config file, and are uncommented:
The ClientAlive settings might help recovery in the event of an unexpected connection drop. The settings above cause a ‘ping’ to be sent to an idle client every 30 seconds, and if the client does not respond to 2 pings, the connection will be dropped and cleaned up.
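As a quick sanity check, the sketch below looks for uncommented option lines in a config file. The values PermitRootLogin yes and PermitTunnel yes are the ones mentioned elsewhere in this post; ClientAliveInterval 30 and ClientAliveCountMax 2 are assumptions inferred from the description above (a ping every 30 seconds, dropped after 2 missed pings). The function names are made up for illustration.

```shell
#!/bin/sh
# has_sshd_option FILE KEY VALUE
# Succeeds if FILE contains an uncommented "KEY VALUE" line (case-insensitive).
has_sshd_option() {
    grep -Eiq "^[[:space:]]*$2[[:space:]]+$3[[:space:]]*$" "$1"
}

# check_sshd_config FILE
# Reports each option this guide relies on; returns non-zero if any is missing.
check_sshd_config() {
    file=$1
    status=0
    for pair in "PermitRootLogin yes" "PermitTunnel yes" \
                "ClientAliveInterval 30" "ClientAliveCountMax 2"; do
        key=${pair% *}      # text before the space
        value=${pair#* }    # text after the space
        if has_sshd_option "$file" "$key" "$value"; then
            echo "ok: $pair"
        else
            echo "MISSING: $pair"
            status=1
        fi
    done
    return $status
}

# Usage (as root on Machine A):
#   check_sshd_config /etc/ssh/sshd_config
```

A commented-out line such as #PermitTunnel yes is reported as missing, which is the case step 2 is warning about.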
3. If you made changes to the sshd_config file, restart sshd using
|sudo service ssh restart
4. Confirm login as root works from a remote machine using the password.
5. As root on Machine B, create new keys to be used for authentication. More details here https://help.ubuntu.com/community/SSH/OpenSSH/Keys
chmod 700 ~/.ssh
ssh-keygen -t rsa
For this exercise, the defaults suffice. Just hit enter when asked for the file, and passphrase. Do NOT specify a passphrase.
|Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
6. Copy the Client Key to Machine A using the following command:
|ssh-copy-id "root@machineA -p 22"|
The quotes " " are important if you need to specify a port, and there will be a prompt for the root password for Machine A which was configured in step 1.
7. Test this worked by attempting to connect to the remote system:
If everything worked as expected, log in should complete without being prompted for the root password. If it didn’t, see the page linked above for troubleshooting tips.
Do not continue until you have this working.
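One way to make this check unambiguous is to tell ssh not to prompt at all, so a broken key setup fails rather than quietly falling back to the password. This is a minimal sketch: the function name is hypothetical, machineA is the placeholder host name from step 6, and the 10 second timeout is an arbitrary choice.

```shell
#!/bin/sh
# can_login_without_prompt HOST [PORT]
# Succeeds only if root can log in to HOST without any interactive prompt.
can_login_without_prompt() {
    host=$1
    port=${2:-22}
    # BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
    ssh -o BatchMode=yes -o ConnectTimeout=10 -p "$port" "root@$host" true
}

# Usage (as root on Machine B):
#   can_login_without_prompt machineA 22 && echo "key login OK"
```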
8. Disable password-based SSH login as root on Machine A (i.e. root login will only work if the system you are connecting from presents a valid key via SSH). Edit the /etc/ssh/sshd_config file and change the PermitRootLogin yes line to:
|PermitRootLogin without-password|
Just to be clear, this does NOT allow login as root with no password (confusing huh?). It means that you cannot use a password to ssh in as root, you must use another form of authentication, such as the key that was previously installed.
9. To test this, try to connect to Machine A using ssh and logging in as root, from a different machine (not Machine B). The ‘Access denied’ response should be returned even when using the correct password.
|login as: root
Part 2 – Configuring the Tunnel
Once Part One is complete, the foundation is in place for automating the SSH tunnel creation. The next step is to configure the tunnel interface, set up the routing, and create a script that will monitor the tunnel and bring it back up if it drops.
10. On Machine A, enable IPv4 forwarding. Update /etc/sysctl.conf by adding or uncommenting the following line:
|net.ipv4.ip_forward = 1
To pick up this change without restarting, run the command:
|sysctl -p /etc/sysctl.conf|
Note: This can be done dynamically from the command line using ‘echo 1 > /proc/sys/net/ipv4/ip_forward’, however this will not persist after a restart.
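To confirm the setting actually took effect, the kernel exposes the current value in /proc/sys/net/ipv4/ip_forward (the same file the note above writes to). A small sketch, with a hypothetical function name and an optional file argument that exists only to make it easy to try out:

```shell
#!/bin/sh
# forwarding_enabled [FILE]
# Succeeds if the kernel reports IPv4 forwarding as enabled (value 1).
forwarding_enabled() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

# Usage:
#   if forwarding_enabled; then echo "IPv4 forwarding is on"; else echo "IPv4 forwarding is OFF"; fi
```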
11. Configure IPTables to allow forwarding from the Tunnel to the local network. This will let devices on the local network of Machine B access and go out to the internet via the local network of Machine A. Add the following lines to /etc/rc.local so they get set when the system starts:
|/sbin/iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
/sbin/iptables -A FORWARD -i eth0 -o tun1 -m state --state RELATED,ESTABLISHED -j ACCEPT
/sbin/iptables -A FORWARD -i tun1 -o eth0 -j ACCEPT
The first line might not be necessary – it depends on how the gateway on the remote network will handle requests from the 192.168.1.0 network.
12. Configure the tunnel interface in the /etc/network/interfaces file. Machine B will actually be making the connection, so the Machine A configuration is quite simple. Add the following lines to /etc/network/interfaces
|iface tun1 inet static
pre-up sleep 5
up arp -sD 192.168.1.201 eth0 pub
up ip route add 192.168.1.0/24 via 192.168.0.101
This configures the tunnel interface with the correct local IP address as per our table at the top, pointing at the remote tunnel endpoint. The ‘arp’ command allows packets destined for 192.168.1.201 to be routed back to Machine A, so it can forward them to Machine B. There is also a route to the 192.168.1.0/24 network created so that Machine A can reach other machines on the local network of Machine B.
The sleep option is to allow time for the SSH connection to be established.
13. On Machine B, enable forwarding in the same way as for Machine A (see step 10).
14. Create a local routing table to be used by the local system when the tunnel comes up, edit the /etc/iproute2/rt_tables file and add a new table with a unique number, for example:
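Each entry in rt_tables is a number and a name on one line. The sketch below appends such an entry only if it is not already there. The table name route2local is the one used in the rest of this post; the number 200 in the usage line is an arbitrary assumption (any number not already listed in the file should do), and the function name is hypothetical.

```shell
#!/bin/sh
# add_rt_table NUMBER NAME [FILE]
# Appends "NUMBER<TAB>NAME" to the rt_tables file unless an uncommented
# entry already uses that number or that name.
add_rt_table() {
    number=$1
    name=$2
    file=${3:-/etc/iproute2/rt_tables}
    if awk -v n="$number" -v t="$name" \
        '$1 !~ /^#/ && ($1 == n || $2 == t) { found = 1 } END { exit !found }' "$file"; then
        echo "table $name (or number $number) already present in $file"
        return 0
    fi
    printf '%s\t%s\n' "$number" "$name" >> "$file"
}

# Usage (as root on Machine B); 200 is an arbitrary unused number:
#   add_rt_table 200 route2local
```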
15. Edit /etc/rc.local to add some local routes to the new table on system start. For example, add the following:
# Create the local routing policy – this uses the already defined route2local table
# Set the routes for the local network and gateway
ip route add 192.168.1.0/24 dev eth0 src 192.168.1.200 table route2local
ip route add default via 192.168.1.1 table route2local
# This system should always route its own packets through the local network.
ip rule add from 192.168.1.200 table route2local
ip rule add from 192.168.1.201 table route2local
# Flush the route cache to immediately apply the change
ip route flush cache
The lines above add standard routes to the route2local table, then use ip rule add to trigger use of that routing table when a packet is from the listed IP addresses.
16. Configure the tunnel interface in the /etc/network/interfaces file. Since this machine will actually be making the connection, it launches the SSH command. Add the following lines to /etc/network/interfaces
|iface tun1 inet static
pre-up ssh -f -o compression=no -w 1:1 188.8.131.52 -p22 'ifdown tun1; ifup tun1'
pre-up sleep 5
post-up ip route add 192.168.0.0/24 dev tun1 src 192.168.1.201 table route2local
post-up ip rule add from 192.168.1.200 table route2local
post-up ip rule add from 192.168.1.201 table route2local
post-up ip route add 192.168.0.0/24 dev tun1 src 192.168.1.201
post-up ip route del default dev eth0
post-up ip route add default via 192.168.0.1
down ip route del default dev tun1
down ip route add default via 192.168.1.1
This does the following:
Before the tunnel device is brought up, it runs the ssh command with -f (run in background) and -w (create tunnel) to create the tunnel connection. The example also explicitly disables compression. Once connected, it restarts the remote tunnel device to ensure everything is clean.
After the tunnel interface is up, a route to the tunnelled network is added to the ‘local’ routing table, and that ‘local’ routing table is applied to the two local IP addresses used by this system. This means that packets originating from this system and destined for the internet will continue to be routed via the local network, and not the tunnel.
A route is then created in the main routing table, for all other traffic, and the default gateway is changed so that all (non-local) traffic is routed across the tunnel by default.
Note: In theory the ‘down’ commands should be executed when the tunnel is brought down, but this didn’t seem to work on the test system. The default route over the tunnel is deleted, since the tun1 device is no longer up, but the new default route is not added. The script which monitors the connection state will correct the default route if the tunnel does drop.
17. Create a script which will check if the tunnel is up, and if it is not, will start it. The script should be placed somewhere sensible. This example stores it in /opt/scripts so the full path is /opt/scripts/tunnel-check and the content is as follows:
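A quick way to see which path the kernel would choose for a given source address is ip route get, which prints the selected route including the outgoing device. The helper below just pulls the device name out of that output. It is a sketch: the function name is hypothetical, 8.8.8.8 is an arbitrary internet destination, and 192.168.1.50 stands in for some other machine on the Machine B network.

```shell
#!/bin/sh
# route_dev
# Reads `ip route get` output on stdin and prints the outgoing device name.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage (on Machine B, with the tunnel up):
#   Traffic originating from Machine B itself:
#     ip route get 8.8.8.8 from 192.168.1.200 | route_dev
#   Traffic forwarded on behalf of another local machine:
#     ip route get 8.8.8.8 from 192.168.1.50 iif eth0 | route_dev
```

If the policy rules are doing what the description above says, the first command should report eth0 and the second should report tun1.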
#!/bin/bash
# Tunnel IP of the remote server (Machine A)
IP_ADDRESS=192.168.0.101
# Ping the remote servers tunnel IP to see if we can see it
ping -c1 $IP_ADDRESS >/dev/null 2>&1
RESULT=$?
# If ping returned 0 it worked, if it didn't, reconnect
if [ "$RESULT" -ne 0 ]
then
  echo "SSH Tunnel not running – restarting"
  # Bring the tunnel interface down so we know it is gone at this end
  ifdown tun1
  # Check for an SSH process used by the tunnel, and kill it if it's still lurking.
  # The pattern matches the ssh command started from /etc/network/interfaces.
  PID=`ps -eo pid,args | grep -v grep | grep "ssh -f .*-w 1:1 " | cut -c1-6`
  if [ -n "$PID" ]
  then
    echo "Killing SSH Tunnel process $PID"
    kill -9 $PID
  fi
  # At this point the tunnel should be down and the SSH session dead.
  # Call ifdown again to make sure everything is cleaned up
  ifdown tun1
  # Ensure there is a sensible default route, and try to bring the tunnel up
  ip route del default dev tun1
  ip route add default via 192.168.1.1
  ifup tun1
  ip route add 192.168.1.0/24 dev eth0 src 192.168.1.200 table route2local
  ip route add default via 192.168.1.1 table route2local
  # This system should always route its own packets through the local network.
  ip rule add from 192.168.1.200 table route2local
  ip rule add from 192.168.1.201 table route2local
  # Flush the route cache to immediately apply the change
  ip route flush cache
  # Try a ping to see if it worked
  ping -c1 $IP_ADDRESS
fi
The script tries to ping the internal IP address of Machine A, which will go through the tunnel. If there is no response it means the tunnel must be down, so the connection is cleaned up and attempted again. As part of cleaning up, any left over SSH processes connecting to the remote endpoint are killed. If that is not done, it can cause problems trying to reconnect. Once clean up is complete, the tunnel device is brought up and this triggers the SSH connection using the information defined in the /etc/network/interfaces file.
Important: You must ensure that when the tunnel is down, there is a sensible default route, or that there is an explicit route to the external (globally routable) IP address of Machine A, via the local router and not through the tunnel. This is to ensure the packets used to create the tunnel, when it needs to be recreated after an unexpected drop, are routed via the local network router since they obviously can’t use the tunnel if it is down. This is most important for situations like the above, where a default route is used to push all non-local packets through the tunnel, and the tunnel drops.
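For the explicit-route option, the sketch below resolves the endpoint's hostname and pins a host route to it via the local router, so the SSH connection itself can never be sent into the tunnel. This is an alternative to relying on the default route, not what the script above does. The function name and the hostname machinea.example.org are hypothetical; 192.168.1.1 is the local gateway used elsewhere in this post.

```shell
#!/bin/sh
# pin_endpoint_route HOSTNAME GATEWAY
# Resolves HOSTNAME to its first IPv4 address and adds (or updates) a
# host route to that address via GATEWAY.
pin_endpoint_route() {
    host=$1
    gateway=$2
    addr=$(getent ahostsv4 "$host" | awk 'NR == 1 { print $1 }')
    if [ -z "$addr" ]; then
        echo "could not resolve $host" >&2
        return 1
    fi
    # "replace" adds the route, or overwrites an existing one for the same address
    ip route replace "$addr/32" via "$gateway"
}

# Usage (as root on Machine B, before bringing the tunnel up):
#   pin_endpoint_route machinea.example.org 192.168.1.1
```

Because the address is looked up each time, this also copes with a dynamic endpoint IP, at the cost of depending on DNS being reachable when the tunnel is down.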
18. You can now test the tunnel by running /opt/scripts/tunnel-check from the command line (as root). You’ll see some output from the script above, and it should return you to the command prompt. If all has gone to plan, you can now ping 192.168.0.1 from Machine B, and 192.168.1.1 from Machine A.
Part 3 – Automating Tunnel Connection and Reconnection
This is the easy bit.
19. Now you have everything working above, you can just add the script to your crontab on Machine B. As root, run crontab -e and add the following line:
|*/1 * * * * /opt/scripts/tunnel-check|
So every minute the script will run, attempt to ping the remote side, and if it’s no longer there will try to reconnect. You should now be able to test that this happens automatically by restarting Machine A or Machine B. When the machine comes back up, the connection should be re-established with no user action required.
20. (Optional) I configured the router on the Machine B network to route requests for the 192.168.0.0/24 network to the local network address of Machine B, 192.168.1.200. This was so that other machines on the Machine B network can access the Machine A network when specifically required, but otherwise continue to use the local network and connection to the internet. So for example, a Machine C on the same network as Machine B could specify a proxy server on the Machine A network in a web browser. This would then make requests from that browser on the Machine B network exit to the internet via the proxy server on the Machine A network.
Part 4 – Troubleshooting
This is mainly a few thoughts on issues I have encountered while doing this.
channel 0: open failed: administratively prohibited: open failed
This was the most irritating issue I hit when trying to bring up the tunnel. What this actually means I have no idea, neither it seems does most of the rest of the internet. It seems you can get it when you haven’t added the PermitTunnel yes line to your sshd_config file on Machine A, but I know for sure you can also get it when you have done that. I have come to the conclusion that it’s related to SSH sessions not being cleaned up correctly. If you see this, try to kill any appropriate SSH processes on both Machines A and B. The ps -ef | grep ssh command is your friend here, and on Machine B it’s much easier to identify the process to kill since you have the command line displayed on the ps output. However, on Machine A, it’s harder since you don’t get a command line and if you have multiple inbound connections using root, then there isn’t much to tell them apart. This is all I get for example:
root 2204 2148 0 10:38 ? 00:00:00 sshd: root
All I can advise here is to use your best guess and maybe look at the time the process was started and see if that matches the time the session on Machine B was started, remembering that there might be differences in time between the systems.
Anyway, once you have cleaned up the SSH sessions, make sure the tunnel devices are down on both systems using ifdown tun1 and then try again.
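On Machine B, where the full command line is visible, picking out the tunnel's ssh client can be scripted rather than done by eye. The sketch below filters ps output for ssh client processes started with the -w option. The function name is hypothetical, and it assumes -w is passed as a separate argument, as it is in the interfaces file in step 16.

```shell
#!/bin/sh
# tunnel_pids
# Reads `ps -eo pid,args` output on stdin and prints the PID of every ssh
# client process (not sshd) that was started with the -w tunnel option.
tunnel_pids() {
    awk '$2 ~ /(^|\/)ssh$/ {
        for (i = 3; i <= NF; i++) if ($i == "-w") { print $1; break }
    }'
}

# Usage (on Machine B):
#   ps -eo pid,args | tunnel_pids
```

This does not help on Machine A, where the sshd processes show no command line to match against.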
Using tun0 didn’t work.
As I said in the guide, for some reason I couldn’t get the tunnel to work when using the tun0 devices. This could have been because I hadn’t cleaned up some SSH session as per the above, but in the end I kept getting the old channel 0: open failed: administratively prohibited: open failed error. I eventually gave up and used tun1 which seemed to work. So this could be down to me, and if you hit the same problem it might be worth trying a different tunnel device.
Ironing out the routing issues was a major headache since I’m not a network specialist and know just enough to be dangerous. The instructions above have been changed so they don’t reflect my real IP addresses, and I hope I got all the changes right. It was largely a case of common sense and testing. I’m sure someone more familiar with the intricacies of network routing could improve or clarify my suggestions, but they seem to work so far.
Don’t forget the Forwarding
If you forget the IPv4 forwarding, or the NAT rules on machine A, you will hit issues. Double check it.
Here are some of the pages I used to get this working, thanks to those who made them available.