Wednesday, May 1, 2013

Python to the Rescue: Who's Downloading Too Much on My Home Network




Introduction

I want to share my passion for solving problems simply with Python by walking you through the steps I took to find out who is downloading excessively on my home network, which I share with friends. I thought it would be better to buy a high-bandwidth link and split the cost with my friends than to settle for a low-bandwidth link or pay too much on my own for a fast one.

The problem with sharing: excessive downloading

I share my link with several friends in my building, and each of them uses a switch to extend the network to the people in his or her flat. Sometimes as many as 15 people are on the network at the same time. The large number of people isn’t a problem because the link is fast. Problems start to appear only when several guys start running BitTorrent and downloading MASSIVE files. The question is: how can I determine who’s downloading too much and slowing down the connection without having to call up my friends or visit their houses?
After thinking and researching for a while, I managed to cook up a Python script that produced the following output:



Notice how my Python script can tell me how many hosts are active and sharing the connection with me at any time. Notice also that it can tell me how much each friend has downloaded and uploaded since I first ran the script with the Set Up Iptables Traffic rules chain automatically option (option 3). The router keeps counting traffic from the moment the rules are added until you delete the rules, reset the iptables counters, or restart the router.

The Journey

Poor router firmware doesn’t support traffic usage per computer

My router, a cheap low-end D-Link model, has very limited capabilities. It works fine and provides me with excellent Internet connectivity, but it can’t tell me how much each user downloads and uploads or who is clogging the network. It seems that the manufacturer didn’t anticipate such a scenario and expected the router to be used only in a single home or flat, which is quite fair!

Ruling out open source firmware due to my router not being supported

There are several open source projects out there that produce firmware for popular routers. The most famous are DD-WRT and Tomato, both of which seem to offer quite a lot of extensions and added capabilities compared to my current home gateway router. Unfortunately, the router I have is not supported by either DD-WRT or Tomato, so I can’t go down the open source firmware route. That said, it would be very interesting to see my home router run firmware that gives it features found in routers 10 times its price.

Does my router support telnet?

I switched on my Ubuntu virtual machine and ran a quick scan on my router.
nmap -F 192.168.1.1
and this is what I found :

Should I use SSH?

It turns out that my router runs both a Telnet service and an SSH service. Almost nobody uses Telnet these days because of the inherent security problems that arise when login information is sent over a link without encryption. I thought for a while: am I ever going to telnet to my home router from work or from another flat? No. Is anybody waiting to attack my router? Probably not. So Telnet seemed safe enough here, and it is easier to set up: you just log in with a user name and password, start sending commands, and get back the results. I might upgrade my script to use SSH in the future to make it more secure and in line with security best practices.
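The question the nmap scan answered, whether anything is listening on the Telnet and SSH ports, can also be approximated from Python with a plain TCP connection attempt. This is only a sketch: port_is_open is a hypothetical helper that is not part of the script below, and ports 23 (Telnet) and 22 (SSH) are the standard defaults, which a router could be configured to change.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat all of these as "not open".
        return False
    conn.close()
    return True


# Example use against the router address from this post (assumed to be reachable):
#   for name, port in (("telnet", 23), ("ssh", 22)):
#       print(name, port_is_open("192.168.1.1", port))
```

A successful connection only shows that something accepts connections on that port; it does not confirm which service is behind it, which is where nmap's service detection is more informative.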

What is Telnet

Telnet is a network protocol (a set of rules for communicating over a network) and a program that provides access to a command-line interface on a remote host. Most network equipment with a TCP/IP stack supports a Telnet service for remote configuration. A server runs a Telnet service and waits for clients to connect. Clients issue commands and the server executes them.
What this means in simple English
Telnet is a service, i.e. a program that a computer runs that sits around waiting for a user to connect and start entering commands. Each command is simply a word or group of words that follows certain rules. The command is sent to the computer, which executes it, and the results are sent back to the user. Network equipment such as home gateway routers usually has a Telnet service running to allow users to configure the equipment without being directly in front of it. You’ll need a user name and password in order to log in to the router and execute commands via Telnet. Chances are your home gateway router has a Telnet service running.

How this will help us know the traffic usage

If I can Telnet to the router and start issuing commands I can tell the router to keep track of traffic sent to / from each computer on my LAN. I can then issue other commands to know how much bandwidth each user is consuming. But first let us explore the router and see how it works internally…

 

My Home Router Uses Linux Internally

After doing a bit of research online, I discovered that my home router runs Linux and uses BusyBox, which is designed for embedded systems with very limited resources. A lot of Linux commands are available, such as cp, cat, chmod, echo, and rm. The router also has its own commands, such as xdslctl and wlmngr.
Proof that my router internally runs linux

It turns out that my router has the iptables command available. iptables is a command used to manipulate the tables used by the Linux kernel firewall. Each table consists of rules that are grouped together in chains.
I thought, why not add a user-defined chain that counts the packets and bytes that each user uploads and downloads? With a couple of iptables commands on my router, I can get an idea of how much each user is taxing the network.
What is an ARP Table
When computers on an Ethernet network talk to each other, they exchange Ethernet frames that carry a source MAC address and a destination MAC address. However, computers that run TCP/IP address each other by IP address, not MAC address, so a table has to be built that associates each IP address with a MAC address. Ethernet provides the technology that runs within your local network, while IP is what lets you communicate with hosts outside it. In a sense, IP uses Ethernet to move around. Think of it this way: IP needs Ethernet as a basis to get its job done over your network.
What matters to us is that the router, like every other device on the network, keeps an Address Resolution Protocol (ARP) table that gives us an idea of which hosts are actively communicating. The router’s ARP table holds the IP and MAC addresses of each active host on the network.
After playing around with the command line, I figured out that the following command displays the router’s ARP table:
>lanhosts show all

As you can see, the table shows MAC addresses (first column) associated with IP addresses.
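To make the lookup concrete, here is a small sketch of turning ARP output into a MAC-to-IP dictionary. It assumes the six-column layout that the RouterUtils listing further down notes for the router's "arp show" command (IP address, HW type, Flags, HW address, Mask, Device). The sample text is made up for illustration and was not captured from a real router, and parse_arp_table is a hypothetical name.

```python
def parse_arp_table(output):
    """Map MAC address -> IP address from six-column ARP output."""
    mac2ip = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly six fields and begin with a dotted IPv4 address;
        # the header row splits into more than six words and is skipped.
        if len(fields) == 6 and fields[0].count(".") == 3:
            mac2ip[fields[3]] = fields[0]
    return mac2ip


# Made-up sample in the assumed layout.
SAMPLE_ARP = (
    "IP address       HW type     Flags       HW address            Mask     Device\r\n"
    "192.168.1.2      0x1         0x2         00:11:22:33:44:55     *        br0\r\n"
    "192.168.1.3      0x1         0x2         66:77:88:99:aa:bb     *        br0\r\n"
)

hosts = parse_arp_table(SAMPLE_ARP)
```

With this sample, hosts maps each of the two MAC addresses to its IP address, and len(hosts) gives the number of active devices.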

How my script works

So I designed a script that telnets to the router, issues iptables commands, gets back the results, splits each line into words, and extracts the information it needs.
I created a function in my script called SetUpTrafficRules, which fills a user-defined chain called TRAFFIC with rules that match packets destined for each known host on my network, as learned from the router’s ARP table. (In the version listed below, creating the chain and adding the jump from FORWARD are commented out, because I preferred to run those two commands by hand.)
iptables -N TRAFFIC #create user defined chain traffic
I created a user-defined chain.
iptables -I FORWARD -j TRAFFIC
I created a “jump” from the FORWARD chain to my TRAFFIC chain.
iptables -I TRAFFIC -d 192.168.1.2
iptables -I TRAFFIC -d 192.168.1.3
iptables -I TRAFFIC -d 192.168.1.4
iptables -I TRAFFIC -d 192.168.1.6
iptables -I TRAFFIC -d 192.168.1.7
iptables -I TRAFFIC -d 192.168.1.10
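Typing one rule per host gets tedious, so the commands can be generated from a list of IPs. The sketch below follows the pattern used by SetUpTrafficRules in the listing further down: one rule for traffic arriving from the WAN interface for a host (download) and one for traffic leaving through it (upload). The function name build_traffic_rules is hypothetical, and ppp0 is the WAN interface on my router; yours may differ. Because the rules have no -j target, they only count matching packets and do not change what happens to them.

```python
def build_traffic_rules(ips, wan_if="ppp0"):
    """Return the iptables commands that add two counting rules per host."""
    commands = []
    for ip in ips:
        # Download: packets entering from the WAN side, destined for this host.
        commands.append("iptables -A TRAFFIC -i %s -d %s" % (wan_if, ip))
        # Upload: packets from this host leaving through the WAN side.
        commands.append("iptables -A TRAFFIC -s %s -o %s" % (ip, wan_if))
    return commands


rules = build_traffic_rules(["192.168.1.2", "192.168.1.3"])
```

Each string in rules can then be sent to the router over the Telnet session, one per line.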

Now all I need is for my Python script to telnet to my router and execute
iptables -L TRAFFIC -v -x
Now I can display the total amount of traffic that each of these hosts has downloaded or uploaded. Lines that have the word “anywhere” in the source column and an IP in the destination column show how much that IP has downloaded.
The command is simple: it lists (-L), in verbose form (-v), the number of packets and bytes that match each rule I’ve specified, and -x prints the exact counts instead of rounded ones. Each rule simply says: match all traffic that comes from anywhere and is destined for IP x or IP y, taken from the router’s ARP table.
The router keeps counting packets and bytes until you zero the chain or the router restarts.
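Reading the counters back is a matter of splitting each output line into fields. The sketch below uses the sample output that appears as a comment in the RouterUtils listing further down; it assumes the target column is empty (as it is for these counting rules), so every data row has exactly eight fields. parse_traffic_counters is a hypothetical name, not a function from my module.

```python
def parse_traffic_counters(output):
    """Return {ip: {"down": bytes, "up": bytes}} from `iptables -L TRAFFIC -v -x` output."""
    usage = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows: pkts, bytes, prot, opt, in, out, source, destination.
        if len(fields) != 8 or not fields[1].isdigit():
            continue
        byte_count = int(fields[1])
        source, destination = fields[6], fields[7]
        if source == "anywhere":
            # Traffic from anywhere to a host counts as that host's download.
            usage.setdefault(destination, {"down": 0, "up": 0})["down"] += byte_count
        elif destination == "anywhere":
            # Traffic from a host to anywhere counts as that host's upload.
            usage.setdefault(source, {"down": 0, "up": 0})["up"] += byte_count
    return usage


SAMPLE_OUTPUT = "\r\n".join([
    "Chain TRAFFIC (1 references)",
    "    pkts      bytes target     prot opt in     out     source               destination",
    "      10      819            all  --  ppp0   any     anywhere             192.168.1.2",
    "       0        0            all  --  any    any     192.168.1.2          anywhere",
])

usage = parse_traffic_counters(SAMPLE_OUTPUT)
```

For the sample, usage records 819 bytes downloaded and 0 bytes uploaded for 192.168.1.2.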

Code Walkthrough

The program I’ve written consists of a Python module that contains the functionality of the program such as

·         ExecuteCommands : Logs in to the router and executes a list of commands
·         GetRouterARPTable : Returns a dictionary of MACs mapped to IPs from the router’s “arp show” command
·         SetUpTrafficRules : Connects to the router and issues iptables commands that fill the TRAFFIC chain with rules for each active host found by GetRouterARPTable.
·         PrintTrafficUsage : Heart of the program, uses other functions to get its work done.
·         RemoveTrafficRules : Flushes the TRAFFIC user-defined iptables chain
·         GetTrafficData : Returns a list of IP, u or d (upload, download), and traffic amount
·         GetIp2HostnameD : Returns a dictionary of IPs mapped to host names from the router’s “lanhosts show all” command

The Driver


from RouterUtils import *


def show_help():
    print "---------WELCOME TO TRAFFIC FLOW ESTIMATOR------------"
    print "------------------------------------------------------"
    print "[1] How many people are connected to the router now ? "
    print "[2] Traffic Usage Estimate "
    print "[3] Set Up Iptables Traffic rules chain automatically"
    print "[4] Flush Iptables ,in TRAFFIC chain now"
   
choice = ""
while choice != "exit":
    choice = raw_input(" : ")
    if choice == "help":
        show_help()
    elif choice == "2"  :
        PrintTrafficUsage()
    elif choice == "3" :
        SetUpTrafficRules()
    elif choice == "4" :
        RemoveTrafficRules()
    elif choice == "1":
        ip2hostname = {}
        mac2ip = GetRouterARPTable()
        ip2hostname = GetIp2HostnameD()
        print "Router says there are ", len(mac2ip), " active computers/devices"
       
        print "%15s" % "Name",
        print "%15s" % "IP",
        print "%15s" % "MAC"
        for mac in mac2ip.keys():
            ip = mac2ip[mac]
            #fall back to a placeholder when the router reports no host name for this IP
            print "%15s" % ip2hostname.get(ip, "{NONE}"),
            print "%15s" % ip,
            print "%15s" % mac

RouterUtils: The meat of the program


import telnetlib

HOST = "192.168.1.1"
USER = "admin"
PASSWORD = "admin"

"""
Telnets to the specified *host* and logs in with the provided user name and password.
It then issues the command "arp show" and gets back the output.
It then goes through each line and splits it into fields.
If 6 fields are found, it takes the first field, i.e. the IP, and the 4th field, i.e. the HW address/MAC.
Finally, it returns a dictionary of MACs and their associated IPs.

for k in mac2ip.keys():
    print k, mac2ip[k]

Note: This has been tested with a DLink 2730U Router, output might be slightly different depending on the router
firmware. Or it might be consistent, I can't guarantee this.

"""
def GetRouterARPTable(host=HOST, user=USER,password=PASSWORD):
    tn = telnetlib.Telnet(host)
   
    tn.read_until("Login: ")
    tn.write(user + "\n")
    tn.read_until("Password: ")
    tn.write(password + "\n")
   
    tn.write("arp show\n")
    tn.write("exit\n")
   
    output =  tn.read_all()
    #print "output : ", output
   
    mac2ip = {}
   
   
    for line in output.split("\r\n"): #the telnet session delivers lines terminated with \r\n
        fields = line.split()
       
        if len(fields) == 6 and fields[0] != "Bye":    #IP address       HW type     Flags       HW address            Mask     Device
            #print fields[0], "is at ", fields[3]
            mac2ip[fields[3]] = fields[0] #[3] mac, [0] IP
        #else:
            #print "NO MAC FOUND on this line : " , line
       
    tn.close()
   
    return mac2ip
#################################################################################
def GetTrafficData():
    orders = []
    orders.append("iptables -L TRAFFIC -v -x") #ExecuteCommands appends the newline
    output = ExecuteCommands(orders)
    #print output
    updown_info = []
    r = output
    if not r: #None (router unreachable) or empty output
        print "No Traffic data available. Router could be down or cable disconnected"
        return
   
      
    for line in r.split("\r\n"):
        upload , download = 0, 0 # reset for new line
        fields = line.split()
        if len(fields) == 8 and fields[0] != "pkts":    #pkts      bytes target     prot opt in     out     source               destination   
            if fields[6] == "anywhere": #source is anywhere therefore i.e. download rule
                download = fields[1]
                ip = fields[7]
                updown_info.append([ ip, 'd',  download])
            elif fields[7] == "anywhere": #destination is anywhere, therefore upload line
                upload = fields[1]
                ip = fields[6]
                updown_info.append([ ip, 'u', upload])
               
    return updown_info

   
#Chain TRAFFIC (1 references)
#    [0]       [1]     [SKIP]    [2] [3] [4]   [5]        [[[6]]]           [ [[7]]]
#    pkts      bytes target     prot opt in     out     source               destination        
#      10      819            all  --  ppp0   any     anywhere             192.168.1.2        
#       0        0            all  --  any    any     192.168.1.2          anywhere      
#N.B. TARGET COLUMN IS ALWAYS EMPTY IN THIS VERSION OF TRAFFIC RULES


#####################################################################################
def ExecuteCommands(order, host=HOST, user=USER,password=PASSWORD):
    try:
        tn = telnetlib.Telnet(host)

        tn.read_until("Login: ")
        tn.write(user + "\n")
        tn.read_until("Password: ")
        tn.write(password + "\n")

        for cmd in order:
            tn.write(cmd + "\n")
            tn.read_until(">",2) #wait up to 2 seconds for the router prompt
            #print "Router Command Executed : " + cmd

        tn.write("exit" + "\n")
        output = tn.read_all()
        tn.close()
        return output

    except Exception: #e.g. the connection was refused or dropped
        print "Unexpected Error : check that router is powered on and lan cable is plugged in"
        return None #callers must handle a missing result
       
  
############################################################
def SetUpTrafficRules():
    print "-----------------DETECTING NEIGHBOURS----------------------"
    print "Finding out hosts that are active in the routers ARP Table:"
    active_hosts = GetRouterARPTable() #GEt list of ips and mac addresses
    for k in active_hosts.keys():
        print "Host Detected : ", k, " == ", active_hosts[k]
  
 
    nhosts = len(active_hosts)
   
    print "Router says" , nhosts, " hosts are present"
    print "-----------------------------------------------------"
  
    #Too dangerous to setup programmatically
    #order1 = []
    #order1.append("iptables -N TRAFFIC")
    #order1.append("iptables -I FORWARD -j TRAFFIC")
    #ExecuteCommands(order1)
   
    order2 = []
    print "---SETTING UP USER DEFINED CHAIN : [[TRAFFIC]] with rules for each host----"
    for k in active_hosts.keys():#NB:Dictionary Keys are MACs, dictionary values are IPs
        ip = active_hosts[k]
        cmd1 = "iptables -A TRAFFIC -i ppp0 -d " + ip
        cmd2 = "iptables -A TRAFFIC -s " + ip + " -o ppp0"
        order2.append(cmd1)
        order2.append(cmd2)
        ExecuteCommands(order2)
        order2 = [] #zero commands between executions
       
   
    #order3 = []
    #cmd1 = "iptables -L TRAFFIC -v \n"
    #order3.append(cmd1)
    #print "Querying the routers TRAFFIC iptables chain returned : "
    #print ExecuteCommands(order3)
    #print "###############################################################"
       
     
#REMOVE RULES MANUALLY, NEVER HERE, OR A BUG MIGHT DESTROY YOUR INTERNET ACCESS
def RemoveTrafficRules():
    orders = []
    #I WILL NEVER MESS WITH THE FORWARD TABLE OR I MIGHT F*** THE ROUTER AND WON'T BE ABLE TO LOG IN
    orders.append("iptables -F TRAFFIC")
    #orders.append("iptables -X TRAFFIC")
    ExecuteCommands(orders)
   
   # print "[x]Reference to user defined TRAFFIC chain in the FORWARD chain has been deleted"
    print "[x]Rules in the TRAFFIC UD chain have been flushed."
   # print "[x]The UD chain TRAFFIC has been removed."
   
def GetIp2HostnameD():
    order = []
    ip2hostname = {}
    order.append("lanhosts show all")
    output = ExecuteCommands(order)
    if not output: #router unreachable, nothing to parse
        return ip2hostname
    for line in output.split("\r\n"):
        fields = line.split()
        if len(fields) == 4:
            ip2hostname[fields[1]] = fields[3]
        elif len(fields) == 3: #someone has an empty name
            ip2hostname[fields[1]] = "{NONE}"
       
    return ip2hostname

def PrintTrafficUsage():
    data = GetTrafficData() #[ip, 'u/d', n]
    total_down = 0
    total_up = 0
    if data is None:
        print 'NO DATA AVAILABLE, THIS COULD BE CAUSED BY A DISCONNECTED CABLE'
        return
   
    for l in data:
        if l[1] == 'd':
            total_down += int(l[2])
        elif l[1] == 'u':
            total_up += int(l[2])
           
    ip2name = GetIp2HostnameD()
    #IP, 'd', n
    
    try:
        for l in data:
            if l[1] == 'd' and l[2] != "0":
                print "%30s" % ip2name[l[0]],
                print "%4d" % round((float(l[2])/total_down)*100,4),
                print "%",
                print "%4s" % "DOWN",
                print "%8d" % round(float(l[2]),4)
            elif l[1] == 'u' and l[2] != "0":
                print "%30s" % ip2name[l[0]],
                print "%4d" % round((float(l[2])/total_up)*100,4),
                print "%",
                print "%4s"  % "UP",
                print "%8d" % round(float(l[2]),4)
               
    except KeyError:
        print "Someone has a Hostname set to space, this has tripped up the parsing"
       
############################################################
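The percentage calculation inside PrintTrafficUsage can also be pulled out into a small pure function, which makes it easier to check without a router attached. This is a sketch rather than part of the module above: download_shares is a hypothetical name, and the byte counts in the example are invented.

```python
def download_shares(bytes_by_host):
    """Return (host, bytes, percent of total) tuples, heaviest user first."""
    total = sum(bytes_by_host.values())
    if total == 0:
        # Nothing has been counted yet, so there are no shares to report.
        return []
    ranked = sorted(bytes_by_host.items(), key=lambda item: item[1], reverse=True)
    return [(host, count, 100.0 * count / total) for host, count in ranked]


# Invented byte counts, keyed by host name.
shares = download_shares({"laptop": 250, "desktop": 750})
for host, count, percent in shares:
    print("%30s %5.1f %% DOWN %8d" % (host, percent, count))
```

Sorting by byte count puts the heaviest downloader on the first line, which is the answer I was after in the first place.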