Category Archives: System Administration

Ansible & Cisco – Automating configuration management.

Configuring network equipment has always been somewhat of a tedious affair. Copying and pasting a configuration file through the console port doesn’t scale (or you need a lot of interns!) and other solutions like Cisco Prime are slightly overkill if you only want to change a few lines of configuration.

This is where Ansible comes to the rescue. Originally built as a Linux configuration management tool in the vein of Chef/Puppet/Salt, it’s built around an agent-less push model over SSH. Using the SSH connection directly, it remotely executes the commands you define in “Playbooks”. This is why it’s a great fit for network devices, as those are still telnet/SSH based. API/Netconf are becoming more and more common, but SSH is still present, especially on older network equipment.

Recently, Ansible has been augmented with a series of modules that allow a network operator to leverage Ansible to deploy configuration to remote equipment. For example, if you just realized that your template is mysteriously missing NTP, Syslog and SNMP configuration and there are about 40 pieces of equipment deployed – I’m not saying this totally happened to us – Ansible is here to help.

The playbook does the following :

  • Define a role : cisco-ios-common – which holds all the default configuration that is used by our devices.
  • In our case, the SSH connection is initiated from the sysadmin computer running the playbook – but it can be adapted to run from a bastion host.
  • No need to gather facts.
  • Create the variable holding the credentials necessary for accessing the equipment.

The actual file with the tasks is fairly simple.

Each task lists literal configuration lines that are compared against the output of the “show running-config” command. If a line is not there, it’s added or changed. In this case, we configure the RO SNMP community and the NTP server. To be extra careful, we also take a backup of the running-config before applying the changes. Using the “provider” statement, we are referencing the variables defined in the playbook file. All other private variables can be stored in an Ansible vault file and encrypted.
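The comparison idea can be sketched in a few lines of Python. This is only an illustration of the “is the line already in the running-config?” check, not how the Ansible module is implemented. The function name `missing_lines`, the hostname, the community string and the NTP address are all made up for the example.

```python
def missing_lines(running_config, desired_lines):
    """Return the desired lines that are absent from the running config."""
    present = {line.strip() for line in running_config.splitlines()}
    return [line for line in desired_lines if line.strip() not in present]


# Hypothetical sample values, for illustration only.
running_config = """\
hostname sw-demo-01
ntp server 192.0.2.10
"""
desired = [
    "ntp server 192.0.2.10",
    "snmp-server community EXAMPLE-RO RO",
]

# Only the lines that are not already configured would need to be pushed.
to_push = missing_lines(running_config, desired)
print(to_push)
```

Running the same check a second time after the lines have been applied returns an empty list, which is why the playbook can be replayed safely against equipment that is already correct.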

With the appropriate inventory file I was able to quickly fix our deployment mistake without manually connecting to every single switch.

Monitoring @ Lan ETS

Monitoring at an event of our scale is a critical part of our design process. The different monitoring services are our eyes and ears and allow us to quickly understand the state of our network. Since our event is over a very short period of time, our goal is to have a smooth experience for the player, from the moment he plugs his computer into our network. There is no worse feeling than looking at a row of players getting off their seats and being unsure if it’s a network issue!

Here is a brief overview of the different services we use to monitor our network.


LibreNMS

An open source fork of Observium, it’s designed as a plug and play monitoring platform. You simply point it to an SNMP-aware device and it will do its best to associate common OIDs and MIBs to “automagically” monitor the target. It then polls the OIDs every 5 minutes and updates the corresponding RRD files. It is also possible to export the data into a more modern time-series database named InfluxDB. This allows us to access the data in a more flexible way using other services that are not compatible with the RRD format.

That said, it’s not a perfect platform. Since it’s designed around auto discovering devices, it can only detect whatever OIDs the developers configured for each platform. This means that if you are running very recent gear or equipment from obscure manufacturers, you might be missing a lot of information from the auto discovery process. There is currently no easy way to add extra OIDs to your installation.

Another issue is the polling interval, as the tool was initially built around a 5-minute window. This is acceptable in most environments but for our event, we need something that gives us almost real-time data on the state of the network. This is why we used Shinken to provide the actual Up/Down status. We rely on LibreNMS for graphs and performance data.

Shinken

Shinken is built as a Nagios drop-in replacement. We use it with a very short polling interval of 10 seconds with ICMP checks. We monitor only our switches, routers and servers. With around 100 or so pieces of equipment, it does tax our server resources, but it gives us near real-time awareness of the state of our network. Within 10 seconds, we can know if a player kicked the power cord from a switch and if any other equipment is unreachable.

Nagvis

Nagvis was initially built as a mapping tool for Nagios, but it supports a wide variety of back-ends to fetch data. We use it in conjunction with the “Livestatus” module for Shinken. We upload a map of the venue with all of our equipment displayed. This allows us to quickly know if everything is up and running and, at the same time, react if anything turns red.

Oxidized

Oxidized is a modern replacement for Rancid. It’s a network equipment configuration backup tool. It logs into a piece of equipment using credentials you provided and runs any commands you want. In the case of Cisco based gear, that means an “enable” followed by a “show run”. It then stores the results in a git based structure and provides a web interface to compare different backups. It allows us to back up our configurations throughout our design process. We usually don’t run it during our event as we do not change (hopefully…) any configurations during the weekend.

Grafana

We have just recently started using Grafana as a visualization tool for our performance data. It uses an InfluxDB data-source to display data in a much more friendly and dynamic way than the RRD format. It’s an excellent tool to display interface stats and traffic.

Splunk

Splunk is our log aggregation tool. We use it as a central collection point for network equipment and server logs. We use it to create dashboards to highlight important events such as port-security events or unauthorized login attempts against our network. By default, most equipment logging can be very verbose and it’s important to properly filter events to pinpoint what is important.

Raspberry Pi 2 – NTP server | Stratum 1

Well, it’s done! I have a small RP2 acting as a Stratum 1 NTP Server using a GPS module with the GPIO input.

It’s been added to the Canadian pool.ntp.org cluster of NTP servers and has begun answering queries. The process was relatively simple with a few things to keep in mind :

  • The default repositories do not hold a recent version of NTPD. You are better off downloading and compiling the source archive yourself.
  • Here is the GPS module for the Raspberry Pi A+/B+/2 : Link
  • If you have an older Raspberry Pi : Link
  • You might need an antenna to lock on to the GPS signal. This is especially true if you are indoors. Unless a GPS repeater is present, it might be impossible to get a lock anywhere other than the top floor of a building or a room with windows.
  • I made the mistake of ordering the old style of GPS module for the RP2. It does fit, but it’s a bit awkward and I need to get a GPIO riser so that the antenna isn’t hitting the USB/Ethernet connectors. For now, the GPS module is just not fully connected to the GPIO board. There might be 3 centimeters of the connectors exposed. Everything seems to be working fine but doesn’t look very clean.

Here is a small taste of what kind of devices seem to be hitting my server :

tcpdump_ntp

 

stjhnbsu1kw-047055188012.dhcp-dynamic.FibreOp.nb.bellaliant.net
hlfxns016cw-156057136226.dhcp-dynamic.FibreOp.ns.bellaliant.net
216-211-57-239.dynamic.tbaytel.net
hlfxns016cw-156057150054.dhcp-dynamic.FibreOp.ns.bellaliant.net
hlfxns0187w-047055119103.dhcp-dynamic.FibreOp.ns.bellaliant.net
hlfxns0187w-142177064089.pppoe-dynamic.High-Speed.ns.bellaliant.net
stjhnbsu1kw-047054246090.dhcp-dynamic.FibreOP.nb.bellaliant.net
hlfxns0188w-099192087052.pppoe-dynamic.High-Speed.ns.bellaliant.net
dsl.198.58.171.47.ebox.ca
stjhnbsu0ww-142134156121.dhcp-dynamic.FibreOP.nb.bellaliant.net
192-0-170-198.cpe.teksavvy.com
216-211-115-4.dynamic.tbaytel.net
hlfxns0187w-047055097081.dhcp-dynamic.FibreOp.ns.bellaliant.net
fctnnbsc38w-207179184024.dhcp-dynamic.FibreOp.nb.bellaliant.net
hlfxns0169w-142068218193.pppoe-dynamic.High-Speed.ns.bellaliant.net
stjhnbsu1kw-047055177028.dhcp-dynamic.FibreOp.nb.bellaliant.net
24.114.221.2
S010674d02b6711ee.ca.shawcable.net
216-211-71-83.dynamic.tbaytel.net
hlfxns0187w-047055122041.dhcp-dynamic.FibreOp.ns.bellaliant.net
stjhnbsu1kw-099192014064.dhcp-dynamic.FibreOp.nb.bellaliant.net
stjhnbsu0nw-156034190005.dhcp-dynamic.FibreOp.nb.bellaliant.net
stjhnbsu1kw-047055179184.dhcp-dynamic.FibreOp.nb.bellaliant.net
HLFXNS016CW-142134092016.dhcp-dynamic.FibreOp.ns.bellaliant.net
stjhnbsu0ww-142134158198.dhcp-dynamic.FibreOP.nb.bellaliant.net
184.66.68.82
hlfxns0163w-142068001198.dhcp-dynamic.FibreOp.ns.bellaliant.net
stjhnbsu0ww-142134152181.dhcp-dynamic.FibreOP.nb.bellaliant.net
hlfxns016cw-156034028230.dhcp-dynamic.FibreOP.ns.bellaliant.net
dsl.198.58.150.186.ebox.ca
stjhnbsu0ww-047054187034.dhcp-dynamic.FibreOP.nb.bellaliant.net
173.239.175.242
stjhnbsu1kw-047054018211.dhcp-dynamic.FibreOP.nb.bellaliant.net
stjhnbsu1kw-047055183093.dhcp-dynamic.FibreOp.nb.bellaliant.net
stjhnbsu1kw-047054018159.dhcp-dynamic.FibreOP.nb.bellaliant.net
199.168.250.156 (ip-199.168.250.156.reverse.skycomp.ca)
216-211-95-251.dynamic.tbaytel.net
216-211-44-86.dynamic.tbaytel.net

That’s a lot of cable modem/CPE devices. I wonder why ISPs are not using internal servers. Seems like time is something you would want full control over.
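To put numbers on that impression, the client list can be tallied per provider. The sketch below is a rough one: it assumes the last two DNS labels identify the ISP (which holds for the names above but not for domains like co.uk), and the helper names `is_ip` and `provider_of` are mine. The sample only uses a handful of the entries listed above.

```python
import ipaddress
from collections import Counter


def is_ip(text):
    """Return True if the text is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def provider_of(client):
    """Return the last two DNS labels of the client's name, if it has one."""
    tokens = client.replace("(", " ").replace(")", " ").split()
    names = [token for token in tokens if not is_ip(token)]
    if not names:
        return "no reverse DNS"
    return ".".join(names[-1].lower().split(".")[-2:])


clients = [
    "stjhnbsu1kw-047055188012.dhcp-dynamic.FibreOp.nb.bellaliant.net",
    "hlfxns016cw-156057136226.dhcp-dynamic.FibreOp.ns.bellaliant.net",
    "216-211-57-239.dynamic.tbaytel.net",
    "dsl.198.58.171.47.ebox.ca",
    "192-0-170-198.cpe.teksavvy.com",
    "24.114.221.2",
    "199.168.250.156 (ip-199.168.250.156.reverse.skycomp.ca)",
]

# Print the providers, busiest first.
for provider, count in Counter(provider_of(c) for c in clients).most_common():
    print(f"{count:3d}  {provider}")
```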

 

CentOS 6.5 – Monitoring Bind9 with Bindgraph.

Bindgraph is a program based on Mailgraph and offers the same basic features. It reads the dns_queries.log file and creates a .rrd file listing the number of queries per record type (A, AAAA, CNAME, PTR and a few others). It then generates a graph, takes a screenshot and creates a nice HTML page with a .cgi script.

It provides a quick overview of your DNS traffic, though it is limited to the number and type of queries and does not show where they come from.
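The counting step is easy to picture with a short Python sketch. This is not Bindgraph’s code, only an illustration of tallying queries per record type. It assumes log lines that look like BIND 9 query logging (“query: name IN TYPE”), and the sample lines are invented for the example.

```python
import re
from collections import Counter

# Assumed line shape: "... query: <name> IN <type> ..."
QUERY_RE = re.compile(r"query: (\S+) IN (\S+)")


def count_query_types(log_lines):
    """Count queries per record type, ignoring lines that are not queries."""
    counts = Counter()
    for line in log_lines:
        match = QUERY_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Made-up sample lines, for illustration only.
sample_log = [
    "12-Oct-2014 18:05:00.101 client 192.0.2.10#53211: query: example.com IN A +",
    "12-Oct-2014 18:05:00.340 client 192.0.2.11#40022: query: example.com IN AAAA +",
    "12-Oct-2014 18:05:01.002 client 192.0.2.10#53212: query: 10.2.0.192.in-addr.arpa IN PTR +",
    "12-Oct-2014 18:05:01.517 client 192.0.2.12#61034: query: www.example.com IN A +",
]

print(count_query_types(sample_log))
```

The client address is right there in each line, so the same loop could be extended to count by source, which is exactly the part Bindgraph leaves out.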

Day View

2014-10-12 18_05_00-DNS Statistics for mail.coldnorthadmin.com

Month View

2014-10-12 18_05_50-DNS Statistics for mail.coldnorthadmin.com

 

1) Launching the daemon :

 

CentOS 6.5 – Private key authentication for ssh

SSH is secure! I don’t need anything else!

Well, you are not exactly wrong. But security is not something that should be taken lightly.

I’ve recently acquired a Linode host and I was stunned by the number of unauthorized login attempts. About 99% of those attempts were probably automated scripts crawling the Internet for anything responding to queries on port 22. I believe that a huge part of modern “hacking” is entirely automated which reinforces the perspective that security is a 24/7 concern.

Since SSH is your main entry point to control your machine, it’s especially critical that it’s well protected. Private/public key authentication allows you to log in to your machine without providing a password.

Here is a very basic overview.

You first create a key pair : Private and Public key. The private key allows you to apply your “signature” to content. The only way to verify that signature is to have your Public key. You then transfer that public key file onto your server. When you try to connect to the server using your Private key, the server will try to match the unique signature using the Public key you had previously transferred. That entire operation completely removes the need to transfer a password. It’s also much more complex to brute-force.

That said, it’s up to you to protect your private key. I recommend adding a passphrase during the key generation. Without that extra step, the private key is written to disk in clear-text, just like the public key.
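The sign-then-verify exchange described above can be illustrated with textbook RSA and tiny numbers. This is a toy sketch only: real SSH uses full-size keys, hashing and padding, and signs session data rather than a bare integer. The constants and the `sign`/`verify` helpers are picked for the example and are trivially breakable.

```python
import secrets

# Textbook RSA with tiny primes (61 and 53): illustrative only.
N = 61 * 53   # public modulus (3233)
E = 17        # public exponent; the public key is the pair (N, E)
D = 2753      # private exponent; 17 * 2753 = 1 (mod 3120)


def sign(challenge, private_exponent=D):
    """Client side: sign the server's challenge with the private key."""
    return pow(challenge, private_exponent, N)


def verify(challenge, signature, public_exponent=E):
    """Server side: check the signature using only the public key."""
    return pow(signature, public_exponent, N) == challenge


challenge = secrets.randbelow(N)     # server picks a fresh random value
signature = sign(challenge)          # client answers; no password is sent
print(verify(challenge, signature))  # True
```

The server never sees the private exponent, and a signature produced for one challenge does not verify against a different one.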


China is knocking…

Jun 16 02:12:30 li362-86 sshd[2234]: Failed password for root from 116.10.191.227 port 32848 ssh2
Jun 16 02:12:31 li362-86 sshd[2232]: Failed password for root from 116.10.191.227 port 29883 ssh2
Jun 16 02:12:32 li362-86 sshd[2234]: Failed password for root from 116.10.191.227 port 32848 ssh2
Jun 16 02:12:34 li362-86 sshd[2232]: Failed password for root from 116.10.191.227 port 29883 ssh2
Jun 16 02:12:34 li362-86 sshd[2234]: Failed password for root from 116.10.191.227 port 32848 ssh2
Jun 16 02:12:36 li362-86 sshd[2232]: Failed password for root from 116.10.191.227 port 29883 ssh2
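To get a feel for how many of these attempts come from the same place, the log can be summarized per source address. The following is a minimal sketch, assuming sshd log lines shaped like the ones above; the function name `failed_logins_by_source` is mine, and on a real machine you would feed it the lines of your own auth log.

```python
import re
from collections import Counter

# Matches "Failed password for root from 1.2.3.4 port 1234" and the
# "Failed password for invalid user bob from ..." variant.
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+"
)


def failed_logins_by_source(log_lines):
    """Count failed password attempts per (source address, user) pair."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            user, source = match.groups()
            counts[(source, user)] += 1
    return counts


sample = [
    "Jun 16 02:12:30 li362-86 sshd[2234]: Failed password for root from 116.10.191.227 port 32848 ssh2",
    "Jun 16 02:12:31 li362-86 sshd[2232]: Failed password for root from 116.10.191.227 port 29883 ssh2",
    "Jun 16 02:12:32 li362-86 sshd[2234]: Failed password for root from 116.10.191.227 port 32848 ssh2",
]

for (source, user), count in failed_logins_by_source(sample).most_common():
    print(f"{count:5d}  {source}  {user}")
```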


Prerequisites : This guide was performed on a CentOS 6.5 machine with almost no extra packages. Most, if not all, commands should be the same on a Debian based OS. That said, be aware that you will have to substitute your own folder and file paths. Any variable you need to replace with your own will be marked with the “$” sign.

Folder has to be owned by the appropriate user attempting to log in.


CentOS 6.5 – Nfsen and Nfdump
