Replacing netstat's 90s C Code With Modern Python

Replacing 90s C Linux Utilities With Python

Welcome to Infrastructure Week July 2018! New articles and tools every day this week.

Everybody knows the netstat tool, but do you know these fun facts too?

  • netstat is part of a package called net-tools which includes: arp hostname ifconfig ipmaddr iptunnel mii-tool nameif netstat plipconfig rarp route slattach statistics
    • what the heck is a plipconfig? Well, it optimizes the performance of your parallel port — I sure am glad my 2018 server with 288 cores and 8 TB RAM has plipconfig.
      • double fun fact: stackoverflow has zero questions about plipconfig — it must be a very intuitive and easy to use utility!
  • the net-tools codebase was written in the mid 90s
    • meaning: pre-C99, probably by people who still didn’t even trust or believe in C89 yet.
  • 25 years later, the entire codebase still looks like dirty old C
    • plus it’s included by default on millions of machines
  • and, surprise!, the entire project ended up mostly abandoned
  • people are still trying to correct “90s dirty C” idioms in the code to this day
  • maintenance of the package is now seemingly just ad-hoc by OS package maintainers whenever they find a problem or modern Linux incompatibility

and, as always, you can ignore all the hard work I put into this write up and just jump right to the code.

What If We Replaced 90s C netstat With Python?

We’re going to focus on one tool in the net-tools package: netstat.

It’s full of poorly formatted code you’d be (hopefully) fired for if you wrote today, but we’ll cover that towards the end so people don’t get scared or scarred up front.

netstat -nape

netstat -p is one of its most useful features: it shows you which pid and process name is listening on a port.

Example: netstat -nape |grep LISTEN

It gives us all listening IP:Port combinations along with their pid and process names (scroll to the right where the style falls off).

Unfortunately, we see some limitations:

  • look at the nginx process name: it’s nginx: master p
    • netstat has a fixed-length 20 character buffer for process names. Thanks, 90s!
    • also, since the output is so short, you don’t get full paths.
      • any process by any user in any directory could call itself “sshd” and you wouldn’t notice the difference based on the extremely truncated output netstat provides.
  • the output is wide. really wide. not very terminal friendly.
  • the output doesn’t appear to be ordered by anything useful? Not by pid, not by IP, not by port.
  • oh, and root.
    • netstat must be run as root to generate the IP:Port to pid/name mappings. That’s not cool.

Plus, the output has six columns we don’t care about!

My first attempt at making the output more useful was: netstat --numeric-hosts --listening --program --tcp --inet --inet6 |awk '{if (NR > 2) {printf "%-4s %-20s served by %-20s\n", $1, $4, $7} }' | sort -k 5,5 -n:

A little better! We are now sorted by pid and the bad columns are gone, but the process names are still truncated and netstat must still be run as root to generate them at all.

Still not good enough for our needs though.

It’s [Almost] Code Time

Replacing netstat -p requires figuring out how netstat matches IP:Ports to pids and why it requires root to show the mapping.

A quick look through the code tells us:

  • netstat reads the pid mappings from /proc/[pid]/fd/*, but each of those directories requires root permission to enter (unless you own the pid yourself).
    • why? it’s a security issue to let anybody directly access the open FDs/inodes of any random process
    • but why must those directories be consulted?
      • Linux only exposes which pids are using which inodes as a /proc/ symlink in those directories. There’s no other way to discover the mappings.
      • Those symlinks look like this (output of ls -latrh /proc/*/fd/*):
l-wx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/3 -> /dev/kmsg
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/2 -> /dev/null
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/1 -> /dev/null
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/0 -> /dev/null
lr-x------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/18 -> /run/utmp
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/39 -> socket:[14648]
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/38 -> socket:[878100]
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/37 -> anon_inode:bpf-prog
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/32 -> anon_inode:bpf-prog
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/31 -> anon_inode:bpf-prog
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/1/fd/30 -> anon_inode:bpf-map
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/408/fd/9 -> /dev/kmsg
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/408/fd/8 -> anon_inode:[eventpoll]
l-wx------ 1 root            root            64 Jul 11 18:33 /proc/408/fd/7 -> /dev/kmsg
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/408/fd/6 -> socket:[14675]
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/408/fd/5 -> socket:[14681]
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/408/fd/4 -> socket:[14679]
lrwx------ 1 root            root            64 Jul 11 18:33 /proc/408/fd/36 -> /var/log/journal/aa7c9b37491043cca93eed3d1d242ed6/user-1000.journal

Each of those is a symlink where the link target itself (for example socket:[14648]) tells you the actual inode used by the process.

What use is an inode?

Well, the inodes of each listening IP:Port socket are freely available to any user in /proc/net/{tcp,udp}{,6}. Here’s a sample of /proc/net/tcp:

Don’t you like how they stopped naming columns towards the end? What are those extra 7 fields? Don’t you worry your pretty little head about it. Just let those fields be fields and you be you.

Those IP addresses do look a bit off.

Let’s ask Erlang to convert those hex IP addresses back into individual bytes for us:

And, if you hadn’t guessed before, hex 0100007F is little endian 127.0.0.1.

If we tell Erlang about the byte orientation up front, it can fix it for us:

The port numbers are easy to decipher too (oddly, Linux provides the port numbers in network byte order while the IP addresses aren’t. thanks, Linux!):

or if you are tired of Erlang:

or if you want some Python flavor:
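As a rough Python sketch of the same conversions (the helper names decode_ipv4 and decode_port are made up for illustration, and the byte swap assumes a little-endian host such as x86):

```python
import socket
import struct


def decode_ipv4(hex_addr):
    """Turn a /proc/net/tcp address like '0100007F' into dotted-quad form.

    Assumption: little-endian host, so the hex digits appear byte-reversed.
    """
    packed = struct.pack("<I", int(hex_addr, 16))
    return socket.inet_ntoa(packed)


def decode_port(hex_port):
    """Ports are printed most-significant byte first, so plain hex works."""
    return int(hex_port, 16)


print(decode_ipv4("0100007F"), decode_port("0050"))  # prints: 127.0.0.1 80
```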

whew, so now we know:

  • We can get every IP:Port active on the system, both incoming and outgoing, from:
    • /proc/net/tcp
    • /proc/net/udp
    • /proc/net/tcp6
    • /proc/net/udp6
  • We can pick listening addresses (using the st(ate) column) and save the inode column.
  • We can read every open fd in the system by walking /proc/[pid]/fd/*
    • If a symlink points to a socket:[INODE], then
      • parse the symlink, extract the inode number, compare to our previously retrieved inode list from /proc/net/{tcp,udp}{,6}.
  • then we can read /proc/[pid]/cmdline for each pid we matched to get the full command line (instead of being limited to a 20 character buffer like 90s netstat C code).
  • Finally, we can do any remaining formatting/sorting/filtering for presentation.

We can do all those steps in Python easily, right? It’s walking some directories, matching some files against other files, then printing the output we actually want.

Let’s do this.


Now It’s Python Coding Time

The netstat source walks the directories with C APIs:

  • opendir(3) of /proc to walk the pid directories
    • so, readdir(3) for each directory entry
    • if readdir returns a pid directory, run another opendir() to walk the pid/fd directory.
      • now readdir() again to walk the fd entries
        • then call readlink(3) trying to find socket:[INODE] entries.

It’s a lot of system calls for file operations even though it’s running through procfs.
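A direct Python translation of that nested walk might look like the sketch below. It uses os.listdir() and os.readlink() in place of opendir/readdir/readlink. The function name walk_socket_links and the proc_root parameter are illustrative only; they come from neither netstat nor the final script.

```python
import os


def walk_socket_links(proc_root="/proc"):
    """Yield (pid, link_target) for every socket fd symlink we may read."""
    for entry in os.listdir(proc_root):        # like readdir() over /proc
        if not entry.isdigit():                # only pid directories
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)           # like opendir() + readdir()
        except OSError:
            continue                           # not our pid, or it exited
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue                       # fd closed while we looked
            if target.startswith("socket:["):
                yield int(entry), target
```

Run without root, this only sees the fds of processes you own, which is the same limitation netstat has.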

We could copy the netstat algorithm exactly using Python’s os.walk() API.

So, that’s what I did the first time through. I used os.walk() and it took 500ms to generate results (30x slower than old netstat, not cool).

But, this is the future and we have better APIs: if we replace 20 lines of looping os.walk() code with one line of glob.glob("/proc/*/fd/*"), our runtime drops from 500ms to 70ms.

Read Dem Files

Even though we can get a nice quick file listing with globohmyglob! we still have to call os.readlink() on every filename returned by the glob.

create map of inode->list of pids

Capturing every process’s socket inode->pid mapping becomes:
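A minimal sketch of that step follows. It is a reconstruction under assumptions, not the original script: the names socket_inode and map_inodes_to_pids are hypothetical, and the glob is narrowed to numeric pid directories so /proc/self is not counted twice.

```python
import glob
import os
import re
from collections import defaultdict

SOCKET_LINK = re.compile(r"socket:\[(\d+)\]")


def socket_inode(link_target):
    """Return the inode from a 'socket:[INODE]' link target, else None."""
    match = SOCKET_LINK.fullmatch(link_target)
    return int(match.group(1)) if match else None


def map_inodes_to_pids(fd_glob="/proc/[0-9]*/fd/*"):
    """Build {socket inode: [pid, ...]} from every readable fd symlink."""
    inode_pids = defaultdict(list)
    for path in glob.glob(fd_glob):
        try:
            target = os.readlink(path)
        except OSError:
            continue  # the fd was closed or the process exited
        inode = socket_inode(target)
        if inode is None:
            continue  # regular file, pipe, anon_inode, ...
        pid = int(path.split("/")[-3])  # .../<pid>/fd/<fd>
        if pid not in inode_pids[inode]:
            inode_pids[inode].append(pid)
    return inode_pids
```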

Note at the end how we append the pid to our map of inodes->[pid].

netstat -p doesn’t have the ability to show us every process listening on a socket. With forking servers, multiple processes can listen on the same socket (and with SO_REUSEPORT, several sockets can share one IP:Port), but you’d never realize that from reading netstat -p output.

We’re already better than netstat — we can report the truth of our system instead of having our output lie to us because unmaintained C code from 1993 can’t handle the modern world.


look up the command line for each pid

Now, with our list of pids, we can look up each command line:
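One way to sketch the lookup, assuming `inodes` is a dict keyed by listening socket inode and `inode_pids` maps each inode to a list of pids (the helper names here are hypothetical). /proc/[pid]/cmdline separates arguments with NUL bytes, so those become spaces:

```python
def parse_cmdline(raw):
    """Convert the NUL-separated bytes of /proc/[pid]/cmdline to one string."""
    return raw.rstrip(b"\0").replace(b"\0", b" ").decode("utf-8", "replace")


def read_cmdline(pid):
    """Return the full command line for pid, or '' if gone or unreadable."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return parse_cmdline(f.read())
    except OSError:
        return ""


def commands_for(inodes, inode_pids):
    """Pair every listening inode with (pid, command line) for each owner.

    Assumption: `inodes` is keyed by socket inode; `inode_pids` maps
    inode -> list of pids.
    """
    return {
        inode: [(pid, read_cmdline(pid)) for pid in inode_pids.get(inode, [])]
        for inode in inodes
    }
```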


i… i… inodes!

Where did the inodes dict come from? We didn’t populate that yet!

inodes was the result of parsing /proc/net/{tcp,udp}{,6}, which is as simple as:
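A hedged sketch of that parse is below; the function names are illustrative rather than the original script's. It assumes state 0A (TCP LISTEN) marks listening TCP sockets and state 07 marks unconnected UDP sockets, and it leaves the address as raw hex for a later decoding step.

```python
LISTEN_STATES = {"tcp": "0A", "tcp6": "0A", "udp": "07", "udp6": "07"}


def parse_proc_net(text, proto):
    """Return {inode: (proto, hex_ip, port)} for listening sockets in text."""
    inodes = {}
    for line in text.splitlines()[1:]:          # first line is the header
        fields = line.split()
        if len(fields) < 10:
            continue
        local, state, inode = fields[1], fields[3], int(fields[9])
        if state != LISTEN_STATES[proto] or inode == 0:
            continue
        hex_ip, hex_port = local.rsplit(":", 1)
        inodes[inode] = (proto, hex_ip, int(hex_port, 16))
    return inodes


def listening_inodes():
    """Merge the listeners found in /proc/net/{tcp,udp}{,6}."""
    inodes = {}
    for proto in LISTEN_STATES:
        try:
            with open(f"/proc/net/{proto}") as f:
                inodes.update(parse_proc_net(f.read(), proto))
        except OSError:
            continue                            # e.g. IPv6 disabled
    return inodes
```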


Simple enough? We also use functions ipv4() and ipv6() to parse the hex IPs from /proc/net to readable formats:
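The names ipv4() and ipv6() come from the text above, but the bodies below are only a plausible reconstruction. It assumes a little-endian host, where each 32-bit word of the address is printed byte-reversed:

```python
import ipaddress


def ipv4(hex_addr):
    """'0100007F' -> '127.0.0.1' (assumes a little-endian host)."""
    value = int.from_bytes(bytes.fromhex(hex_addr), "little")
    return str(ipaddress.IPv4Address(value))


def ipv6(hex_addr):
    """Decode the 32 hex digits of /proc/net/tcp6 into an IPv6 string.

    The address is four 32-bit words, each byte-reversed on a
    little-endian host, so flip every 4-byte group back.
    """
    raw = bytes.fromhex(hex_addr)
    fixed = b"".join(raw[i:i + 4][::-1] for i in range(0, 16, 4))
    return str(ipaddress.IPv6Address(fixed))
```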


And we’re done! We now have a dict called inodes containing every listening IP:Port on our system.

All that’s remaining is to draw the rest of the fscking owl (er, format it how we want), which gives us:

Proto         Listening           PID            Process            
udp   192.168.122.10:bootpc       441 /lib/systemd/systemd-networkd 
tcp   127.0.0.53:domain           493 /lib/systemd/systemd-resolved 
udp   127.0.0.53:domain           493 /lib/systemd/systemd-resolved 
tcp   192.168.122.10:ssh          579 /usr/sbin/sshd -D 
udp   127.0.0.1:323               581 /usr/sbin/chronyd 
udp6  ::1:323                     581 /usr/sbin/chronyd 
tcp   158.69.158.251:http         620 nginx: master process /usr/sbin/nginx -g daemon on;
                                 2893 nginx: worker process                            
                                 2894 nginx: worker process                            
                                 2895 nginx: worker process                            
                                 2896 nginx: worker process                            
tcp   158.69.158.251:https        620 nginx: master process /usr/sbin/nginx -g daemon on;
                                 2893 nginx: worker process                            
                                 2894 nginx: worker process                            
                                 2895 nginx: worker process                            
                                 2896 nginx: worker process                            
tcp   0.0.0.0:smtp                908 /usr/lib/postfix/sbin/master -w 
                                11580 smtpd -n smtp -t inet -u -c -o stress= -s 2 
tcp6  :::smtp                     908 /usr/lib/postfix/sbin/master -w 
                                11580 smtpd -n smtp -t inet -u -c -o stress= -s 2 
tcp   127.0.0.1:epmd              968 /opt/otp/17.5/erts-6.4/bin/epmd -daemon 
tcp   127.0.0.1:7781              987 /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:40001             987 /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:8888             1029 /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:40002            1029 /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:7780             8445 /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:40000            8445 /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/

oooooh so pretty! no unnecessary columns, sorted by primary pid number, reports multiple listeners controlling one socket…
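The grouping in that output can be approximated with a sketch like this one. The column widths and the format_rows name are made up, and the input is assumed to be (proto, address, [(pid, cmdline), ...]) tuples:

```python
def format_rows(listeners):
    """Render rows for (proto, address, [(pid, cmdline), ...]) tuples.

    Sockets are ordered by their lowest pid; only the first owner's row
    carries the protocol and address columns.
    """
    rows = []
    owned = [entry for entry in listeners if entry[2]]  # skip ownerless sockets
    for proto, address, owners in sorted(owned, key=lambda e: min(e[2])[0]):
        for index, (pid, cmdline) in enumerate(sorted(owners)):
            if index == 0:
                rows.append(f"{proto:<5} {address:<24} {pid:>6} {cmdline}")
            else:
                rows.append(f"{'':<5} {'':<24} {pid:>6} {cmdline}")
    return rows
```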

Plus: colors! We’re using blue for private/local IPs and red for other. Double Plus: terminal-width aware printing so we never wrap lines!
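Both tricks fit in a few lines of standard library Python. This is a sketch under assumptions: the ANSI color codes, the helper names, and the use of ipaddress's is_private test to decide what counts as "private/local" are my choices, not necessarily the script's.

```python
import ipaddress
import shutil

BLUE, RED, RESET = "\033[34m", "\033[31m", "\033[0m"


def colorize(ip):
    """Wrap an IP string in blue if private/loopback, red otherwise."""
    address = ipaddress.ip_address(ip)
    color = BLUE if address.is_private else RED
    return f"{color}{ip}{RESET}"


def clip(line, width=None):
    """Cut a plain (uncolored) line to the terminal width."""
    if width is None:
        width = shutil.get_terminal_size().columns
    return line[:width]
```

Clip before colorizing, otherwise the cut can land in the middle of an escape sequence.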

But, sadly, because of Linux design choices, the only place we can discover the pid mappings is by running as root to read /proc/[pid]/fd/* entries. How can we get around such a restriction so everybody can run the netstat of the people? seizethemeansofnetstatting!

Let’s Make A Module

The problem with finding system-wide inode to pid mappings is simple: Linux never created an interface to discover them without opening O(N) directories and reading symlink targets of O(k) files. Oh, and those directories can only be opened by root or the process owner. Whoops.

But, just because Linux never created such an interface doesn’t mean we can’t create one for ourselves!

Let’s write a simple Linux kernel function to list every inode belonging to every pid:

The code prints a line of {pid} {short process name} {inodes*} for each task/pid on the system.


We run our function by turning it into a Linux kernel module using both the simplified seq_file and procfs APIs to:

  • create an entry in /proc
  • tell Linux how to run our function when anybody reads /proc/pid_inode_map

Load Module, See Results!

After loading the module, cat /proc/pid_inode_map shows lines like:

620 nginx 6 6 13766689 105768 105769 105770 105771 105772 105773 105774 105775 105776 105777 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767 
2893 nginx 6 6 13766689 105770 105769 9608 9608 105772 105774 105776 107702 118090 127195 158609 250505 154126 363096 152357 154486 163176 157017 159757 8923366 1038845 398376 311746 401810 259008 329551 772944 8917890 8919491 1345637 269898 190345 275581 775960 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 346112 8916535 573709 8918273 514243 517896 8922757 1057423 417516 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767 508640 223050 360995 262724 8917891 229720 780803 8922165 8920851 8922758 8921905 868648 338897 702666 397459 399205 8922166 1153539 756070 528516 263822 8918033 655695 269200 263828 8916342 8916287 8917221 8922881 8916288 8919967 8923185 8916536 8921906 8920105 8919231 8920106 8923186 8922882 8919617 8919232 8917009 8919618 8916343 8917010 8919968 8920852 8919492 
2894 nginx 6 6 13766689 105768 105772 105774 105771 9608 9608 105776 1503116 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767 
2895 nginx 6 6 13766689 263822 105768 105774 105770 105776 248011 105773 9608 9608 335050 549288 752603 1330348 266457 1096298 1467136 266451 266441 266449 266431 266433 1857020 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767 
2896 nginx 6 6 13766689 105768 105776 105770 105772 1432505 105775 9608 9608 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767 
2897 nginx 6 6 13766689 105768 105770 105772 105774 105777 9608 9608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767 

With our custom mapping of pids to their inodes, we can adjust the netstat replacement to take advantage of one-stop-shopping-mapping.


Replacing glob with Custom Proc Parsing

Let’s read /proc/pid_inode_map once instead of parsing O(N*k) symlinks:
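A minimal sketch of that read, assuming each line is {pid} {short process name} {inodes*} separated by whitespace and that the short name itself contains no spaces (function names are hypothetical):

```python
from collections import defaultdict


def parse_pid_inode_map(text):
    """Return ({inode: {pid, ...}}, {pid: short_name}) from the map text."""
    inode_pids = defaultdict(set)
    names = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        pid, name = int(fields[0]), fields[1]
        names[pid] = name
        for inode in fields[2:]:
            inode_pids[int(inode)].add(pid)   # duplicates collapse in the set
    return inode_pids, names


def read_pid_inode_map(path="/proc/pid_inode_map"):
    """Read the map; the file only exists while the module is loaded."""
    with open(path) as f:
        return parse_pid_inode_map(f.read())
```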

We save thousands of system operations by generating one file any user can read instead of needing root to go through every fd of every process.

Our approach of parsing a pre-generated file at /proc/pid_inode_map is 40% faster than iterating all the pids and fd symlinks every time we want a network status.

Hey Linux, give us a built-in pid to inode mapping by default!

Code for all the Python scripts plus the Linux kernel module is at mattsta/netmatt.


C Thy Shame

Jumping back a bit, let’s look at some netstat.c code.

Code has not been modified to protect the guilty. It actually is formatted like this.

This is in netstat.c still shipping in your Linux net-tools package in 2018:

Did you notice the excerpt has an unterminated while loop? Do you see it?

If you want to follow along at home, you can get the source with apt-get source net-tools.

What if I spend 0.03 seconds to run it through modern automated formatting tools?

90s C code has plenty of weird properties, but the strangest is an absolute refusal to use proper indentation combined with a massive lack of visual whitespace.

Though, even in 2018 backwards people still argue “brevity” is the highest form of coding. Never write in 4 lines what you can technically manipulate your compiler into accepting as 1 line, even if you just remove all the whitespace and brackets and indentation. We call these people CDs (C Dolts) and they should be monitored carefully to minimize ongoing damage to the time stream.

Making 90s C code is easy:

  • cram everything close together
  • align most everything to the left with no indentation
  • never — never! — use brackets if your if, for, or while only has one result statement
    • as a bonus, lie using indentation about what your if does, like the inexcusably bad if (lnamelen == -1) statement below.
      • look, it has a ‘continue’ but then everything below is still indented like it applies to the if statement! Gotta love 25 year old abandoned code running on millions of machines around the world.

Check out this source excerpt still shipping in 2018 too:

If your eyes haven’t exploded from code stress yet, count the unterminated flow control statements. Do you see?

Let’s clean this up again using 0.03 seconds of automated tooling:

In the original code section, did you notice if (!cmdlp) { was unterminated? No, you didn’t notice, because they refused to use indentation in 1993 and nobody has fixed it in the subsequent 25 years.

90s C is basically the pinnacle of the Write Once, Read Never Again coding movement and must be ridiculed at all costs. Riddikulus!

Conclusion

What did we learn today?

  • netstat is part of net-tools
  • net-tools is a mostly abandoned set of Linux utilities from the mid 90s
  • Linux doesn’t let non-root users discover pid to [inodes] metadata for processes they don’t own
  • netstat actually under-reports which pids own which sockets
    • netstat only lists one pid even though sockets can be owned by multiple pids
  • But we can write a Linux kernel module to generate the mapping anyway! seizethemeansofnetstatting!
  • The 90s Linux utility C code is awful and needs to be either adopted and completely re-formatted, re-reviewed, and brought up to modern standards, or outright abandoned.
  • We can write much safer system utilities in Python
    • they are fast enough
    • they are safe enough
    • they are readable enough
    • and doggone it, people like me.

-Matt@mattsta☁mattsta


Stay tuned for more Infrastructure Week July 2018! New articles and tools every day this week.