Replacing 90s C Linux Utilities With Python
Welcome to Infrastructure Week July 2018! New articles and tools every day this week.
- day 1: lematt
- day 2: netmatt
- day 3: email 2018
- day 4: web 2018
- day 5: local nets
Everybody knows the netstat tool, but do you know these fun facts too?
- netstat is part of a package called net-tools which includes:
  - arp
  - hostname
  - ifconfig
  - ipmaddr
  - iptunnel
  - mii-tool
  - nameif
  - netstat
  - plipconfig
  - rarp
  - route
  - slattach
  - statistics
  - what the heck is a plipconfig? Well, it optimizes the performance of your parallel port. I sure am glad my 2018 server with 288 cores and 8 TB RAM has plipconfig.
    - double fun fact: stackoverflow has zero questions about plipconfig, so it must be a very intuitive and easy to use utility!
- the net-tools codebase was written in the mid 90s
  - meaning: pre-C99, probably by people who still didn’t even trust or believe in C89 yet.
  - 25 years later, the entire codebase still looks like dirty old C
  - plus it’s included by default on millions of machines
  - and, surprise!, the entire project ended up mostly abandoned
    - people are still trying to correct “90s dirty C” idioms in the code to this day
    - maintenance of the package is now seemingly just ad-hoc by OS package maintainers whenever they find a problem or modern Linux incompatibility
TOC:
- What If We Replaced 90s C netstat With Python?
  - it’s almost code time
  - it’s code time here come the coders
  - screw it, we’re writing a linux kernel module
- C Thy Shame
and, as always, you can ignore all the hard work I put into this write up and just jump right to the code.
What If We Replaced 90s C netstat With Python?
We’re going to focus on one tool in the net-tools package: netstat.
It’s full of poorly formatted code you’d be (hopefully) fired for if you wrote today, but we’ll cover that towards the end so people don’t get scared or scarred up front.
netstat -nape
netstat -p is one of its most useful features: it shows you which pid and process name is listening on a port.
Example: netstat -nape |grep LISTEN
tcp 0 0 127.0.0.1:40000 0.0.0.0:* LISTEN 1000 1339864 8445/beam.smp
tcp 0 0 127.0.0.1:40001 0.0.0.0:* LISTEN 1000 21040 987/beam.smp
tcp 0 0 127.0.0.1:40002 0.0.0.0:* LISTEN 1000 20151 1029/beam.smp
tcp 0 0 127.0.0.1:7780 0.0.0.0:* LISTEN 1000 1348095 8445/beam.smp
tcp 0 0 127.0.0.1:7781 0.0.0.0:* LISTEN 1000 18276 987/beam.smp
tcp 0 0 158.69.158.251:80 0.0.0.0:* LISTEN 0 20606 620/nginx: master p
tcp 0 0 127.0.0.1:4369 0.0.0.0:* LISTEN 1000 21001 968/epmd
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 101 18522 493/systemd-resolve
tcp 0 0 192.168.122.10:22 0.0.0.0:* LISTEN 0 17848 579/sshd
tcp 0 0 158.69.158.251:443 0.0.0.0:* LISTEN 0 20594 620/nginx: master p
It gives us all listening IP:Port combinations along with their pid and process names (scroll to the right where the style falls off).
Unfortunately, we see some limitations:
- look at the nginx process name: it’s nginx: master p
  - netstat has a fixed-length 20 character buffer for process names. Thanks, 90s!
  - also, since the output is so short, you don’t get full paths.
    - any process by any user in any directory could call itself “sshd” and you wouldn’t notice the difference based on the extremely truncated output netstat provides.
- the output is wide. really wide. not very terminal friendly.
- the output doesn’t appear to be ordered by anything useful? Not by pid, not by IP, not by port.
- oh, and root. netstat must be run as root to generate the IP:Port to pid/name mappings. That’s not cool.
Plus, the output has six columns we don’t care about!
My first attempt at making the output more useful was: netstat --numeric-hosts --listening --program --tcp --inet --inet6 |awk '{if (NR > 2) {printf "%-4s %-20s served by %-20s\n", $1, $4, $7} }' | sort -k 5,5 -n:
tcp 127.0.0.53:domain served by 490/systemd-resolve
tcp 0.0.0.0:imaps served by 737/dovecot
tcp 127.0.0.1:imap2 served by 737/dovecot
tcp6 ::1:imap2 served by 737/dovecot
tcp6 :::imaps served by 737/dovecot
tcp 127.0.0.1:smtp served by 904/master
tcp 127.0.0.1:submission served by 904/master
tcp 158.69.158.248:smtp served by 904/master
tcp 158.69.158.2:submission served by 904/master
tcp 127.0.0.1:11332 served by 7989/rspamd:
tcp 127.0.0.1:11334 served by 7989/rspamd:
tcp6 ::1:11332 served by 7989/rspamd:
tcp 192.168.122.8:ssh served by 31018/sshd
A little better! We are now sorted by pid and the bad columns are gone, but the process names are still truncated and netstat must still be run as root to generate them at all.
Still not good enough for our needs though.
It’s [Almost] Code Time
Replacing netstat -p requires figuring out how netstat matches IP:Ports to pids and why it requires root to show the mapping.
A quick look through the code tells us:
- netstat reads the pid mappings from /proc/[pid]/fd/*, but each of those directories requires root permission to enter (unless you own the pid yourself).
  - why? it’s a security issue to let anybody directly access the open FDs/inodes of any random process
  - but why must those directories be consulted?
    - Linux only exposes which pids are using which inodes as a /proc/ symlink in those directories. There’s no other way to discover the mappings.
    - Those symlinks look like this: ls -latrh /proc/*/fd/*
l-wx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/3 -> /dev/kmsg
lrwx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/2 -> /dev/null
lrwx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/1 -> /dev/null
lrwx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/0 -> /dev/null
lr-x------ 1 root root 64 Jul 11 18:33 /proc/1/fd/18 -> /run/utmp
lrwx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/39 -> socket:[14648]
lrwx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/38 -> socket:[878100]
lrwx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/37 -> anon_inode:bpf-prog
lrwx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/32 -> anon_inode:bpf-prog
lrwx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/31 -> anon_inode:bpf-prog
lrwx------ 1 root root 64 Jul 11 18:33 /proc/1/fd/30 -> anon_inode:bpf-map
lrwx------ 1 root root 64 Jul 11 18:33 /proc/408/fd/9 -> /dev/kmsg
lrwx------ 1 root root 64 Jul 11 18:33 /proc/408/fd/8 -> anon_inode:[eventpoll]
l-wx------ 1 root root 64 Jul 11 18:33 /proc/408/fd/7 -> /dev/kmsg
lrwx------ 1 root root 64 Jul 11 18:33 /proc/408/fd/6 -> socket:[14675]
lrwx------ 1 root root 64 Jul 11 18:33 /proc/408/fd/5 -> socket:[14681]
lrwx------ 1 root root 64 Jul 11 18:33 /proc/408/fd/4 -> socket:[14679]
lrwx------ 1 root root 64 Jul 11 18:33 /proc/408/fd/36 -> /var/log/journal/aa7c9b37491043cca93eed3d1d242ed6/user-1000.journal
Each of those is a symlink whose target tells you what the fd refers to. For sockets, the target looks like socket:[INODE], which gives you the actual inode used by the process.
What use is an inode?
Well, the inodes of each listening IP:Port socket are freely available to any user in /proc/net/{tcp,udp}{,6}. Here’s a sample of /proc/net/tcp:
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 00000000:03E1 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 21346 1 ffff9d78b442d800 100 0 0 10 0
1: 0100007F:2C44 00000000:0000 0A 00000000:00000000 00:00000000 00000000 110 0 161280 1 ffff9d78b9d63000 100 0 0 10 0
2: 0100007F:2C46 00000000:0000 0A 00000000:00000000 00:00000000 00000000 110 0 161286 1 ffff9d78b9d62800 100 0 0 10 0
Don’t you like how they stopped naming columns towards the end? What are those extra 7 fields? Don’t you worry your pretty little head about it. Just let those fields be fields and you be you.
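For the fields that do matter here, a minimal Python sketch of pulling them out of one row. The positions are simply what you get after splitting the sample row on whitespace; SocketRow and parse_row are illustrative names, not part of the original tool.

```python
from typing import NamedTuple


class SocketRow(NamedTuple):
    local_address: str  # hex "ADDR:PORT", exactly as /proc prints it
    state: str          # hex socket state from the "st" column, e.g. "0A"
    uid: int
    inode: int


def parse_row(line: str) -> SocketRow:
    fields = line.split()
    # Whitespace-split positions in the sample above:
    # 0=sl 1=local_address 2=rem_address 3=st 4=tx_queue:rx_queue
    # 5=tr:tm->when 6=retrnsmt 7=uid 8=timeout 9=inode, then the unnamed extras
    return SocketRow(fields[1], fields[3], int(fields[7]), int(fields[9]))


sample = ("1: 0100007F:2C44 00000000:0000 0A 00000000:00000000 "
          "00:00000000 00000000 110 0 161280 1 ffff9d78b9d63000 100 0 0 10 0")
row = parse_row(sample)
print(row)
```

Only the local address, state, and inode are needed to connect a listening socket to a process later on.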
Those IP addresses do look a bit off.
Let’s ask Erlang to convert those hex IP addresses back into individual bytes for us:
And, if you hadn’t guessed before, hex 0100007F is little endian 127.0.0.1.
If we tell Erlang about the byte orientation up front, it can fix it for us:
The port numbers are easy to decipher too (oddly, Linux provides the port numbers in network byte order while the IP addresses aren’t. thanks, Linux!):
or if you are tired of Erlang:
or if you want some Python flavor:
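A minimal Python sketch of both conversions, assuming the file came from a little-endian host (which the 0100007F sample implies). decode_ipv4 and decode_port are illustrative names.

```python
import socket
import struct


def decode_ipv4(hex_addr: str) -> str:
    # Assumption: little-endian host, so the hex text is the address with
    # its four bytes reversed. Packing as "<I" puts them back in order.
    return socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))


def decode_port(hex_port: str) -> int:
    # The port reads naturally as a plain hex number.
    return int(hex_port, 16)


print(decode_ipv4("0100007F"), decode_port("2C44"))
```

Here 2C44 decodes to 11332 and 03E1 to 993.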
whew, so now we know:
- We can get every IP:Port active on the system, both incoming and outgoing, from:
  - /proc/net/tcp
  - /proc/net/udp
  - /proc/net/tcp6
  - /proc/net/udp6
- We can pick listening addresses (using the st(ate) column) and save the inode column.
- We can read every open fd in the system by walking /proc/[pid]/fd/*
  - If a symlink points to a socket:[INODE], then
    - parse the symlink, extract the inode number, compare to our previously retrieved inode list from /proc/net/{tcp,udp}{,6}.
- then we can read /proc/[pid]/cmdline for each pid we matched to get the full command line (instead of being limited to a 20 character buffer like 90s netstat C code).
- Finally, we can do any remaining formatting/sorting/filtering for presentation.
We can do all those steps in Python easily, right? It’s walking some directories, matching some files against other files, then printing the output we actually want.
Let’s do this.
Now It’s Python Coding Time
The netstat source uses C APIs for directory walking by:
- opendir(3) of /proc to walk the pid directories
  - so, readdir(3) for each directory entry
  - if readdir returns a pid directory, run another opendir() to walk the pid/fd directory.
    - now readdir() again to walk the fd entries
      - then call readlink(3) trying to find socket:[INODE] entries.
It’s a lot of system calls for file operations even though it’s running through procfs.
We could copy the netstat algorithm exactly using Python’s os.walk() API.
So, that’s what I did the first time through. I used os.walk() and it took 500ms to generate results (30x slower than old netstat, not cool).
But, this is the future and we have better APIs: if we replace 20 lines of looping os.walk() code with one line of glob.glob("/proc/*/fd/*"), our runtime drops from 500ms to 70ms.
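To make the difference in shape concrete, here is a hedged sketch of both styles side by side. The nested version uses os.scandir() rather than os.walk(), so it is not the original first attempt, and all function names are made up for illustration. Both take a proc_root argument so they can be tried on a throwaway directory without root.

```python
import glob
import os
import re
import tempfile

SOCKET_TARGET = re.compile(r"socket:\[(\d+)\]")


def _record(found, pid, fd_path):
    """Add (pid, inode) to found if fd_path links to socket:[INODE]."""
    try:
        target = os.readlink(fd_path)
    except OSError:
        return  # the fd was closed or the process exited mid-scan
    match = SOCKET_TARGET.fullmatch(target)
    if match:
        found.add((int(pid), int(match.group(1))))


def sockets_by_nested_walk(proc_root="/proc"):
    """Same shape as the C code: list pids, then list each pid's fd dir."""
    found = set()
    with os.scandir(proc_root) as pid_dirs:
        pids = [entry.name for entry in pid_dirs if entry.name.isdigit()]
    for pid in pids:
        try:
            with os.scandir(os.path.join(proc_root, pid, "fd")) as fds:
                fd_paths = [entry.path for entry in fds]
        except OSError:
            continue  # not readable by us, or the process is already gone
        for fd_path in fd_paths:
            _record(found, pid, fd_path)
    return found


def sockets_by_glob(proc_root="/proc"):
    """One pattern replaces both directory loops."""
    found = set()
    for fd_path in glob.iglob(os.path.join(proc_root, "*", "fd", "*")):
        pid = os.path.basename(os.path.dirname(os.path.dirname(fd_path)))
        if pid.isdigit():  # the pattern also matches names like "self"
            _record(found, pid, fd_path)
    return found


# Demo against a fake directory shaped like /proc.
fake_proc = tempfile.mkdtemp()
os.makedirs(os.path.join(fake_proc, "123", "fd"))
os.makedirs(os.path.join(fake_proc, "self", "fd"))
os.symlink("socket:[555]", os.path.join(fake_proc, "123", "fd", "3"))
os.symlink("/dev/null", os.path.join(fake_proc, "123", "fd", "0"))
os.symlink("socket:[999]", os.path.join(fake_proc, "self", "fd", "4"))

walked = sockets_by_nested_walk(fake_proc)
globbed = sockets_by_glob(fake_proc)
print(walked, globbed)
```

Both functions return the same set of (pid, inode) pairs. The glob version just has less code to get wrong.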
Read Dem Files
Even though we can get a nice quick file listing with glob (ohmyglob!), we still have to call os.readlink() on every filename returned by the glob.
create map of inode->list of pids
Capturing every process's socket inode->pid mapping becomes:
import collections
import glob
import os

# glob glob glob it all
allFDs = glob.iglob("/proc/*/fd/*")
inodePidMap = collections.defaultdict(list)
for fd in allFDs:
    # split because path looks like: /proc/[pid]/fd/[number]
    _, _, pid, _, _ = fd.split('/')
    # the glob also matches /proc/self and /proc/thread-self; skip
    # them so int(pid) below can't fail on a non-numeric name
    if not pid.isdigit():
        continue
    try:
        target = os.readlink(fd)
    except FileNotFoundError:
        # file vanished, can't do anything else
        continue
    # "target" is now something like:
    # - socket:[INODE]
    # - pipe:[INODE]
    # - /dev/pts/N
    # - or an actual full file path
    if target.startswith("socket"):
        ostype, inode = target.split(':')
        # strip brackets from inode string (it looks like: [INODE])
        inode = int(inode[1:-1])
        inodePidMap[inode].append(int(pid))
Note at the end how we append the pid to our map of inodes->[pid].
netstat -p doesn’t have the ability to show us every process listening on a socket. With forking servers and perhaps even REUSEPORT, multiple processes can listen on the same socket, but you’d never realize that from reading netstat -p output.
We’re already better than netstat: we can report the truth of our system instead of having our output lie to us because unmaintained C code from 1993 can’t handle the modern world.
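A small sketch of the forking case, assuming a Unix-like system where os.fork() is available: a listening socket created before the fork is the same kernel object in parent and child, so both report the same inode. The helper names are hypothetical.

```python
import os
import socket


def socket_inode(sock):
    # fstat() on a socket fd reports the socket's inode number
    return os.fstat(sock.fileno()).st_ino


def inode_seen_by_forked_child(sock):
    """Fork, let the child report the inode it sees, return that number."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # child: inherited the listening socket, report and exit immediately
        os.close(read_end)
        os.write(write_end, str(socket_inode(sock)).encode())
        os._exit(0)
    os.close(write_end)
    with os.fdopen(read_end, "rb") as pipe:
        child_inode = int(pipe.read())
    os.waitpid(pid, 0)
    return child_inode


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
listener.listen()
parent_inode = socket_inode(listener)
child_inode = inode_seen_by_forked_child(listener)
listener.close()
print(parent_inode, child_inode)
```

One inode with two pids is exactly why the map stores a list of pids per inode.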
look up the command line for each pid
Now, with our list of pids, we can look up each command line:
for inode in inodes:
    if inode in inodePidMap:
        for pid in inodePidMap[inode]:
            try:
                with open(f"/proc/{pid}/cmdline", 'r') as cmd:
                    # /proc command line arguments are delimited by
                    # null bytes, so undo that here...
                    cmdline = cmd.read().split('\0')
                    inodes[inode].append((pid, cmdline))
            except (OSError, UnicodeDecodeError):
                # files can vanish on us at any time (and that's okay!)
                # (catching BaseException would also swallow Ctrl-C)
                pass
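A small sketch of the null-byte handling on its own, using literal byte strings instead of a live /proc so it runs anywhere. parse_cmdline is an illustrative name, and stripping trailing NULs is one possible choice for tidying the result, not necessarily what the original script does.

```python
def parse_cmdline(raw: bytes) -> list:
    """Turn raw /proc/[pid]/cmdline bytes into a list of arguments."""
    # errors="replace" keeps odd bytes from raising UnicodeDecodeError
    text = raw.decode("utf-8", errors="replace").rstrip("\0")
    if not text:
        return []  # kernel threads have an empty cmdline
    return text.split("\0")


print(parse_cmdline(b"/usr/sbin/sshd\0-D\0"))
print(parse_cmdline(b"nginx: master process /usr/sbin/nginx -g daemon on;\0\0"))
```

Processes that rewrite their own title, like nginx, come back as a single string, which matches the nginx rows in the final output.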
i… i… inodes!
Where did the inodes dict come from? We didn’t populate that yet!
inodes was the result of parsing /proc/net/{tcp,udp}{,6}, which is as simple as:
# inode -> [ip, port, proto]; (pid, cmdline) tuples get appended later
inodes = {}

def populateInodes(name):
    """ Process IPv4 and IPv6 versions of listeners based on ``name``.

    ``name`` is either 'udp' or 'tcp' so we parse, for each ``name``:
        - /proc/net/[name]
        - /proc/net/[name]6

    As in:
        - /proc/net/tcp
        - /proc/net/tcp6
        - /proc/net/udp
        - /proc/net/udp6
    """
    isUDP = name == "udp"
    for ver in ["", "6"]:
        with open(f"/proc/net/{name}{ver}", 'r') as proto:
            proto = proto.read().splitlines()
            proto = proto[1:]  # drop header row
            for cxn in proto:
                cxn = cxn.split()
                # /proc/net/udp{,6} uses different constants for LISTENING
                if isUDP:
                    # These constants are based on enum offsets inside
                    # the Linux kernel itself. They aren't likely to ever
                    # change since they are hardcoded in utilities.
                    isListening = cxn[3] == "07"
                else:
                    isListening = cxn[3] == "0A"
                # Right now this is a single-purpose tool so if process is
                # not listening, we avoid further processing of this row.
                if not isListening:
                    continue
                ip, port = cxn[1].split(':')
                if ver:
                    ip = ipv6(ip)
                else:
                    ip = ipv4(ip)
                port = int(port, 16)
                inode = cxn[9]
                # We just use a list here because creating a new sub-dict
                # for each entry was noticeably slower than just indexing
                # into lists.
                inodes[int(inode)] = [ip, port, f"{name}{ver}"]

populateInodes("tcp")
populateInodes("udp")
Simple enough? We also use functions ipv4() and ipv6() to parse the hex IPs from /proc/net to readable formats:
import codecs
import socket
import struct

def ipv6(addr):
    """ Convert /proc IPv6 hex address into standard IPv6 notation. """
    # turn ASCII hex address into binary
    addr = codecs.decode(addr, "hex")
    # unpack into 4 32-bit integers in big endian / network byte order
    addr = struct.unpack('!LLLL', addr)
    # re-pack as 4 32-bit integers in system native byte order
    addr = struct.pack('@IIII', *addr)
    # now we can use standard network APIs to format the address
    addr = socket.inet_ntop(socket.AF_INET6, addr)
    return addr

def ipv4(addr):
    """ Convert /proc IPv4 hex address into standard IPv4 notation. """
    # Instead of codecs.decode(), we can just convert a 4 byte hex
    # string to an integer directly using python radix conversion.
    # Basically, int(addr, 16) EQUALS:
    #   aOrig = addr
    #   addr = codecs.decode(addr, "hex")
    #   addr = struct.unpack(">L", addr)
    #   assert(addr == (int(aOrig, 16),))
    addr = int(addr, 16)
    # system native byte order, 4-byte integer
    addr = struct.pack("=L", addr)
    addr = socket.inet_ntop(socket.AF_INET, addr)
    return addr
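The IPv6 case is the less obvious one: the address is printed as four separate 32-bit words, and each word is byte-swapped on its own. A worked sketch, assuming a little-endian host so the byte order can be spelled out explicitly instead of relying on native order. decode_ipv6_le is an illustrative name.

```python
import socket
import struct


def decode_ipv6_le(hex_addr: str) -> str:
    raw = bytes.fromhex(hex_addr)
    # Assumption: written by a little-endian host, so read each of the
    # four 32-bit words as little endian...
    words = struct.unpack("<IIII", raw)
    # ...and write them back out in network (big endian) order.
    packed = struct.pack("!IIII", *words)
    return socket.inet_ntop(socket.AF_INET6, packed)


# On a little-endian host, ::1 appears with only its last word non-zero.
loopback = decode_ipv6_le("00000000000000000000000001000000")
print(loopback)
```

On a little-endian machine this produces the same bytes as the unpack-then-native-repack approach in ipv6() above.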
And we’re done! We now have a dict called inodes containing every listening IP:Port on our system.
All that’s remaining is to draw the rest of the fscking owl, er, format it how we want, which gives us:
Proto Listening              PID   Process
udp   192.168.122.10:bootpc  441   /lib/systemd/systemd-networkd
tcp   127.0.0.53:domain      493   /lib/systemd/systemd-resolved
udp   127.0.0.53:domain      493   /lib/systemd/systemd-resolved
tcp   192.168.122.10:ssh     579   /usr/sbin/sshd -D
udp   127.0.0.1:323          581   /usr/sbin/chronyd
udp6  ::1:323                581   /usr/sbin/chronyd
tcp   158.69.158.251:http    620   nginx: master process /usr/sbin/nginx -g daemon on;
                             2893  nginx: worker process
                             2894  nginx: worker process
                             2895  nginx: worker process
                             2896  nginx: worker process
tcp   158.69.158.251:https   620   nginx: master process /usr/sbin/nginx -g daemon on;
                             2893  nginx: worker process
                             2894  nginx: worker process
                             2895  nginx: worker process
                             2896  nginx: worker process
tcp   0.0.0.0:smtp           908   /usr/lib/postfix/sbin/master -w
                             11580 smtpd -n smtp -t inet -u -c -o stress= -s 2
tcp6  :::smtp                908   /usr/lib/postfix/sbin/master -w
                             11580 smtpd -n smtp -t inet -u -c -o stress= -s 2
tcp   127.0.0.1:epmd         968   /opt/otp/17.5/erts-6.4/bin/epmd -daemon
tcp   127.0.0.1:7781         987   /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:40001        987   /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:8888         1029  /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:40002        1029  /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:7780         8445  /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
tcp   127.0.0.1:40000        8445  /opt/otp/17.5/erts-6.4/bin/beam.smp -- -root /opt/
oooooh so pretty! no unnecessary columns, sorted by primary pid number, reports multiple listeners controlling one socket…
Plus: colors! We’re using blue for private/local IPs and red for other. Double Plus: terminal-width aware printing so we never wrap lines!
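A rough sketch of how the coloring and width handling could work, not the actual formatting code from the script. It assumes ANSI escape codes for color, the standard library's notion of a private address (which includes loopback), and made-up column widths and function names.

```python
import ipaddress
import shutil

BLUE, RED, RESET = "\033[34m", "\033[31m", "\033[0m"


def colorize_ip(ip):
    """Blue for private/loopback addresses, red for everything else."""
    color = BLUE if ipaddress.ip_address(ip).is_private else RED
    return f"{color}{ip}{RESET}"


def fit_to_width(text, width):
    """Cut text to at most width characters, marking the cut with '…'."""
    if width <= 0:
        return ""
    if len(text) <= width:
        return text
    return text[: width - 1] + "…"


def format_row(proto, ip, port, pid, cmdline, width=None):
    if width is None:
        width = shutil.get_terminal_size().columns
    listening = f"{ip}:{port}"
    # Measure with plain text: escape codes take no room on screen.
    plain_prefix = f"{proto:<5} {listening:<24} {pid:>6} "
    process = fit_to_width(cmdline, width - len(plain_prefix))
    padding = " " * (len(f"{listening:<24}") - len(listening))
    return f"{proto:<5} {colorize_ip(ip)}:{port}{padding} {pid:>6} {process}"


line = format_row("tcp", "158.69.158.251", "https", 620,
                  "nginx: master process /usr/sbin/nginx -g daemon on;",
                  width=60)
plain = line.replace(BLUE, "").replace(RED, "").replace(RESET, "")
print(line)
```

The key detail is measuring the row before adding escape codes, since they add bytes to the string but no columns on screen.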
But, sadly, because of Linux design choices, the only place we can discover the pid mappings is by running as root to read /proc/[pid]/fd/* entries. How can we get around such a restriction so everybody can run the netstat of the people? seizethemeansofnetstatting!
Let’s Make A Module
The problem with finding system-wide inode to pid mappings is simple: Linux never created an interface to discover them without opening O(N) directories and reading symlink targets of O(k) files. Oh, and those directories can only be opened by root or the process owner. Whoops.
But, just because Linux never created such an interface doesn’t mean we can’t create one for ourselves!
Let’s write a simple Linux kernel function to list every inode belonging to every pid:
static int generate_mapping(struct seq_file *m, void *data) {
    struct task_struct *task;
    for_each_process(task) {
        struct files_struct *files;
        struct fdtable *fdt;
        uint32_t i;
        seq_printf(m, "%d %s ", task->pid, task->comm);
        files = task->files;
        if (!files) {
            /* no fd table for this task: end its line and keep
             * going (a bare "return;" would stop the whole listing
             * and isn't valid in a function returning int) */
            seq_printf(m, "\n");
            continue;
        }
        rcu_read_lock();
        fdt = files_fdtable(files);
        for (i = 0; i < fdt->max_fds; i++) {
            const struct file *file;
            file = fdt->fd[i];
            if (file) {
                /* i_ino is an unsigned long, so print it with %lu */
                seq_printf(m, "%lu ", file->f_inode->i_ino);
            }
        }
        rcu_read_unlock();
        seq_printf(m, "\n");
    }
    return 0;
}
The code prints a line of {pid} {short process name} {inodes*} for each task/pid on the system.
We run our function by turning it into a Linux kernel module using both the simplified seq_file and procfs APIs to:
- create an entry in /proc
- tell Linux how to run our function when anybody reads /proc/pid_inode_map
static int pid_inode_map_open(struct inode *inode, struct file *file) {
    return single_open(file, generate_mapping, NULL);
}

static const struct file_operations ops = {.owner = THIS_MODULE,
                                           .open = pid_inode_map_open,
                                           .read = seq_read,
                                           .llseek = seq_lseek,
                                           .release = single_release};

int __init pid_inode_map_init(void) {
    proc_create("pid_inode_map", 0, NULL, &ops);
    return 0;
}
Load Module, See Results!
After loading the module, cat /proc/pid_inode_map shows lines like:
620 nginx 6 6 13766689 105768 105769 105770 105771 105772 105773 105774 105775 105776 105777 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767
2893 nginx 6 6 13766689 105770 105769 9608 9608 105772 105774 105776 107702 118090 127195 158609 250505 154126 363096 152357 154486 163176 157017 159757 8923366 1038845 398376 311746 401810 259008 329551 772944 8917890 8919491 1345637 269898 190345 275581 775960 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 346112 8916535 573709 8918273 514243 517896 8922757 1057423 417516 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767 508640 223050 360995 262724 8917891 229720 780803 8922165 8920851 8922758 8921905 868648 338897 702666 397459 399205 8922166 1153539 756070 528516 263822 8918033 655695 269200 263828 8916342 8916287 8917221 8922881 8916288 8919967 8923185 8916536 8921906 8920105 8919231 8920106 8923186 8922882 8919617 8919232 8917009 8919618 8916343 8917010 8919968 8920852 8919492
2894 nginx 6 6 13766689 105768 105772 105774 105771 9608 9608 105776 1503116 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767
2895 nginx 6 6 13766689 263822 105768 105774 105770 105776 248011 105773 9608 9608 335050 549288 752603 1330348 266457 1096298 1467136 266451 266441 266449 266431 266433 1857020 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767
2896 nginx 6 6 13766689 105768 105776 105770 105772 1432505 105775 9608 9608 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767
2897 nginx 6 6 13766689 105768 105770 105772 105774 105777 9608 9608 20610 20610 20611 20611 20612 20612 20613 20613 13763653 13766689 13766685 13766686 13766687 13766688 13766692 13766693 13766690 13766691 13766694 13766695 13766696 13766697 13766698 13766699 13766700 13766701 13766702 13766703 13766704 13766705 13766706 13766707 13763191 13763192 13766708 13766709 13766710 13766711 13766712 13763651 13766713 13766714 105764 105764 105765 105765 105766 105766 105767 105767
With our custom mapping of pids to their inodes, we can adjust the netstat replacement to take advantage of one-stop-shopping-mapping.
Replacing glob with Custom Proc Parsing
Let’s read /proc/pid_inode_map once instead of parsing O(N*k) symlinks:
def generateInodePidMapFromProcCustom():
    """ Read /proc/pid_inode_map to populate inodePidMap """
    inodePidMap = collections.defaultdict(list)
    with open("/proc/pid_inode_map", 'r') as pim:
        for line in pim:
            parts = line.split()
            pid = parts[0]
            name = parts[1]  # unused, we lookup the full cmdline later
            pimInodes = set(parts[2:])
            for inode in pimInodes:
                inodePidMap[int(inode)].append(int(pid))
    return inodePidMap
We save thousands of system operations by generating one file any user can read instead of needing root to go through every fd of every process.
Our approach of parsing a pre-generated file at /proc/pid_inode_map is 40% faster than iterating all the pids and fd symlinks every time we want a network status.
Hey Linux, give us a built-in pid to inode mapping by default!
Code for all the Python scripts plus the Linux kernel module is at mattsta/netmatt.
C Thy Shame
Jumping back a bit, let’s look at some netstat.c code.
Code has not been modified to protect the guilty. It actually is formatted like this.
This is in netstat.c still shipping in your Linux net-tools package in 2018:
if (prg_cache_loaded || !flag_prg) return;
prg_cache_loaded = 1;
cmdlbuf[sizeof(cmdlbuf) - 1] = '\0';
if (!(dirproc=opendir(PATH_PROC))) goto fail;
while (errno = 0, direproc = readdir(dirproc)) {
for (cs = direproc->d_name; *cs; cs++)
if (!isdigit(*cs))
break;
if (*cs)
continue;
procfdlen = snprintf(line,sizeof(line),PATH_PROC_X_FD,direproc->d_name);
if (procfdlen <= 0 || procfdlen >= sizeof(line) - 5)
continue;
errno = 0;
dirfd = opendir(line);
if (! dirfd) {
if (errno == EACCES)
eacces = 1;
continue;
}
line[procfdlen] = '/';
cmdlp = NULL;
Did you notice the excerpt has an unterminated while loop? Do you see it?
If you want to follow along at home, you can get the source with apt-get source net-tools.
What if I spend 0.03 seconds to run it through modern automated formatting tools?
if (prg_cache_loaded || !flag_prg) {
    return;
}
prg_cache_loaded = 1;
cmdlbuf[sizeof(cmdlbuf) - 1] = '\0';
if (!(dirproc = opendir(PATH_PROC))) {
    goto fail;
}
while (errno = 0, direproc = readdir(dirproc)) {
    for (cs = direproc->d_name; *cs; cs++) {
        if (!isdigit(*cs)) {
            break;
        }
    }
    if (*cs) {
        continue;
    }
    procfdlen =
        snprintf(line, sizeof(line), PATH_PROC_X_FD, direproc->d_name);
    if (procfdlen <= 0 || procfdlen >= sizeof(line) - 5) {
        continue;
    }
    errno = 0;
    dirfd = opendir(line);
    if (!dirfd) {
        if (errno == EACCES) {
            eacces = 1;
        }
        continue;
    }
    line[procfdlen] = '/';
    cmdlp = NULL;
90s C code has plenty of weird properties, but the strangest is an absolute refusal to use proper indentation combined with a massive lack of visual whitespace.
Though, even in 2018 backwards people still argue “brevity” is the highest form of coding. Never write in 4 lines what you can technically manipulate your compiler into accepting as 1 line, even if you just remove all the whitespace and brackets and indentation. We call these people CDs (C Dolts) and they should be monitored carefully to minimize ongoing damage to the time stream.
Making 90s C code is easy:
- cram everything close together
- align most everything to the left with no indentation
- never (never!) use brackets if your if, for, or while only has one result statement
  - as a bonus, lie using indentation about what your if does, like the inexcusably bad if (lnamelen == -1) statement below.
    - look, it has a ‘continue’ but then everything below is still indented like it applies to the if statement! Gotta love 25 year old abandoned code running on millions of machines around the world.
Check out this source excerpt still shipping in 2018 too:
while ((direfd = readdir(dirfd))) {
/* Skip . and .. */
if (!isdigit(direfd->d_name[0]))
continue;
if (procfdlen + 1 + strlen(direfd->d_name) + 1 > sizeof(line))
continue;
memcpy(line + procfdlen - PATH_FD_SUFFl, PATH_FD_SUFF "/",
PATH_FD_SUFFl + 1);
safe_strncpy(line + procfdlen + 1, direfd->d_name,
sizeof(line) - procfdlen - 1);
lnamelen = readlink(line, lname, sizeof(lname) - 1);
if (lnamelen == -1)
continue;
lname[lnamelen] = '\0'; /*make it a null-terminated string*/
if (extract_type_1_socket_inode(lname, &inode) < 0)
if (extract_type_2_socket_inode(lname, &inode) < 0)
continue;
if (!cmdlp) {
if (procfdlen - PATH_FD_SUFFl + PATH_CMDLINEl >=
sizeof(line) - 5)
continue;
safe_strncpy(line + procfdlen - PATH_FD_SUFFl, PATH_CMDLINE,
sizeof(line) - procfdlen + PATH_FD_SUFFl);
fd = open(line, O_RDONLY);
if (fd < 0)
continue;
If your eyes haven’t exploded from code stress yet, count the unterminated flow control statements. Do you see?
Let’s clean this up again using 0.03 seconds of automated tooling:
while ((direfd = readdir(dirfd))) {
    /* Skip . and .. */
    if (!isdigit(direfd->d_name[0])) {
        continue;
    }
    if (procfdlen + 1 + strlen(direfd->d_name) + 1 > sizeof(line)) {
        continue;
    }
    memcpy(line + procfdlen - PATH_FD_SUFFl, PATH_FD_SUFF "/",
           PATH_FD_SUFFl + 1);
    safe_strncpy(line + procfdlen + 1, direfd->d_name,
                 sizeof(line) - procfdlen - 1);
    lnamelen = readlink(line, lname, sizeof(lname) - 1);
    if (lnamelen == -1) {
        continue;
    }
    lname[lnamelen] = '\0'; /*make it a null-terminated string*/
    if (extract_type_1_socket_inode(lname, &inode) < 0) {
        if (extract_type_2_socket_inode(lname, &inode) < 0) {
            continue;
        }
    }
    if (!cmdlp) {
        if (procfdlen - PATH_FD_SUFFl + PATH_CMDLINEl >=
            sizeof(line) - 5) {
            continue;
        }
        safe_strncpy(line + procfdlen - PATH_FD_SUFFl, PATH_CMDLINE,
                     sizeof(line) - procfdlen + PATH_FD_SUFFl);
        fd = open(line, O_RDONLY);
        if (fd < 0) {
            continue;
        }
In the original code section, did you notice if (!cmdlp) { was unterminated? No, you didn’t notice, because they refused to use indentation in 1993 and nobody has fixed it in the subsequent 25 years.
90s C is basically the pinnacle of the Write Once, Read Never Again coding movement and must be ridiculed at all costs. Riddikulus!
Conclusion
What did we learn today?
- netstat is part of net-tools
- net-tools is a mostly abandoned set of Linux utilities from the mid 90s
- Linux doesn’t let non-root users discover pid to [inodes] metadata
- netstat actually under-reports which pids own which sockets
  - netstat only lists one pid even though sockets can be owned by multiple pids
- But we can write a Linux kernel module to generate the mapping anyway! seizethemeansofnetstatting!
- The 90s Linux utility C code is awful and needs to be either adopted and completely re-formatted, re-reviewed, and brought up to modern standards, or outright abandoned.
- We can write much safer system utilities in Python
  - they are fast enough
  - they are safe enough
  - they are readable enough
  - and doggone it, people like me.