I was recently debugging a VRF issue at work and found that getaddrinfo was not using the VRF interface, so DNS resolution failed whenever it had to go over that interface. It turns out that with the 4.10 kernel and later, the ip command lets you run a program as `ip vrf exec <vrfname> <command>`, which presumably makes every socket created by that command use the specified VRF interface. Unfortunately, our platform is currently on the 4.9 kernel.
The trick is that after you create a socket with socket(), you need to call something like:
setsockopt(<sock fd>, SOL_SOCKET, SO_BINDTODEVICE, "vrfname", <vrfname_length>)
This binds the socket to the VRF device, so its packets go out over the VRF. Knowing that ip vrf exec must be doing roughly this, I wondered whether I could hack something up to do it myself. My first thought was to replace getaddrinfo() with my own implementation and load it with LD_PRELOAD. I experimented with this but found it too difficult, since I would need to copy a bunch of code from glibc (where getaddrinfo lives), and it would not solve the problem in general for other programs that don't know about VRF. While debugging with GDB, I had the idea of writing a GDB script that calls the above setsockopt after each successful call to socket(). This is what I came up with:
break socket
python
class MyFinishBreakpoint (gdb.FinishBreakpoint):
    def stop (self):
        eax = gdb.parse_and_eval("$eax")
        if eax > 0:
            vrfname = "myvrf"
            gdb.execute("p (int)setsockopt($eax, 1, 25, \"%s\", %d)"%(vrfname, len(vrfname)))
        return False # don't want to stop
end
commands
silent
py MyFinishBreakpoint()
cont
end
run
quit
The above GDB script can be saved in a file; let's call it fix_vrf.gdb. You can then run a command that has the issue like this:
sudo gdb -x fix_vrf.gdb --args ping -c 1 -I <vrfname> google.com
The -I option makes ping send the ICMP packet out the VRF correctly, but ping will still try to resolve google.com out the wrong interface. That is, until the script intercepts the socket() call in getaddrinfo and calls setsockopt on the new socket, using these steps:
- Set a breakpoint on all socket() calls.
- Use GDB's Python support to define a new class derived from FinishBreakpoint, which is triggered only AFTER the socket() function has finished. This is important because we need the resulting socket file descriptor to call setsockopt on.
- The stop method runs when socket() returns and reads its return value from the eax register. On x86, a function's integer return value is left in eax (rax on x86-64, whose low 32 bits GDB exposes as $eax), and the raw Linux syscall result comes back in the same register.
- If the result is > 0 (successful), we finally execute setsockopt with the correct options. Strictly speaking, 0 is also a valid descriptor, but it is normally already taken by stdin. I used the numeric values of the #defines SOL_SOCKET and SO_BINDTODEVICE (1 and 25) for simplicity.
- The commands command gives a list of commands to run when the breakpoint is hit. Here I tell it to instantiate my Python class and then continue the program.
- The last two lines start the program and exit GDB when it is done.
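For comparison, here is a minimal Python sketch of the call that the script injects into the target process, using the named constants instead of the literals 1 and 25. The helper names bind_to_device and open_bound_udp_socket are hypothetical and exist only for illustration, and the sketch assumes Linux, where binding to a device typically needs root or CAP_NET_RAW.

```python
import socket

# On x86 Linux these are 1 and 25, the literals used in the GDB script.
SOL_SOCKET = socket.SOL_SOCKET
# Fall back to the x86 Linux value if this Python build does not export the name.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def bind_to_device(sock, devname):
    """Bind sock to the named interface, for example a VRF device.

    Hypothetical helper: raises OSError if the device does not exist
    or the caller lacks the required privileges.
    """
    sock.setsockopt(SOL_SOCKET, SO_BINDTODEVICE, devname.encode())


def open_bound_udp_socket(devname):
    """Create a UDP socket and bind it to devname right after creation,
    which is the same ordering the GDB script enforces after socket()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        bind_to_device(sock, devname)
    except OSError:
        sock.close()
        raise
    return sock
```

Calling open_bound_udp_socket("myvrf") as root on a machine that has a VRF device named myvrf should return a socket whose traffic leaves through that VRF. A program you control can do this directly; the GDB script is for programs you cannot change.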
I ended up using the Python FinishBreakpoint class instead of simply using the "finish" command because of a limitation in GDB: in a breakpoint's "commands" list, anything that follows a command which resumes execution (such as finish) is ignored, so the continue would never run. The Python version does not have this problem.
This seems to work well! I can’t speak for the performance impact, but for the purposes of proving an issue, I think this approach could be used generally to intercept and modify function results for your debugging or hacking needs!