\section{Targeting Production Readiness}

\subsection{Debugging Strategies}

Like almost every non-trivial program, the ESP kernel module is prone to programming errors. A list of the bugs which were found and fixed using either of the debugging techniques presented here can be found in the appendix of this document on page~\pageref{fixed-bugs}.

\subsubsection{Debugging with KGDB}

KGDB \cite{kgdb-intro} is a source-level debugger for the Linux kernel. It allows the GNU debugger (GDB)\footnote{In order to be able to analyze dynamically loaded kernel modules, a modified version of GDB has to be used.} to be used to debug the Linux kernel as if it were a regular program, including breakpoints, stepping through the kernel code, watching the contents of variables, and support for multithreading. Because suspending kernel code execution for analysis causes all user-space applications to be halted as well, two machines are indispensable for using KGDB: one as the testing machine, on which the kernel to be debugged is running, and another as the development machine, from which the kernel code execution is monitored by means of the GDB program.

KGDB is distributed as a kernel patch, which must be applied to the kernel source tree before the kernel is compiled. These modifications add functionality to the kernel which is necessary for the debugging process:
\begin{itemize}
\item The GDB stub, which is the heart of the debugger. This is the part that handles requests coming from GDB on the development machine. It has control over all processors in the testing machine while the kernel running on it is inside the debugger.
\item Modifications to the kernel fault handlers: instead of causing a kernel panic as outlined in section~\ref{kernel-panics}, the modified fault handlers allow kernel developers to analyze unexpected faults by handing control over the machine to the GDB stub.
\item Communication: there are two versions of this component. Both serve to establish a connection between the development and the testing machine. One version uses a serial line to connect the two machines, the other works over Ethernet by using UDP/IP packets for message exchange. It is necessary to have an implementation of this functionality separate from the one the Linux kernel already offers in order to keep the side effects of debugging as small as possible. This component is also responsible for handling control break requests sent by GDB on the development machine.
\end{itemize}

In this work, KGDB was used to track down several bugs involving connection establishment and shutdown. While having a full-fledged debugger at hand to analyze kernel execution is very valuable in itself, its applicability to debugging a networking protocol implementation is limited. These limitations arise from the fact that it is not possible to synchronize the debugging of two or more machines: while one machine is suspended for debugging, the other machine experiences several timeouts and eventually assumes that the connection is dead. Additionally, some problems did not show up at all when using the kernel versions for which KGDB is available (new versions of KGDB are released for selected kernel versions only, and with considerable delay). Therefore, another way to debug the ESP kernel module had to be used in addition to KGDB, as described in the next section.
\subsubsection{Analyzing Kernel Oops Messages}
\label{kernel-panics}

\begin{figure}
\lstset{numbers=left, stepnumber=3, breaklines, breakatwhitespace, frame=single}
\centering
\begin{lstlisting}
Unable to handle kernel NULL pointer dereference at
virtual address 0000008
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[]
EFLAGS: 00210213
eax: 00000000 ebx: c6155c6c ecx: 00000038 edx: 00000000
esi: c672f000 edi: c672f07c ebp: 00000004 esp: c6155b0c
ds: 0018 es: 0018 ss: 0018
Process netgauge (pid: 2293, stackpage=c6155000)
Stack: c672f000 c672f07c 00000000 00000038 00000060 00000000
       c6d7d2a0 c6c79018 00000001 c6155c6c 00000000 c6d7d2a0
       c017eb4f c6155c6c 00000000 00000098 c017fc44 c672f000
       00000084 00001020 00001000 c7129028 00000038 00000069
Call Trace: [] [] []
   [] [] []
   [] [] []
   [] [] []
   [] []
Code: 8b 40 14 ff d0 89 c2 8b 06 83 c4 10 01 c2 89 16 8b 83 8c 01
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
\end{lstlisting}
\caption[Linux kernel Oops message example]{An example of a Linux kernel Oops message which caused a kernel panic. This information is printed to the console when an error condition is detected inside the kernel.}
\label{fig:kernel-oops}
\end{figure}

Kernel Oops messages are a mechanism in the Linux kernel for printing vital information whenever the kernel encounters an error condition. This information may be used by a developer to track down and fix the problem. An example of such a message is shown in figure~\ref{fig:kernel-oops}. Whenever an Oops occurs, the kernel component that caused it is killed instantly, together with any user-space processes currently making system calls into this component. This is done without releasing any locks or cleaning up half-modified data structures, so a machine with an Oopsed kernel should be rebooted as soon as possible to avoid the further problems which are to be expected.
Additionally, if killing the causing component implies killing a vital part of the kernel such as an interrupt handler, the system is halted completely.

The information contained within an Oops message is as follows:
\begin{itemize}
\item Line 1: An error message briefly describing what happened. In the example, an attempt was made to dereference a \NULL\/ pointer. The low but non-zero address indicates that there was an attempt to read a member variable of a \fname{struct} whose address was presumably \NULL.
\item Line 4: The value of a counter which is incremented for each Oops the kernel produces. It is important to observe that only the first of these messages contains reliable information.
\item Line 6: The code segment (0010) and the value of the extended instruction pointer (EIP). This unambiguously identifies the faulty instruction.
\item Lines 7--10: The values of the program status and control register, the general purpose registers and further segment registers.
\item Lines 12--15: The last values stored on the stack. These are parameters of function calls that have not yet returned, and return addresses.
\item Lines 16--20: The call trace. These are the addresses of the entry points of the functions which were being executed when the error condition occurred.\footnote{The in-kernel symbol resolver (``kksymoops'') has the ability to translate these addresses into function names and offsets within these functions. Unfortunately, it was not found to always give reliable results for the ESP module.}
\end{itemize}

While this information is not very useful on its own, additional user-space applications exist which can be used to track down the cause of the problem. Because the value of the EIP is the function base address plus the instruction offset, it can be used to identify the kernel function in which the crash occurred.
The ``System.map'' file belonging to the kernel allows the corresponding symbol (function) name to be looked up, along with its base address. Subtracting this base address from the EIP gives the offset within the function. It is then possible to find the true cause of the problem by examining the specific instructions of the failed function. Fortunately, the process described above can be performed by the ``ksymoops'' user-space program, which takes a kernel Oops message as input and automatically extracts all usable information.

\subsection{Creating an Interface for User Settings}

The ESP protocol has a few parameters which may be tuned for optimal performance. While there are default values for each of these parameters, defined by means of compile-time constants, it is desirable to give the system administrator a way to modify them without going through a cycle of unloading the module, recompiling it and loading it again every time.

Setting kernel parameters always involves passing some data from user-space to kernel-space. The most generic way to do this would be to create a new system call, write a user-space library exposing the capabilities of this call and finally create an application which uses this library to retrieve or set the parameters needed. This is just the way iptables (the user-space part) and netfilter (the part running in kernel-space) are implemented \cite{iptables-netfilter}.

Fortunately, this is not necessary if the preferences to be set have a structure as simple as a few integer values, which is the case for the ESP protocol. To provide a consistent interface for accessing such parameters, the ``sysctl'' interface was introduced with the 4.4BSD version of Unix and ported to Linux as of kernel version 1.3.57.

\subsubsection{The Sysctl Interface}
\label{sysctl-interface}

The sysctl interface consists of a single function\footnote{Under BSD, two more sysctl-related functions exist.
These are \fname{sysctlbyname} and \fname{sysctlnametomib}, which allow the sysctl interface to be accessed via human-readable names instead of an array of integers.} implemented in the standard C library, libc. This function transports the parameter to be set from user- to kernel-space and vice versa. The definition of this function is:
\begin{lstlisting}
int sysctl(int *name, u_int namelen,
           void *oldp, size_t *oldlenp,
           void *newp, size_t newlen);
\end{lstlisting}

The first two parameters tell the \fname{sysctl} function which kernel parameter is to be accessed, by means of an array of integer values and the length of this array. All sysctl parameters are organized in an out-tree, in which the nodes and leaves are identified by integer numbers. The array \fname{name} gives a path through this tree; the root node is given implicitly. For the \fname{name} parameter to be unambiguous, the children of any node in the tree must have unique numbers. While this requirement is easy to meet for the interior nodes of a sub-tree added to this hierarchy, special care has to be taken with the root of this sub-tree, as outlined in section~\ref{adding-sysctl}.

The next two parameters are for retrieving the old value of the parameter and storing it in the memory pointed to by \fname{oldp}. Finally, the last two parameters are for setting a new value. If one of these operations is not desired, its corresponding parameters may be set to zero.

The sysctl interface itself does not make any assumptions about the structure of the data passed between user- and kernel-space, as the data is given by a \fname{void}-typed pointer. The convention about the data transferred is implemented only in the specific kernel module and the calling user-space application.
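The way the \fname{name} array selects a parameter can be illustrated with a small user-space model. The following is only a sketch under stated assumptions: \fname{struct node}, \fname{lookup()} and the example tree with its numbers (42, 1, 2) are hypothetical and are not the kernel's data structures. It shows how each array element picks one child by its number, starting below the implicit root, and why sibling numbers must be unique.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a node of the sysctl tree.
 * This is NOT the kernel's struct ctl_table. */
struct node {
    int id;                    /* number, unique among its siblings */
    const struct node *child;  /* array of children, NULL for a leaf */
    size_t nchild;             /* number of children */
    int *data;                 /* leaf value, NULL for interior nodes */
};

/* Follow the path name[0..namelen-1], starting below the implicit root.
 * Returns the node reached, or NULL if a path component does not exist. */
static const struct node *lookup(const struct node *root,
                                 const int *name, size_t namelen)
{
    const struct node *cur = root;

    for (size_t i = 0; i < namelen; i++) {
        const struct node *next = NULL;

        /* The first match wins, which is why sibling ids must be unique. */
        for (size_t c = 0; c < cur->nchild; c++) {
            if (cur->child[c].id == name[i]) {
                next = &cur->child[c];
                break;
            }
        }
        if (next == NULL)
            return NULL;
        cur = next;
    }
    return cur;
}

/* Example tree with made-up numbers: root -> 42 -> { 1, 2 }. */
static int ex_burst = 16, ex_rtt = 200;
static const struct node ex_leaves[] = {
    { 1, NULL, 0, &ex_burst },
    { 2, NULL, 0, &ex_rtt },
};
static const struct node ex_top[] = { { 42, ex_leaves, 2, NULL } };
static const struct node ex_root = { 0, ex_top, 1, NULL };
```

In this model, the path \{42, 2\} with a length of 2 leads to the leaf holding \fname{ex\_rtt}, while the path \{42, 7\} yields no node because no child of node 42 carries the number 7.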
\subsubsection{Adding a Sysctl Interface to ESP}
\label{adding-sysctl}

In order to expose any internal settings via the sysctl interface, an array of \fname{struct ctl\_table} entries has to be filled in. Each entry stands either for an interior node, which allows parameters to be grouped logically (these groups can be used for access permissions as well), or for a leaf node representing an actual value which may be read or set. The most interesting fields to be initialized in the \fname{struct ctl\_table} are:
\begin{itemize}
\item \fname{ctl\_name}, which is the unique identification number mentioned above,
\item \fname{mode}, the access permissions in classical Unix notation,
\item \fname{data}, a pointer to the destination of the supplied data in kernel-space,
\item \fname{procname}, a human-readable name of the parameter (why this is supported under Linux despite the absence of the \fname{sysctlbyname()} and \fname{sysctlnametomib()} functions is explained in section~\ref{using-sysctl}), and
\item \fname{proc\_handler} and \fname{strategy}, which are both function pointers.
\end{itemize}

The \fname{proc\_handler()} function implements a Linux-specific extension of the sysctl interface. Under Linux, the complete tree of sysctl parameters known to the system is mirrored in the directory \fname{sys/} of the procfs \cite{procfs-guide} virtual filesystem. There, every interior node of the sysctl tree is represented by a directory entry and the leaf nodes are represented by files, which may be read from or written to like any other file in the filesystem. This requires converting the parameters to and from a string representation, as well as some handshaking to allow serial access to these strings, as required by the file access API. These tasks are carried out by the \fname{proc\_handler()} function.

The \fname{strategy()} function pointer may be set to zero, which causes the kernel to call a default implementation of this function.
This default implementation just performs some minimal validity checks based on the size parameters of the \fname{sysctl()} function and copies the data from user-space to where \fname{data} points in kernel-space. This behavior is fine for all parameters of the ESP protocol, with the only exception being the round-trip time.

The round-trip time is special because this value determines a timeout. In the ESP protocol implementation, a timeout is realized by scheduling a timer. This is accomplished by calling \fname{\_\_mod\_timer()} or a similar kernel function. All these functions expect the timeout to be given in ``jiffies''. Jiffies is a variable inside the Linux kernel which keeps increasing at a fixed rate as the result of a hardware interrupt. It is the basic unit of time in the Linux kernel, and the rate of this hardware interrupt is given by the kernel's compile-time constant \fname{HZ}. Its value depends on the architecture, and on some architectures it can even be modified during kernel configuration.

Consequently, it is desirable that the person who wants to set up the ESP protocol on a machine does not have to know about these details. Therefore, the unit of the round-trip time was chosen to be $\mu$s, and the value is automatically converted upon getting and setting of this parameter by special-cased implementations of the \fname{proc\_handler()} and \fname{strategy()} functions. These functions take the desired timeout value in microseconds and set ESP's internal variable to the next full jiffy, thus rounding up the given value. Rounding up was chosen because having the RRQ timer kick in a little too late has only a minor effect on overall performance, as shown in section~\ref{handling-packet-loss}. On the other hand, if an RRQ is sent too early, it causes a bunch of packets to be retransmitted to no avail, which would degrade performance badly.
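The rounding rule described above amounts to computing $\lceil t \cdot \fname{HZ} / 10^6 \rceil$ for a timeout of $t$ microseconds. The following user-space sketch illustrates the arithmetic only; it does not reproduce the actual ESP handler code. The function names are hypothetical, and the tick rate is passed in as the parameter \fname{hz}, whereas kernel code would use the compile-time constant \fname{HZ}.

```c
#include <stdint.h>

/* Convert a timeout given in microseconds to timer ticks ("jiffies"),
 * rounding up to the next full tick. hz is the tick rate in ticks per
 * second and must be greater than zero. Overflow of usecs * hz is not
 * handled in this sketch. */
static uint64_t usecs_to_ticks_round_up(uint64_t usecs, uint64_t hz)
{
    const uint64_t usecs_per_sec = 1000000u;

    /* Integer form of ceil(usecs * hz / 1e6). */
    return (usecs * hz + usecs_per_sec - 1) / usecs_per_sec;
}

/* Opposite direction, as needed when the stored value is read back. */
static uint64_t ticks_to_usecs(uint64_t ticks, uint64_t hz)
{
    return ticks * 1000000u / hz;
}
```

With an assumed tick rate of 100 per second, a requested round-trip time of 10001~$\mu$s would be stored as 2 ticks and read back as 20000~$\mu$s, so a value read back may be larger than the value written.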
The preferred place for the ESP options in the sysctl tree would be \fname{CTL\_NET/CTL\_ESP}. However, with the current implementation of the sysctl interface in Linux, it is not possible to attach new children to the interior nodes of the sysctl tree without patching the kernel source, which does not seem to be worth the hassle. Therefore, all ESP options are grouped under the \fname{CTL\_ESP} node, which is a direct child of the sysctl root node. The numerical value of the \fname{CTL\_ESP} constant is defined in the \fname{af\_enet.h} header file and was set to a value that is currently unused by the rest of the Linux kernel. The values reserved for Linux core components can be found in the file \fname{linux/sysctl.h} of the kernel source code.

\subsubsection{Using the Sysctl Interface}
\label{using-sysctl}

Under Linux, there are three ways to read and set the parameters exposed through the ESP sysctl interface:
\begin{enumerate}
\item Using the \fname{sysctl()} function call. To use this function call, it is necessary to know the \fname{ctl\_name} constants which were used by the kernel module upon registration. These are defined in the \fname{af\_enet.h} header file, which comes with the ESP kernel module and has to be included by every application which wants to use this protocol.
\item Accessing the files under \fname{/proc/sys/esp/} in the procfs virtual file system. This allows quick testing of parameters using console commands like \fname{cat} and \fname{echo}. Having to give these directories and files meaningful names is the reason why the \fname{procname} entry in the registration struct is needed.
\item Using the \fname{/sbin/sysctl} program. This program is also capable of reading the parameters to be set from a file, which most Linux distributions use to apply the settings specified in the \fname{/etc/sysctl.conf} file at boot time.
\end{enumerate}

A full list of all parameters ESP offers through the sysctl interface is shown in table~\ref{sysctl-overview}, along with a short description of the meaning of the individual settings.

\begin{sidewaystable}[p]
\begin{tabular}{|l|l|p{8cm}|}
\hline
Constant & procfs Entry Name & Description\\
\hline\hline
\fname{CTL\_ESP} & \fname{esp/} & The root node of the sysctl subtree for ESP.\\
\hline
\fname{CTL\_BURST\_LENGTH} & \fname{burst\_length} & The window size $w$ used during bulk transfers. This is the number of packets the protocol may have in flight when it knows the TXS was received successfully.\\
\hline
\fname{CTL\_INITIAL\_ACK\_BURST\_LENGTH} & \fname{initial\_ack\_burst\_length} & The window size when waiting for the first ACK, which acknowledges that the TXS frame has been received. This resembles the initial window size of the TCP protocol.\\
\hline
\fname{CTL\_PACKETS\_TO\_ACK} & \fname{packets\_to\_ack} & The number of data frames needed to trigger the sending of an ACK. The detection of packet loss and the receipt of a TXS cause the immediate sending of an ACK, independent of this setting.\\
\hline
\fname{CTL\_SEND\_BUFF} & \fname{send\_buff\_size} & The size of the send buffer to be used. Only affects sockets allocated after setting a new value.\\
\hline
\fname{CTL\_RECV\_BUFF} & \fname{recv\_buff\_size} & The size of the receive buffer to be used. Only affects sockets allocated after setting a new value.\\
\hline
\fname{CTL\_ROUND\_TRIP\_TIME} & \fname{round\_trip\_time} & The round-trip time assumed for the connection. This value is to be given in $\mu$s and is automatically rounded up to the next full jiffy\footnote{See section~\ref{adding-sysctl} for a detailed explanation.}.\\
\hline
\end{tabular}
\caption{Parameters of the ESP protocol exposed through the sysctl interface.}
\label{sysctl-overview}
\end{sidewaystable}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% IspellDict: "english"
%%% End: