Solving the strcat() Woes

In C, the common and canonical way to concatenate strings at runtime is to use strcat(). For most cases, this works perfectly fine - here's a simple example:

const char *intro = "Hi ";

const size_t len = strlen (intro) + strlen (name);  
char *greet = malloc (len + 1);  
if (!greet) {  
        // handle error
}


strcpy (greet, intro);  // copy initial string with null terminator  
strcat (greet, name);   // copy the name and move the null terminator  

Now, if you need to concatenate multiple strings, this can become a bit of an issue. For starters, you're going to end up with a chain of functions (contrived example, I know):

const char *arr[] = { "Hello", " ", "World", "!" };

char str[5 + 1 + 5 + 1 + 1] = "";  
strcat (str, arr[0]);  
strcat (str, arr[1]);  
strcat (str, arr[2]);  
strcat (str, arr[3]);

// str now contains "Hello World!"

In cases like this, of course, it's trivial to rewrite this in a for loop. However, that overlooks the performance problem. Every time you call strcat(), it does this:

strcat (char *dst, const char *src) {  
        size_t off = strlen (dst);
        size_t len = strlen (src);

        // don't forget to copy the NULL terminator too
        memcpy (dst + off, src, len + 1);

        return dst;
}

See it yet? strcat() calls strlen() to find the end of the string you're appending to every single time. As Joel Spolsky has previously explained:

Since the first part of strcat has to scan through the destination string every time, looking for that dang null terminator again and again, this function is much slower than it needs to be and doesn't scale well at all.

...

Whenever something seems like it should have linear performance but it seems to have n-squared performance, look for hidden [joke reference]. They are often hidden by your libraries. Looking at a column of strcats or a strcat in a loop doesn't exactly shout out "n-squared," but that is what's happening.

I didn't particularly the example solution he presented though - you still end up building chains, and ironically enough, you may actually end up with slower code, since you're copying byte by byte, whereas memcpy() can usually copy multiple bytes in a cycle by using architecture-specific vector instructions (ex: SSE/AVX on x86_64).

Also, in real code, allocating strings of unknown length on the stack is BAD. If your string is too long, you will make the stack overflow (ie: your program will SEGFAULT and usually crash immediately). A solution that you might get if you ask around is to use sprintf():

char * s = malloc(snprintf(NULL, 0, "%s %s", first, second) + 1);  
sprintf(s, "%s %s", first, second);  

This makes an extra call to to sprintf(), but platform specific solutions to avoid that are also presented in the post. Two calls isn't the problem here though. Functions from the printf() family tend to be on the slow side (this show up if you call them a lot), because they have to parse the formatting string upon each call.

Therefore, I present:

#include <stdarg.h>
#include <string.h>
#include <stdlib.h>

char *astrcat (const unsigned int n, ...) {  
        if (!n)
                return malloc (0);

        va_list args;
        va_start (args, n);

        size_t x, len = 0;
        size_t l[n];
        for (x = 0; x < n; x++) {
                l[x] = strlen (va_arg (args, 
                                        const char *));
                len += l[x];
        }

        va_end (args);

        char *restrict r = malloc (len + 1);
        if (!r)
                return NULL;

        size_t off = 0;
        va_start (args, n);
        for (x = 0; x < n; x++) {
                memcpy (&r[off],
                        va_arg (args, 
                                const char *restrict),
                        l[x]);
                off += l[x];
        }

        r[len] = 0;
        va_end (args);
        return r;
}

This function allocates and returns a string containing all of the input strings concatenated. I reorganized this into a few more functions for more convenient usage:

  • astrcat() - takes variable args (number of strings restricted to <= 255 due to max parameters
  • vastrcat() - takes a va_list (restricted as above)
  • aastrcat - takes an array of strings (number of strings restricted to <= 65535 due to stack allocation of lengths)
  • _astrcat() - takes a list of lengths and a list of strings (unrestricted)
  • __astrcat() - the base implementation, which takes the same arguments as _astrcat preceded by the total length of the result string

With this, you can effectively turn the chained example into this:

const char *arr[] = { "Hello", " ", "World", "!" };

char *str = aastrcat (4, arr);  

Isn't that so much cleaner?

The header and implementation files, both of which you are free to to relicense to any non-viral licenses (no GPL, no CDDL, no LGPL, no AGPL, no MPL, no other copyleft licenses, yes MIT, yes BSD, yes Apache, etc), are available on GitHub: