Phillip Trelford's Array

POKE 36879,255

String and StringBuilder revisited

I came across a topical .Net article by Dave M Bush, published towards the tail end of 2014 and entitled String and StringBuilder, where he correctly asserts that .Net’s built-in string type is a reference type and is immutable. All good so far.

The next assertion is that StringBuilder will be faster than simple string concatenation when adding more than 3 strings together, which is probably a pretty good guess, but let's put it to the test with 4 strings.

The test can be performed easily using F# interactive (built into Visual Studio) with the #time directive:

open System.Text

#time

let a = "abc"
let b = "efg"
let c = "hij"
let d = "klm"

// StringBuilder: a new builder per iteration, appending the other three strings
for i = 1 to 1000000 do
   let e = StringBuilder(a)
   let f = e.Append(b).Append(c).Append(d).ToString()
   ()
// Real: 00:00:00.317, CPU: 00:00:00.343, GC gen0: 101, gen1: 0, gen2: 0

// String.Concat: the four-argument overload
for i = 1 to 1000000 do
   let e = System.String.Concat(a,b,c,d)
   ()
// Real: 00:00:00.148, CPU: 00:00:00.156, GC gen0: 36, gen1: 0, gen2: 0

What we actually see is that for concatenating 4 strings StringBuilder takes roughly twice as long as String.Concat (on this run 0.317s vs 0.148s) and generates approximately 3 times as much garbage (gen0 collections: 101 vs 36)!
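Outside F# interactive the same comparison can be approximated with a stopwatch. The following is a minimal sketch rather than a rigorous benchmark: the measure helper is a hypothetical name introduced here, and the numbers it prints will differ by machine and runtime.

```fsharp
open System
open System.Diagnostics
open System.Text

// Hypothetical helper (not from the article): runs `work` `iterations` times and
// returns elapsed milliseconds, the gen0 collections observed, and the last result.
let measure (iterations: int) (work: unit -> string) =
    GC.Collect()                              // start from a settled heap
    let gen0Before = GC.CollectionCount(0)
    let timer = Stopwatch.StartNew()
    let mutable last = ""
    for _ in 1 .. iterations do
        last <- work ()
    timer.Stop()
    timer.ElapsedMilliseconds, GC.CollectionCount(0) - gen0Before, last

let a, b, c, d = "abc", "efg", "hij", "klm"

let builderMs, builderGen0, builderResult =
    measure 1000000 (fun () -> StringBuilder(a).Append(b).Append(c).Append(d).ToString())

let concatMs, concatGen0, concatResult =
    measure 1000000 (fun () -> String.Concat(a, b, c, d))

printfn "StringBuilder: %d ms, gen0: %d" builderMs builderGen0
printfn "String.Concat: %d ms, gen0: %d" concatMs concatGen0
```

Both cases pay the same cost for invoking the lambda, so the relative difference is what matters here, not the absolute timings.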

Under the hood, StringBuilder creates an array to append the strings into. When appending, if the current buffer capacity is exceeded (the default is 16 characters), a new array must be created. When ToString is called it may, based on a heuristic, decide to return the builder’s array or allocate a new array and copy the value into that. Therefore the performance of StringBuilder depends on the initial capacity of the builder and on the number and lengths of the strings appended.
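The effect of capacity can be observed directly through the Capacity property. This is a small sketch, not a description of the internals: how far the buffer grows once the limit is exceeded is implementation-specific, so the code only prints the grown value instead of assuming one.

```fsharp
open System.Text

let a, b, c, d = "abc", "efg", "hij", "klm"

// Default constructor: the buffer starts with room for 16 characters.
let small = StringBuilder()
let defaultCapacity = small.Capacity

// Appending 17 characters exceeds that, so the builder has to grow.
small.Append('x', 17) |> ignore
let grownCapacity = small.Capacity

// Sizing the builder up front: 12 characters is exactly enough for a + b + c + d.
let sized = StringBuilder(a, 12)
let result = sized.Append(b).Append(c).Append(d).ToString()

printfn "default capacity: %d, after 17 chars: %d" defaultCapacity grownCapacity
printfn "result: %s (length %d, capacity %d)" result sized.Length sized.Capacity
```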

In contrast, String.Concat (which the compiler resolves the ‘+’ operator to) calculates the length of the concatenated string from the lengths of the passed-in strings, then allocates a string of the required size and copies the values in. Ergo, in many scenarios it requires less copying and less allocation.
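The "measure, allocate once, copy" strategy can be sketched by hand. The concatByHand function below is a hypothetical illustration, not the framework's code: it goes through a char array and so pays for one extra copy, and unlike String.Concat it does not treat null as an empty string.

```fsharp
// Illustrative only: sum the lengths first, allocate a single buffer of exactly
// that size, then copy each part into place.
let concatByHand (parts: string[]) =
    let total = parts |> Array.sumBy (fun s -> s.Length)
    let buffer : char[] = Array.zeroCreate total
    let mutable offset = 0
    for part in parts do
        part.CopyTo(0, buffer, offset, part.Length)
        offset <- offset + part.Length
    System.String(buffer)

let byHand = concatByHand [| "abc"; "efg"; "hij"; "klm" |]
let builtIn = System.String.Concat("abc", "efg", "hij", "klm")
printfn "%s %s" byHand builtIn
```

No intermediate buffer is ever resized, which is the property that keeps the allocation count low.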

When concatenating 2, 3 or 4 strings we can take advantage of String.Concat’s optimized overloads. After this the picture changes, as an array argument must be passed, which requires an additional allocation. However, String.Concat may still be faster than StringBuilder in some scenarios where the builder requires multiple reallocations.
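The two call shapes look like this. It is a sketch for illustration only: the fifth string e is an extra value invented for the example, and the array is built explicitly to make the additional allocation visible.

```fsharp
let a, b, c, d, e = "abc", "efg", "hij", "klm", "nop"

// Four strings: binds to the fixed four-argument overload, no array needed.
let four = System.String.Concat(a, b, c, d)

// Five strings: the values have to travel in an array, which is one extra allocation.
let parts : string[] = [| a; b; c; d; e |]
let five = System.String.Concat(parts)

printfn "%s %s" four five
```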

But wait, there’s more. Going back to the ‘+’ operator: if we assign the integer literal expression 1 + 2 + 3, the compiler can reduce the value to 6. Equally, if we define the strings as const string, then the compiler can apply the string concatenations at compile time, leading to, in this contrived example, no runtime cost whatsoever.
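const string is the C# spelling. In F# the closest analogue is, as far as I know, the Literal attribute, so the sketch below assumes that mapping. The names A to D, Joined and isJoined are made up for the example.

```fsharp
[<Literal>]
let A = "abc"
[<Literal>]
let B = "efg"
[<Literal>]
let C = "hij"
[<Literal>]
let D = "klm"

// Folded by the compiler into a single constant string.
[<Literal>]
let Joined = A + B + C + D

// Literals can be used as constant patterns, which only works for compile-time constants.
let isJoined (s: string) =
    match s with
    | Joined -> true
    | _ -> false

printfn "%s %b" Joined (isJoined "abcefghijklm")
```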

The moral of the story is when it comes to performance optimization - measure, measure, measure.

Comments (12) -

  • kwp

    4/3/2015 3:44:23 AM |

    Hi Phil,

    I am no expert, but I do think this test is at least a little subjective / unfair and could have been conducted better. There is no need to repeatedly re-instantiate the StringBuilder, thus creating unnecessary garbage to clean up, and string literals are interned, so there is a chance that interning causes spurious results.

    From MSDN:

    The StringBuilder dynamically allocates more space when required and increases Capacity accordingly. For performance reasons, a StringBuilder might allocate more memory than needed. The amount of memory allocated is implementation-specific.

    StringBuilder does not necessarily allocate more memory every time text is appended.

    I ran some tests and did not observe any of the characteristics in your results. My results were much the same for 100, 1000, 100'000, 1'000'000, and even 100'000'000 iterations of concatenation of randomly-generated strings 6 characters in length.

    Script and results:
    http://pastebin.com/6NAB6tLK

    Thanks for a great blog and plenty of food for thought.

    • Phil

      4/3/2015 4:58:11 AM |

      Thanks Kingsley for your reply,

      I am sorry to hear that you felt that the article was somehow "unfair" or "could have been conducted better". My aim is merely to inform.

      With regard to re-instantiating the StringBuilder in the test: in most use cases of StringBuilder I've seen in the real world, a local instance is created within the scope of a function rather than a global instance being created and reused, and this is the scenario put forward in Dave M Bush's article.

      I agree with your statement "StringBuilder does not necessarily allocate more memory every time text is appended", in fact this is what I stated in the article, i.e. : "When appending if the current buffer length is exceeded (the default is 16) then a new array must be created". If you're interested in learning more about how it works I'd recommend reading the source code to Append string as I did.

      I had a look at your tests. Unfortunately in your tests the constant allocation of new strings from newly allocated arrays for each iteration significantly outweighs the cost of the String.Concat and StringBuilder operations, thus your tests are primarily testing the cost of your getstring function rather than what I think you intended to.

      Best regards,
      Phil

      • Kingsley

        4/3/2015 6:55:26 AM |

        Hi,

        My intention is neither to speak out of turn nor to criticize any of the valuable and interesting information you share.

        With regard to instantiation and scope of the StringBuilder, I accept that your article is based on Dave M Bush's post and that real-world implementation details will vary depending on the scenario.

        Having the StringBuilder within the function though, to my mind at least, defeats the point of a) having a settable Capacity property, and b) having the ability to work in memory space that has already been allocated, perhaps by previous iterations. Why throw it all away and re-allocate it?

        I use StringBuilders a lot, and definitely get the impression the type was designed to be used in a global sort of way with the specific goal of reducing cost expended by allocating memory. Better still, the Capacity property can be set beforehand, removing the necessity for dynamic allocation altogether. Obviously this is not possible in all scenarios.

        A big thank you for pointing out that my first tests were meaningless, I thought that the string generation function would not be a problem because it has a constant cost, but you are right, the cost far overshadows what we're interested in, the cost of String.Concat vs. StringBuilder.Append.

        I have made a change to the code to address the issue and run the tests again. The results now clearly indicate that StringBuilder.Append is quicker and less messy than String.Concat by about 30% at 100'000'000 iterations.

        Code and results:
        http://pastebin.com/3djWrsWm

        Naturally there might still be a problem with my approach. If this is the case, then I give up!

        Regards,
        Kingsley

        • Phil

          4/3/2015 7:35:21 AM |

          Hey no problem :)

          I use StringBuilder from time-to-time, and will provide an initial capacity in the constructor (again mentioned in the article).

          "Why throw it (the StringBuilder) all away and re-allocate it?"

          This is very context specific, but one example is in the context of multi-threaded programming.

          The updated tests are definitely an improvement and you are now closer to testing the individual performance of String.Concat and StringBuilder. However they are not yet testing the relative performance as you are asking String.Concat to perform 5 concatenations whereas you are only asking the StringBuilder to do 4. This would probably explain why you are seeing StringBuilder taking less time.

          Best regards,
          Phil

  • Kingsley

    4/3/2015 8:14:41 AM |

    OK, I see the mistake - my bad! I'm coding in Notepad...

    So, as you said originally, Concat is quicker. The exception is when Capacity is used effectively.

    Thanks for communicating,
    Kingsley

    • Phil

      4/3/2015 8:44:13 AM |

      No worries,

      For the benefit of other readers of this post, it's probably worth re-iterating that the assertion in the main body of the article holds: String.Concat is quicker than StringBuilder for concatenating 4 strings (regardless of whether you reuse the StringBuilder instance or what initial capacity you specify to the StringBuilder).

      Best regards,
      Phil

  • phil

    4/6/2015 11:53:27 AM |

    Kingsley,

    In your latest sample you are comparing an optimized StringBuilder scenario (pre-allocating the instance & setting the capacity) against the worst-case scenario for String.Concat (passing an array instead of the strings for 2, 3 & 4 values).

    When you write in C#, string a="a"; string b="b"; string c=a+b; the code it generates is, string c=String.Concat(a,b); which is what this article used and is the scenario which gives the better performance.

    To conclude, writing "c = a+b" or "e = a+b+c+d" seems "less messy" and, due to the optimized String.Concat call generated by the compiler, runs faster than using a StringBuilder.

    Yes, beyond 4 string concatenations and many other scenarios using StringBuilder may be faster, but in all cases if performance is important to you, you may want to profile your code in such a way that you are not attempting merely to confirm your bias.

    Best regards,
    Phil

  • kingsley

    4/7/2015 3:07:30 AM |

    Accepted, agreed, and 100% clear to me now.

    Initially, I did not pay close enough attention to the emphasis of the test being dependent on the use of the optimized overloads, thus being specific to 2, 3, or 4 strings only. For this, I apologize, as I have wasted your time.

    In all honesty, for so few strings I would have considered neither a StringBuilder nor a call to String.Concat. I would have just (+)'d them together in the first place.

    In future, I will think twice before allowing skepticism to take hold.

    Updated test and (correct) results for what it's worth:
    http://pastebin.com/iggWcWys

    Kind regards,
    Kingsley

  • Paul Westcott

    4/7/2015 9:58:06 PM |

    String Concat optimization is included in FSharpQuotationEvaluator :)

    github.com/.../QuotationsEvaluator.fs#L706-L716

    • kingsley

      4/9/2015 11:33:19 AM |

      Interesting. Active patterns are certainly a feature I intend to master and exploit in future projects.

      The "Invalid logic" exception is great : ) - by far one of the most definitive, explanatory, and just plain meaningful messages I have ever seen...

      • Paul Westcott

        4/13/2015 4:11:31 PM |

        I can only assume that you are being sarcastic, but actually it is the most definitive, explanatory and meaningful message you may *ever* see!

        The reason is that with Active Patterns the compiler is unable to determine whether a complete pattern match has been achieved, although in this case the prior call to TraverseExpr ensures that it has. So if the compiler was "smarter" then it would know that this code is actually unreachable, so the logic is, shall we say, invalid!

        (And if you are really interested in learning, there are probably easier places to learn about Active Patterns (and Quotations for that matter) than the QuotationEvaluator...)
