The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.

This community works best when people use their real names. Please register for a free account.

Other Groups:
Joel on Software
Business of Software
Design of Software (CLOSED)
.NET Questions (CLOSED)
TechInterview.org
CityDesk
FogBugz
Fog Creek Copilot


The Old Forum


Your hosts:
Albert D. Kallal
Li-Fan Chen
Stephen Jones

MFC ==> STL and CString ==> string

I've gotten accustomed to high-performance concatenation using MFC code via pre-allocated CString buffers, e.g.,

  CString strFoo;
  strFoo.GetBufferSetLength(100000);
  strFoo = "";
  for (i = 0; i < 10000; i++) {
    strFoo += "blahblah";
  }

which is lightning fast.  Conversely, STL using:

  string sFoo;
  for (i = 0; i < 10000; i++) {
    sFoo += "blahblah";
  }

is brutally slow.  No pre-allocation seems possible using STL's 'string'.

And 'rope' is not feasible due to a VS6.0 requirement.

Any recos regarding high-performance string concatenation without inventing YASC (yet another string class)?

cheers.
directorblue Send private email
Tuesday, October 19, 2004
 
 
Doesn't std::string have a method named "reserve"?
Christopher Wells Send private email
Tuesday, October 19, 2004
 
 
Can you use basic_string?  From

http://www.sgi.com/tech/stl/basic_string.html

void reserve(size_t n) - Requests that the string's capacity be changed.

Forgive me if this sounds ignorant, but I don't understand what "Can't use rope due to VS6.0 requirement."  If VS6.0 supports the STL, and doesn't support rope; you could probably copy over some free implementation of rope into your project. No?
eden li
Tuesday, October 19, 2004
 
 
Try resize (I guess this is similar to reserve) or use basic_string(size_type n, charT c) at construction time.
Dino
Tuesday, October 19, 2004
 
 
Christopher... outstanding.  I should be tapped with a cattle prod because I somehow missed 'reserve' in my STL reference.


Eden... requirement for the compiler is VC6 off-the-shelf.  We could probably use SGI's STL if necessary but I was hoping not to.  Anyhow, 'reserve' definitely does it... lightning fast.


Thanks!
directorblue Send private email
Tuesday, October 19, 2004
 
 
The STL shipped with MSVC v6 is slightly old (and perhaps slightly buggy in a multithreeaded program); you can order an up-to-date version from http://www.dinkumware.com/
Christopher Wells Send private email
Tuesday, October 19, 2004
 
 
Actually the STL shipped with VC 6.0 is *very* buggy. STLPort used to support VC 6.0; I haven't checked for quite a while if it still does or not. It worked a lot better than the stock STL.

The Dinkumware library was also a significant improvement, but the header files are still nigh-unreadable.
Chris Tavares Send private email
Tuesday, October 19, 2004
 
 
The ATL version of CString is pretty fast, and is native to VC6 (at least when updated with the latest ATL headers).
John
Tuesday, October 19, 2004
 
 
STLPort (free) is a good idea if you can't afford the high prices charged by dinkumware.

I even got STLPort to work in "embedded vc++" (the version of STL that comes with evc is very stripped down and doesn't include iostreams, for example).
Kalani Send private email
Tuesday, October 19, 2004
 
 
While we're on the topic, whenever I'm doing a lot of string manipulation in Visual Basic, I have my own simple string class which "preallocates" memory by setting an internal buffer to SPACE(cb), and always doubles the size of cb when it needs to grow. Then I keep my own count of the length of the string. This improves performance consing up strings dramatically.
Joel Spolsky Send private email
Tuesday, October 19, 2004
 
 
I wouldn't say STLPort's sources are "readable" either :)

Has anyone seen a readable STL implementation?
Dan Maas
Tuesday, October 19, 2004
 
 
>While we're on the topic, whenever I'm doing a lot of string manipulation in Visual Basic, I have my own simple string class which "preallocates" memory by setting an internal buffer to SPACE(cb), and always doubles the size of cb when it needs to grow. Then I keep my own count of the length of the string. This improves performance consing up strings dramatically.

Joel, can you explain how doubling a buffer size increases speed? I recall you mentioned something similar in your Back to Basics article, or this "doubling" thing, I don't know why, rings some bells. It sounds remarkably familiar, only I can't remember where I read it, and what it was exactly.
Sathyaish Chakravarthy Send private email
Wednesday, October 20, 2004
 
 
Actually, rope is an SGI extension - ISO C++ doesn't have rope.

--STL
STL
Wednesday, October 20, 2004
 
 
eden li:

std::string is defined as std::basic_string<char> on a conforming implementation, so using basic_string isn't going to change anything. Your advice about reserve() is dead on, though.

<digression>

basic_string is a template so that we can have basic_string<char> or basic_string<wchar_t>. But why did the standard library designers do it that way? Wouldn't it be easier to just have a single string type and use a macro to control whether it uses char or wchar_t, like the MFC CString class?

I learned the answer to that the hard way when I had to build an MFC project with Unicode settings, which had to link against a DLL built without Unicode settings whose interface used CString. Needless to say, the whole thing blew up as soon as I tried to pass a CString across that interface. From then on, the rule in that development group was that CString is not used in interfaces. Ever.

</digression>

To the OP: Like the wise man said, use reserve(). It won't really be necessary when you upgrade to a C++ implementation that grows strings efficiently, but it won't hurt you either.
comp.lang.c refugee
Wednesday, October 20, 2004
 
 
> can you explain how doubling a buffer size increases speed?

Every time the resulting target string would be larger than your already-allocated buffer size, you must:

1) Allocate a newer, bigger buffer
2) Copy existing data from the old buffer to the new buffer (before appending the new character to the existing data that you copied into the newer, bigger buffer)
3) Release the old buffer

Now, imagine that you have a buffer, to which you append 1000 characters, one at time. If your buffer-allocation strategy is to make the new buffer "just big enough" then you will do the above three step 1000 times:

1) Buffer size is 1
2) Buffer size is 2
3) Buffer size is 3
4) Buffer size is 4
...
1000) Buffer size is 1000

If your buffer-allocation strategy is to make the new buffer "twice as big as it was before" then you will do the above three step only 9 times:

1) Buffer size is 1
2) Buffer size is 2
3) Buffer size is 4
4) Buffer size is 8
...
9) Buffer size is 1024
Christopher Wells Send private email
Wednesday, October 20, 2004
 
 
I have to disagree with you here, Chris.

First of all, what Joel said was something to the affect of:

He said instead of
Dim Str as String,

he does

Dim Str as MyString
Str=New MyString
Str.Text="This is my own super-fast string."

where in he does:

CLASS MyString
=============

Private mText as String

Sub Class_Initialize()
    mText=Space(SOME_FIXED_SIZE)
End Sub

Public Property Let Text(ByVal NewValue as String)
    if trim(NewValue)=vbnullstirng Then Exit Property
    mText=Space(2*Len(NewValue))
End Property

Public Function Replace(ByVal ReplaceWith as String)
    mText=Space(2*Len(ReplaceWith))
    mText=ReplaceWith
End Sub

If I understand correctly, that was what he meant. Because it sounded a bit absurd to me, I inquired.

Now about what you said.

If I have to allocate 1000 bytes, I would never do

1) Buffer size is 1
2) Buffer size is 2
3) Buffer size is 3
4) Buffer size is 4
...
1000) Buffer size is 1000

I would just do

char *ptr=(char*)malloc(sizeof(char)*1000);

In VB, I wouldn't do anything but

Dim Str as String*1000

or better yet, I'll leave it variable length

Dim Str as String

You said:

[QUOTE]
If your buffer-allocation strategy is to make the new buffer "twice as big as it was before" then you will do the above three step only 9 times:

1) Buffer size is 1
2) Buffer size is 2
3) Buffer size is 4
4) Buffer size is 8
...
9) Buffer size is 1024
[/QUOTE]

But that is skewing the point under consideration. The point was:

IF current buffer size = (let us say) 20
and you need to accomodate a new string of size 25
let us consider this:

char* Replace(const char *Replace, const char* ReplaceWith)
{
    if((Replace==NULL) || (ReplaceWith==NULL)) return NULL;

    //I'd just do
    free(Replace);
    Replace=malloc(sizeof(char)*(strlen(ReplaceWith)+1);
    if(Replace==NULL) return NULL;
    strcpy(Replace, ReplaceWith);
    return Replace;
}

And I would NOT do

    Replace=malloc(sizeof(char)*((2*strlen(ReplaceWith))+1);
Sathyaish Chakravarthy Send private email
Thursday, October 21, 2004
 
 
> IF current buffer size = (let us say) 20 and you need to accomodate a new string of size 25
let us consider this:

The point is that you've just added 5 bytes, and therefore you needed to realloc. Now, after doing that, statistically the odds are good that the next thing you're going to do is add *another* few bytes.

With *your* method, you going to need to realloc *again*.

With *my* method (where I realloced from 20 to 40, instead of from 20 to 25), I *already* have enough 'extra' bytes (left over from my previous realloc) that I do *not* need to immediately realloc again.

In the long run my method is therefore on average *exponentially* faster than yours. My method also needs on average, maybe twice as much memory as yours.
Christopher Wells Send private email
Thursday, October 21, 2004
 
 
Thanks, Chris!
Sathyaish Chakravarthy Send private email
Thursday, October 21, 2004
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz