The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

C++ design decision: use library supplied string class or char*?

I'm writing a moderately complex application in C++. Admittedly, this is my first "real" C++ project so it's bound to be a God-awful mess.

I understand OOP extremely well thanks to years with Smalltalk. C++ is a bit trickier, mostly due to the strong typing and lack of garbage collection. I'm using a GUI toolkit that may not be available/acceptable for other platforms I need to port to, so I'd like to abstract the use of the toolkit away behind subclasses with generic methods in case I need to rip it out and use something else later.

This worked really well for GUI components like windows and buttons, but I'm running into a problem with strings. The GUI library I'm using supplies its own string class which is pretty useful. I don't plan on writing my own string class, so my options are:

1: forget about abstracting away the toolkit and use the library's string class everywhere;
2: use it in as few places as possible, convert to char* as soon as possible;
3: derive from it, rename the methods a little so they fit in nicer with the app, and perhaps add a few methods of my own and use this derived string class everywhere instead, upcasting where needed;

I really don't want to do option 1. Option 2 is what I started doing, but I can see now, having written just a small part of the app, how much needless copying and iterating (to get the length of string) this method entails (and cleanup, because of the lack of a destructor). Option 3 looks good, but it will probably require lots of copying to create instances of my derived class from strings of the base class.

What do you think I should do?
Skull-0-mania
Sunday, October 01, 2006
 
 
Use the C++ standard library string class - std::string.  This will give you all the nice string functions and won't tie you to your GUI toolkit.
Mike S
Sunday, October 01, 2006
 
 
I'd suggest:

1) Don't use "char*": if you want to pass strings from one function to another, it's difficult to remember whether/when to delete them.

2) Use the STL's std::string class in your portable code.

3) Define conversions from std::string to/from your GUI-specific string class (so you can use std::string in your portable code and GUI-specific string classes in your GUI-specific classes).
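A minimal sketch of point 3, assuming a toolkit string class that can be built from and read back as a C string. GuiString, to_std, to_gui and make_title are hypothetical names invented for illustration; a real toolkit's class will look different.

```cpp
#include <string>

// Hypothetical stand-in for a toolkit's string class; a real toolkit's
// class will have a different name and interface.
class GuiString {
public:
    explicit GuiString(const char* text) : text_(text) {}
    const char* c_str() const { return text_.c_str(); }
private:
    std::string text_;  // storage detail, irrelevant to callers
};

// Conversion helpers kept in one place, at the GUI boundary.
inline std::string to_std(const GuiString& g) { return std::string(g.c_str()); }
inline GuiString to_gui(const std::string& s) { return GuiString(s.c_str()); }

// Portable code sees only std::string.
inline std::string make_title(const std::string& document_name)
{
    return document_name + " - MyApp";
}
```

Only the GUI-specific classes would call to_gui and to_std; if the toolkit is swapped out later, those two helpers are the only string code that should need rewriting.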
Christopher Wells
Sunday, October 01, 2006
 
 
std::string is part of the C++ standard.  It should be your default choice when starting a project.  It is defined in <string>.

#include <iostream>
#include <string>
using namespace std; // or just write std::string if you prefer

int main()
{
  string s;
  s = "hi";
  s += " there";
  cout << s; // prints "hi there"
}

It also has array subscripting if you want to access or modify it at the character level.
Tom C
Sunday, October 01, 2006
 
 
Christopher,

2) and 3) are great advice.

How does string over char* solve 1) ?
k
Sunday, October 01, 2006
 
 
std::string has value semantics and as such is copyable. Memory will always be implicitly and correctly managed for you by std::string, even in case of exceptions.
Managing several raw resources (like char*) in an "exceptional" environment quickly turns into a nightmare. Classes like std::string are there to help.
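A small sketch of the exception case, assuming a hypothetical parse() that throws on bad input (parse and try_parse are made-up names). The local std::string is destroyed during stack unwinding; a buffer obtained with new[] and held in a raw char* at the same spot would need its own try/catch to avoid leaking.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical function that fails partway through its work.
inline void parse(const std::string& input)
{
    if (input.empty())
        throw std::runtime_error("empty input");
}

// Returns true if parsing succeeded. 'raw' is assumed to be non-null.
// The local std::string is destroyed on every path out of the try block,
// including the one taken when parse() throws.
inline bool try_parse(const char* raw)
{
    try {
        std::string copy(raw);   // owns its own buffer
        parse(copy);
        return true;
    } catch (const std::runtime_error&) {
        return false;            // 'copy' has already been destroyed here
    }
}
```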
Luc Hermitte
Sunday, October 01, 2006
 
 
A pointer to string
B exception throwing code
C delete

how does string help you here?
k
Sunday, October 01, 2006
 
 
Check RAII in the C++ FAQ or any up-to-date C++ resources.
Luc Hermitte
Sunday, October 01, 2006
 
 
RAII suggests a destructor that frees the memory. A pointer to std::string doesn't do that any more than a char* would.
k
Sunday, October 01, 2006
 
 
std::string objects are not meant to be used through pointers. std::string is an RAII value type.

BTW, RAII is far from being restricted to memory management.
Luc Hermitte
Sunday, October 01, 2006
 
 
You are missing the point: you are rarely going to use std::string*.

Whether using std::string* or char*, you have to be concerned with memory management, of course.

However, there is a significant difference between using std::string and char*.  The first takes care of memory management; the second doesn't.

When using char* as your string abstraction, you are always using pointers and dealing with memory management.

When using std::string, you are only occasionally using pointers and dealing with memory management.  For the most part you are using strings directly.
Tom C
Sunday, October 01, 2006
 
 
If you are going to make your libraries available to other people, I'd use char*/wchar_t* in the public interface so that you aren't forcing anybody to use a specific string implementation. (Behind the scenes, use whatever you want.)
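One way this could look, as a sketch only: the public function takes and fills plain C strings, with the caller supplying the output buffer so that no ownership crosses the boundary, while the body uses std::string. make_greeting is a hypothetical function invented for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Public interface: plain C strings only. 'name' is assumed to be non-null.
// The caller supplies the output buffer. The return value is the length of
// the full result (excluding the terminator), so the caller can detect
// truncation by comparing it with the buffer size.
std::size_t make_greeting(const char* name, char* out, std::size_t out_size)
{
    // Behind the interface, any string type will do.
    const std::string result = std::string("Hello, ") + name;

    if (out != 0 && out_size > 0) {
        const std::size_t n =
            result.size() < out_size - 1 ? result.size() : out_size - 1;
        std::memcpy(out, result.data(), n);
        out[n] = '\0';  // always terminate, even when truncated
    }
    return result.size();
}
```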
dev1
Sunday, October 01, 2006
 
 
> How does string over char* solve 1) ?

With char* you don't know whether/when to delete ...

char* getHello();

void test()
{
  char* hello = getHello();
  //should I delete hello?
}

.. whereas with std::string you don't need to worry about that ...

string getHello();

void test()
{
  string hello = getHello();
  //memory contained within string will be deleted by the string destructor
  //when it goes out of scope; the string implementation may be smart enough
  //to share the same memory between multiple string instances when that's safe
}

The string class acts approximately like a "smart pointer" around the underlying character buffer.
Christopher Wells
Sunday, October 01, 2006
 
 
Thanks Chris.  Your comment has the same thrust as Tom C and Luc. Appreciate your code example in particular.

It boils down to: Allocate things on the stack and you won't have to deal with pointers.
k
Sunday, October 01, 2006
 
 
Go with option 1. Seriously. Make it work first. Understand the rest of the GUI toolkit. Then when it's time to port, research your toolkit options.  It's safe to assume the designers of the toolkit know C++ better than you. Maybe there is a technical reason they couldn't use std::string.  std::string is not a trivial class.  At my last two companies we had custom string classes because std::string didn't cut it. The standard moves slowly.
B
Monday, October 02, 2006
 
 
Use option 2. Many libraries have their own string classes, some quite useful, but it's not a good design decision to use those string classes outside calls to the library.

For instance, COM objects accept BSTRs. Does that mean all the strings in your program must be BSTRs? What if it's an old MFC app calling a COM object? Are you going to convert all those CStrings to BSTR just because one library insists on BSTR? What if you use the Xerces XML parser? IIRC, it has its own string class. If you use Xerces in one spot, does that mean you should use the Xerces string everywhere else?

Hell no!

Give the libraries what they require, but use what you want to use everywhere else. Just make sure you can convert your strings to the library string type, if required.

Don't expose yourself to more than one string type for the bulk of your program. You may decide to switch memory managers at some point, and the last thing you want to happen is to find out that GUI library X's string class cannot accept your memory manager.
MBJ
Monday, October 02, 2006
 
 
> It boils down to: Allocate things on the stack

Not exactly: it's that when you allocate things on the heap, wrap that allocation in a class (e.g. string or smart pointer) whose destructor, copy constructor, and assignment operator 'do the right thing' by managing the lifetime of that allocation for you.
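A minimal sketch of such a class, to show what 'do the right thing' means for each of the three members. OwnedText is a hypothetical name, and the constructor assumes a non-null, null-terminated argument; std::string does all of this (and much more) already.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal owner of a heap-allocated character buffer.
class OwnedText {
public:
    explicit OwnedText(const char* text)
        : data_(duplicate(text)) {}

    // Copy constructor: make a separate copy instead of aliasing the pointer.
    OwnedText(const OwnedText& other)
        : data_(duplicate(other.data_)) {}

    // Assignment: allocate the new copy first, then release the old buffer,
    // which also keeps self-assignment safe.
    OwnedText& operator=(const OwnedText& other)
    {
        char* fresh = duplicate(other.data_);
        delete[] data_;
        data_ = fresh;
        return *this;
    }

    // Destructor: the one place the buffer is freed.
    ~OwnedText() { delete[] data_; }

    const char* c_str() const { return data_; }

private:
    static char* duplicate(const char* text)
    {
        const std::size_t n = std::strlen(text) + 1;  // include terminator
        char* copy = new char[n];
        std::memcpy(copy, text, n);
        return copy;
    }

    char* data_;
};
```

Code that uses OwnedText never writes delete itself; the buffer goes away when the owning object does.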
Christopher Wells
Monday, October 02, 2006
 
 
... and then, as you said, allocate instances of those classes on the stack (or use them as by-value data members of larger classes, which may be allocated on the heap with their lifetimes too being managed by containing them within another class that uses the RAII idiom).
Christopher Wells
Monday, October 02, 2006
 
 
>> ... and then, as you said, allocate instances of those classes on the stack <<

AND, if I may add, pass these instances around (as method parameters) as references. If you (the original poster) have only dealt with pointers before, it's important to note that C++ has a "special" kind of pointer called a "reference". It is almost like a pointer, but not quite the same. For example, it can never be null. Its syntax is also more value-like: for example, you don't say obj->method() but obj.method(). You also cannot delete the instance through the reference. If you don't see how this could be useful, I encourage you to check "references" out - they're quite useful.
Drazen Dotlic
Monday, October 02, 2006
 
 
Yes you'll rarely use pointers (except within the private implementation of a single class): the interfaces between classes will usually be pass-by-value, pass-by-reference, and/or pass-by-const-reference.
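The three passing styles, sketched with std::string. The function names are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Pass by const reference: read-only access, no copy made.
std::size_t count_spaces(const std::string& text)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] == ' ') ++n;
    return n;
}

// Pass by reference: the caller's string is modified in place.
void append_period(std::string& text)
{
    text += '.';
}

// Pass by value: the function works on its own copy and returns the result,
// leaving the caller's string untouched.
std::string shouted(std::string text)
{
    text += '!';
    return text;
}
```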
Christopher Wells
Monday, October 02, 2006
 
 
Well, Chris (and others), it's been quite helpful to spend a few minutes getting the feel from a guru, so thanks for that.  If I could press my luck, I'd like to ask you about your last post.

you say
"whose destructor, copy constructor, and assignment operator 'do the right thing' by managing the lifetime of that allocation for you"

Again, is the benefit here "just" that you have a std::string so the copy constructor works (instead of aliasing a pointer) and the assignment operator assigns (rather than aliasing)?  I'm not sure what the destructor of std::string is doing; it's not deleting any heap allocations. I guess again it's just the use-the-stack aspect of std::string working in your favor.  It appears, again, that std::string is so helpful mostly because there is no pointer: it's std::string, not std::string*.  Is that approximately correct?
k
Monday, October 02, 2006
 
 
Use value based programming whenever you can. Just copy until performance provably sucks. Sharing data leads to so many bugs it's simply not worth it.

If your memory allocator is the problem then fix that, don't program around your allocator.
son of parnas
Monday, October 02, 2006
 
 
k wrote:
I'm not sure what the destructor of std::string is doing; it's not deleting any heap allocations. I guess again it's just the use-the-stack aspect of std::string working in your favor.  It appears, again, that std::string is so helpful mostly because there is no pointer: it's std::string, not std::string*.  Is that approximately correct?

No, it is not correct.  Variables of string type certainly have heap-allocated content; the string destructor is freeing that memory.
  { // allocate, initialize
    std::string foo = "a";
    // put a thousand "a"s there too -- along the way
    // expect on-heap allocations internal to foo:
    for (int i = 0; i < 1000; i++) foo += "a";
    // ... do something with foo
  } // just went out of scope, so foo's destructor
    // was called, which freed all that dynamically
    // allocated space

The use of a pointer to std::string is orthogonal to this issue.  For example, this is natural in C++:
  void some_setter(std::string *foo_p)
  {
    if (foo_p)
        *foo_p = "string from somewhere";
  }

(C++ also allows references in this context, but there are valid reasons to prefer pointers here.)
Will Dowling
Monday, October 02, 2006
 
 
<< Again, is the benefit here "just" that you have a std::string so the copy constructor works (instead of aliasing a pointer) and the assignment operator assigns (rather than aliasing)? >>

That's a large part of it. Also, as part of managing memory allocations for your strings, many std::string implementations use copy-on-write, which can reduce storage requirements and copying. As a result, even if a function or method returns a std::string, the overhead is kept small.

By all means, make std::string your default string representation, and create conversion functions/methods where you need to provide a "custom" string class.

You'll find the code much clearer (though more verbose, on occasion), and you'll have more time to spend doing productive work instead of chasing hard-to-duplicate bugs caused by buffer overflows. (Trust me on this; I'm currently working on legacy code chock-full of wchar_t*s, and probably 3/4 of all the crashing bugs I've tracked down and fixed in this code were just that. Recoding as std::wstring *always* improves the code.)

FWIW, I've found the docs at http://www.cppreference.com/index.html very useful.
Samuel Reynolds
Monday, October 02, 2006
 
 
Some implementations of std::string are moving back from a COW design. As far as I understand, implementing a COW string costs more than it's worth.
Luc Hermitte
Monday, October 02, 2006
 
 
Thanks. Really helpful.
k
Monday, October 02, 2006
 
 
1. Use std::string.  It provides the c_str() function which will be required to work with standard C APIs.
2. Don't ever derive any class from the standard containers.  They don't have virtual destructors and aren't meant to be base classes.  If you ever want to extend the functionality, it is generally recommended that your new class CONTAIN one of the STL types.
3. Early on, consider the possibility that you'll need to build for Unicode.  On Windows, the TCHAR types compile down differently depending on certain preprocessor types.  I've used something like this:

// tstring.h
#include <string>
#ifdef _UNICODE
  typedef std::wstring tstring;
#else
  typedef std::string tstring;
#endif

Then include "tstring.h" and use 'tstring' throughout.  Voila, you're ready for Unicode.
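Point 2 above (contain rather than derive) might look something like the following sketch. AppString and its method names are hypothetical; the idea is only that the application-flavoured names forward to a std::string member instead of being inherited from it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical application string type that CONTAINS a std::string
// rather than deriving from it.
class AppString {
public:
    AppString() {}
    AppString(const std::string& text) : text_(text) {}
    AppString(const char* text) : text_(text) {}

    // Methods named to suit the application, forwarding to std::string.
    std::size_t length() const { return text_.size(); }
    bool startsWith(const std::string& prefix) const
    {
        return text_.compare(0, prefix.size(), prefix) == 0;
    }

    // Access for code that needs a std::string or a C string.
    const std::string& str() const { return text_; }
    const char* c_str() const { return text_.c_str(); }

private:
    std::string text_;
};
```

Because nothing is inherited, nobody can delete an AppString through a std::string pointer, which is the situation the missing virtual destructor makes dangerous.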
Meganonymous Rex
Wednesday, October 04, 2006
 
 
I meant "preprocessor symbols" not preprocessor types.

Specifically, on Windows, I believe that both _UNICODE and UNICODE must be defined.

-mrex
Meganonymous Rex
Wednesday, October 04, 2006
 
 
It's a common misconception that references are safe from being NULL. Consider the following example:

#include <iostream>
#include <string>

std::string Hello(const std::string & name)
{
  std::string result = "Hello ";
  result.append(name);
  return result;
}

void Testing(void)
{
  std::string * ptrName = NULL;
  std::cout << Hello(*ptrName) << std::endl; // undefined behavior: dereferences a null pointer
}

In a perfect world, the dereferencing of a NULL pointer would happen at the point of the function call, and would be easy to debug. With Visual C++ you don't see the error until you try to use the reference inside the called function; in this example, it even goes a level deeper into the append method. I'm sure most other C++ compilers work the same way.
Mark Ransom
Friday, October 06, 2006
 
 
Are you serious?  I haven't read the standard, but that looks like it does not conform to it.  That would mean that it is a compiler bug.
Tom C
Friday, October 06, 2006
 
 
If the code was written by someone who doesn't use a pointer when he could use a reference instead, you should almost always test the pointer for nullness before you dereference it.
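A sketch of that null test, reusing the Hello() function from the example above. HelloOrDefault and the "stranger" fallback are made up for illustration; the point is only that the check happens before the dereference, so Hello() never receives a reference bound through a null pointer.

```cpp
#include <string>

// Takes a reference, so it assumes the caller supplies a real object.
std::string Hello(const std::string& name)
{
    std::string result = "Hello ";
    result.append(name);
    return result;
}

// Hypothetical wrapper for callers that only have a pointer: the null test
// happens before the dereference, so Hello() never sees an invalid reference.
std::string HelloOrDefault(const std::string* ptrName)
{
    if (ptrName == 0)
        return Hello("stranger");
    return Hello(*ptrName);
}
```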
Christopher Wells
Saturday, October 07, 2006
 
 
@Mark: Dereferencing NULL leads to undefined behavior, and that is exactly what happens inside the Testing() function:

void Testing(void)
{
  std::string * ptrName = NULL;
  std::cout << Hello(*ptrName) << std::endl; // <-- this is the erroneous line; the Hello() function is correct.
}

The implementation you are using is perfectly valid, too. It is just forbidden (by the C++ standard) to dereference NULL pointers. Nevertheless, someone did it: Welcome to the land of undefined behavior.
Jimbo Jones
Monday, October 09, 2006
 
 
Thanks, I knew where the error was. This is just the trivialized version of a bug I've seen in real code.

It's all about leaky abstractions. References in C++ are just an abstraction, implemented using a pointer under the covers. C++ compilers don't generally do any error checking that can't be done at compile time, so the pointer is copied without even examining the contents; the dereference won't be caught, even though the behavior is technically undefined.

I defy you to find a compiler that works differently.
Mark Ransom
Monday, October 09, 2006
 
 

This topic is archived. No further replies will be accepted.
