Remote Call Framework 3.4
Performance

Remote calls

RCF has been designed with two key performance characteristics in mind. When executing repeated remote calls on a single connection, critical code paths in RCF's server and client implementation adhere to the following two principles:

  • Zero copy - No internal copies are made of remote call parameters or buffers.
  • Zero allocation - No memory allocations are made.

Zero Copy

RCF makes no internal copies of remote call parameters or data while sending or receiving, either on the server or the client.

You should note, however, that serialization may force copies to be made. For instance, deserializing a std::string is not possible without making a copy of the string contents, because std::string always allocates its own storage. The same applies to std::vector<>.

To pass buffers of data through a remote call without making any copies, RCF provides the RCF::ByteBuffer class. The contents of an RCF::ByteBuffer are not copied upon serialization or deserialization. Instead, RCF uses scatter/gather style semantics to send and receive the contents directly.

This means that if you are transferring large chunks of untyped data through a remote call, you should use RCF::ByteBuffer, rather than containers like std::string or std::vector. On typical hardware, transferring multiple megabytes of data in a single call will not stress the system at all.
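The difference between a copying container and a non-owning buffer can be illustrated without RCF. The sketch below is not RCF code: BufferView, gather() and totalSize() are hypothetical names that stand in for the idea behind RCF::ByteBuffer and scatter/gather I/O, where a message is described as a list of (pointer, length) pairs and the transport reads directly from the original memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view over a block of bytes (illustration only,
// not RCF::ByteBuffer). Copying a BufferView copies a pointer and a length,
// never the bytes themselves.
struct BufferView
{
    const char * data;
    std::size_t  size;
};

// Describe a message as a header followed by a payload, without copying
// either one. A scatter/gather send (for example POSIX writev()) could
// transmit such a list straight from the original memory.
std::vector<BufferView> gather(const std::string & header,
                               const std::vector<char> & payload)
{
    std::vector<BufferView> parts;
    parts.push_back(BufferView{ header.data(), header.size() });
    parts.push_back(BufferView{ payload.data(), payload.size() });
    return parts;
}

// Total number of bytes described by the views.
std::size_t totalSize(const std::vector<BufferView> & parts)
{
    std::size_t total = 0;
    for (const BufferView & part : parts)
    {
        assert(part.data != nullptr || part.size == 0);
        total += part.size;
    }
    return total;
}
```

Note that the views point into the caller's header and payload, so those objects must outlive the views. The small vector of views is itself allocated in this sketch; only the message bytes are left uncopied.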

Zero Allocation

RCF makes a minimum of memory allocations on critical paths, in both client and server code. In particular, if a remote call is made twice with identical parameters, on the same connection, RCF will not make any heap allocations on the second call, either on the client or on the server.

You can verify this by running the code sample below, which overrides operator new to trap any memory allocations:

    #include <cstddef>
    #include <cstdlib>
    #include <stdexcept>

    #include <RCF/RCF.hpp>

    bool gExpectAllocations = true;

    // Override global operator new so we can intercept heap allocations.
    void *operator new(std::size_t bytes)
    {
        if (!gExpectAllocations)
        {
            throw std::runtime_error("Unexpected heap allocation!");
        }
        return malloc(bytes);
    }

    void operator delete (void *pv) noexcept
    {
        free(pv);
    }

    // Override global operator new[] so we can intercept heap allocations.
    void *operator new [](std::size_t bytes)
    {
        if (!gExpectAllocations)
        {
            throw std::runtime_error("Unexpected heap allocation!");
        }
        return malloc(bytes);
    }

    void operator delete [](void *pv) noexcept
    {
        free(pv);
    }

    RCF_BEGIN(I_Echo, "I_Echo")
        RCF_METHOD_R1(RCF::ByteBuffer, Echo, RCF::ByteBuffer)
    RCF_END(I_Echo)

    class EchoImpl
    {
    public:
        // Returns the incoming buffer unchanged.
        RCF::ByteBuffer Echo(RCF::ByteBuffer byteBuffer)
        {
            return byteBuffer;
        }
    };

    // Server setup is not shown here: 'port' is the port the server is
    // listening on, and 'byteBuffer' is the RCF::ByteBuffer to send.
    RcfClient<I_Echo> client(( RCF::TcpEndpoint(port) ));

    // First call will trigger some heap allocations.
    gExpectAllocations = true;
    client.Echo(byteBuffer);

    // These calls won't trigger any client-side or server-side heap allocations.
    gExpectAllocations = false;
    for (std::size_t i=0; i<10; ++i)
    {
        RCF::ByteBuffer byteBuffer2 = client.Echo(byteBuffer);
    }

In this code sample we are using RCF::ByteBuffer as a parameter, which avoids any allocations being made as part of deserialization.

Typically, though, your code will deserialize objects more complex than RCF::ByteBuffer, and deserializing those objects is likely to cause memory allocations. To eliminate such allocations, RCF provides an object cache which can be used to cache commonly occurring object types.
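To find out how many allocations a particular piece of code makes, rather than only whether it makes any, a counting variant of the operator new override can be useful. The following is a minimal sketch that does not depend on RCF. The names gAllocationCount, countAllocations() and allocationsForWarmVector() are hypothetical, and the counter is not thread-safe, so it is only suitable for single-threaded experiments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Number of calls made to the replaced global operator new.
static std::size_t gAllocationCount = 0;

// Replacement global operator new that counts allocations instead of throwing.
void * operator new(std::size_t bytes)
{
    ++gAllocationCount;
    void * pv = std::malloc(bytes ? bytes : 1);
    if (!pv)
    {
        throw std::bad_alloc();
    }
    return pv;
}

void operator delete(void * pv) noexcept
{
    std::free(pv);
}

// Sized deallocation (used by C++14 and later) releases memory the same way.
void operator delete(void * pv, std::size_t) noexcept
{
    std::free(pv);
}

// Hypothetical helper: run fn() and return how many allocations it made.
template<typename Fn>
std::size_t countAllocations(Fn fn)
{
    const std::size_t before = gAllocationCount;
    fn();
    return gAllocationCount - before;
}

// Example: a vector that has already reserved its storage does not allocate
// when elements are appended within that capacity.
std::size_t allocationsForWarmVector()
{
    std::vector<int> v;
    v.reserve(16);
    assert(v.capacity() >= 16);
    return countAllocations([&v] { for (int i = 0; i < 16; ++i) v.push_back(i); });
}
```

The same helper could be wrapped around a loop of remote calls to report the number of allocations per call, once the first call has warmed up the connection.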

Object Caching

Serialization and deserialization of remote call parameters can become a performance bottleneck. In particular, deserialization of a complex datatype involves not only creating the object to begin with, but also a number of memory allocations and CPU cycles when deserializing all the fields and subfields of the object.

To improve performance in these circumstances, RCF provides a global cache of objects used during remote calls. Objects used as parameters in one remote call can be transparently reused in subsequent calls. This means that the construction overhead and memory allocations caused by deserialization can be eliminated in subsequent calls.

Here is an example of caching std::string objects:

    #include <string>

    #include <RCF/RCF.hpp>

    RCF_BEGIN(I_Echo, "I_Echo")
        RCF_METHOD_R1(std::string, Echo, std::string)
    RCF_END(I_Echo)

    class EchoImpl
    {
    public:
        // Returns the incoming string unchanged.
        std::string Echo(const std::string & s)
        {
            return s;
        }
    };

    // 'server' is an RCF server object constructed earlier (not shown here).
    EchoImpl echo;
    server.bind<I_Echo>(echo);
    server.start();
    int port = server.getIpServerTransport().getPort();

    RCF::ObjectPool & cache = RCF::getObjectPool();

    // Enable caching for std::string.
    // * Don't cache more than 10 std::string objects.
    // * Call std::string::clear() before putting a string into the cache.
    auto clearString = [](std::string * pStr) { pStr->clear(); };
    cache.enableCaching<std::string>(10, clearString);

    std::string s1 = "123456789012345678901234567890";
    std::string s2;

    RcfClient<I_Echo> client(( RCF::TcpEndpoint(port) ));

    // First call.
    s2 = client.Echo(s1);

    // Subsequent calls - no memory allocations at all, in RCF runtime, or
    // in std::string serialization/deserialization, on client or server.
    for (std::size_t i=0; i<100; ++i)
    {
        s2 = client.Echo(s1);
    }

    // Disable caching for std::string.
    cache.disableCaching<std::string>();

In this example, the first call to Echo() will cause several server-side memory allocations related to deserialization, including one to construct a std::string and another to expand the internal buffer of the string to fit the incoming data.

With object caching enabled, after the call returns, the server-side string is cleared and then held in the object cache, rather than being destroyed. On the next call, instead of constructing a new std::string, RCF reuses the std::string in the cache. Upon deserialization, std::string::resize() is called, to fit the incoming data. As this particular string object has already held the requested amount of data earlier, the resize() request does not result in any memory allocation.
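The clear-then-resize behavior described above can be observed with a plain std::string, independently of RCF. The sketch below uses a hypothetical helper, refillReusesBuffer(), which imitates the sequence applied to a cached string on each call. It relies on clear() leaving the string's capacity untouched, which is how mainstream standard library implementations behave, although it is best treated as an assumption to verify on your platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Mimics what happens to a cached string on each call: clear it, resize it to
// fit the incoming data, then copy the data in. Returns true if the string's
// existing buffer was large enough and was reused (no reallocation).
bool refillReusesBuffer(std::string & cached, const std::string & incoming)
{
    cached.clear();
    const bool fits = cached.capacity() >= incoming.size();
    const char * before = cached.data();

    cached.resize(incoming.size());
    if (!incoming.empty())
    {
        std::memcpy(&cached[0], incoming.data(), incoming.size());
    }

    assert(cached == incoming);
    return fits && cached.data() == before;
}
```

With a 30 character string, the first call on a default-constructed std::string is expected to grow the buffer, while later calls with data of the same length find enough capacity already in place.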

The object cache is configured on a per-type basis, using the RCF::ObjectPool::enableCaching<>() and RCF::ObjectPool::disableCaching<>() functions. For each cached datatype, you can specify the maximum number of objects to cache, and which function to call to put the objects in a reusable state.

The object cache can be used with any C++ type, not just the types that appear in an RCF interface. If your server-side code repeatedly creates and destroys objects of a particular type, you can enable object caching for that type.
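To make the two configuration parameters concrete (a maximum number of cached objects, and a function that returns an object to a reusable state), here is a minimal sketch of a bounded per-type pool. SimplePool is a hypothetical class written for illustration. It is not RCF's ObjectPool, whose internals may differ, and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical bounded pool for objects of type T (illustration only).
template<typename T>
class SimplePool
{
public:
    SimplePool(std::size_t maxCached, std::function<void(T *)> reset)
        : mMaxCached(maxCached), mReset(std::move(reset))
    {
        assert(mReset);
    }

    // Hand out a cached object if one is available, otherwise construct one.
    std::unique_ptr<T> acquire()
    {
        if (mCache.empty())
        {
            return std::unique_ptr<T>(new T());
        }
        std::unique_ptr<T> obj = std::move(mCache.back());
        mCache.pop_back();
        return obj;
    }

    // Reset the object and keep it for reuse. If the cache is already full,
    // the object is destroyed when 'obj' goes out of scope.
    void release(std::unique_ptr<T> obj)
    {
        if (obj && mCache.size() < mMaxCached)
        {
            mReset(obj.get());
            mCache.push_back(std::move(obj));
        }
    }

    std::size_t cachedCount() const { return mCache.size(); }

private:
    std::size_t                     mMaxCached;
    std::function<void(T *)>        mReset;
    std::vector<std::unique_ptr<T>> mCache;
};
```

For example, a SimplePool<std::vector<int>> constructed with a reset function that calls clear() hands back vectors that are empty but still hold their previously allocated storage, which is the effect the object cache aims for.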

Scalability

RCF is built to scale as far as the underlying hardware and operating system allow it.

Scalability is usually of more concern on the server-side than client-side, as servers tend to manage far more network connections than any individual client would.

RCF's server transport implementation is based on Asio, a well-known C++ networking library which has been part of the Boost libraries for many years, and will likely form the foundation for a future networking library in the C++ standard. Asio is a mature and high-performance networking back-end, which leverages native network APIs when they are available (I/O completion ports on Windows, epoll() on Linux, /dev/poll on Solaris, kqueue() on FreeBSD), and less performant APIs when they aren't (BSD sockets).

As such, the number of clients an RCF server can support is essentially determined by the number of cores and amount of memory available to the server, as well as the application-specific resources required by each client.

RCF has been designed to yield minimal performance overhead, with network intensive, high throughput applications in mind. Keep in mind that bottlenecks in distributed systems tend to be determined by the overall design of the distributed system - a poorly designed distributed system will have its performance cut off well before the communications layer reaches its limit.