Cassandra cqlsh client, OperationTimedOut and request timeouts
I’m used to mysql command line client. If a query runs for a long time, it just keeps running. This is very useful if you feed a script (i.e. a sequence of queries) to mysql client for execution: no matter how long each query executes, the queries are run serially. If nothing breaks in the middle, they all execute successfully.
It turned out that with the default settings Cassandra’s cqlsh
(command line client) behaves differently.
All of a sudden, my script (a sequence of DDL queries run in the beginning of an integration test to prepare
database) has failed. The first error was OperationTimedOut, but the following ones were caused by the fact
that the first query did not yet finish. For example, in my case the first query was
DROP KEYSPACE
, while the second was CREATE KEYSPACE
with the same name. Of course, it failed, and
the following CREATE TABLE
queries failed as well.
Why does this happen? Because cqlsh
has a limit (by default it is 10 seconds, according to
documentation). If your query runs
more than this limit, the client just fails with OperationTimedOut error message, but the query is still
running on the server.
OK, how do we disable this limit, or at least configure it to be long enough?
Good news: cqlsh
in Cassandra 2.1.16 has --request-timeout
command line parameter and you can specify
the limit there (in seconds). --request-timeout 3600
would be a good start.
Bad news: cqlsh
in Cassandra 2.1.12 does NOT have that parameter yet, so this parameter is not that universal.
By the way, version reported by cqlsh (with the usual
--version
) is strange. I tried it with cqlsh included in Cassandra distribution for Cassandra 2.1.8, 2.1.12, 2.1.16, and in all these cases the version was reported as 5.0.1, even though 2.1.16 reports support for--request-timeout
(and really supports it) and the other two versions don’t.
But let’s return to our limit.
Good news: ~/.cassandra/cqlshrc
file allows to define this timeout in [connection] section.
Bad news: the documentation is not accurate. Although it says that the option was added in version 2.1.1 and
is called request_timeout
, and this is true for 2.1.16, it is NOT true for 2.1.12. In it, you have to call
the option client_timeout
. Moreover: in 2.1.12, according to
this article,
you could completely disable the timeout by assigning None
. Alas, in 2.1.16
(with request_timeout
) this does not work.
It is not possible to (reliably) completely disable the timeout. If you set
request_timeout
to 0, this will mean that any request will timeout. Negative values cause errors.
So the only option is to set it to some large value (like the abovementioned 3600 seconds).
So, a kinda universal way to make sure your integration tests don’t stumble upon this, is to put the following
in your ~/.cassandra/cqlshrc
:
[connection]
request_timeout = 3600
client_timeout = 3600
BTW, how come that DROP KEYSPACE
for a keyspace with a few tables with no data in them where the cluster
contains just one node could not fit into the default timeout (presumably 10 seconds) on a machine with
a decent HDD which was not overloaded? It’s a different story…