You can use a TEXT column in your database (about 64KB of characters).
As you know, a Java String can hold up to 2^31-1 characters.
I'm afraid collections are not designed for this use case (read more). Is there any reason you can't use a clustering key instead of a list?
CREATE TABLE message_history (
    user_name text,
    time timestamp,
    message_details text,
    PRIMARY KEY (user_name, time)
) WITH CLUSTERING ORDER BY (time DESC);
INSERT INTO message_history (user_name, time, message_details) VALUES ('user1', dateOf(now()), 'message text');
INSERT INTO message_history (user_name, time, message_details) VALUES ('user1', dateOf(now()), 'message text2');
INSERT INTO message_history (user_name, time, message_details) VALUES ('user1', dateOf(now()), 'message text3');

SELECT * FROM message_history WHERE user_name = 'user1' LIMIT 1;
user_name | time | message_details
-----------+--------------------------+-----------------
user1 | 2015-08-13 15:44:45+0000 | message text3
From a pragmatic point of view, I think it is wise to get a back-of-the-envelope estimate of the worst case up-front at design time, using the formulae from the ds220 course. The effect of compression varies with the algorithm and with patterns in the data. From ds220 and http://cassandra.apache.org/doc/latest/cql/types.html:
uuid: 16 bytes
timeuuid: 16 bytes
timestamp: 8 bytes
bigint: 8 bytes
counter: 8 bytes
double: 8 bytes
time: 8 bytes
inet: 4 bytes (IPv4) or 16 bytes (IPV6)
date: 4 bytes
float: 4 bytes
int: 4 bytes
smallint: 2 bytes
tinyint: 1 byte
boolean: 1 byte (hopefully; I have no source for this)
ascii: requires an estimate of average # chars * 1 byte/char
text/varchar: requires an estimate of average # chars * (avg. # bytes/char for language)
map/list/set/blob: requires a per-case estimate

Hope it helps.
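The per-type sizes above can be turned into a quick worst-case estimate directly. Here is a minimal Python sketch using the message_history table from the earlier answer; the text-column byte counts and the rows-per-partition figure are assumptions picked for illustration, not measured values:

```python
# Per-type sizes from the list above (bytes).
TYPE_SIZES = {"timestamp": 8, "uuid": 16, "bigint": 8, "int": 4}

def row_size(column_bytes):
    """Sum the estimated on-disk size of one row's column values."""
    return sum(column_bytes.values())

# Assumed averages: 20-byte ASCII user name, 500-byte message text.
columns = {
    "user_name": 20,
    "time": TYPE_SIZES["timestamp"],
    "message_details": 500,
}

# Assumed worst case: 100k messages retained per user (one partition).
rows_per_partition = 100_000

partition_bytes = rows_per_partition * row_size(columns)
print(f"~{partition_bytes / 1e6:.0f} MB per partition before compression")
```

This kind of estimate is deliberately rough, but it flags oversized partitions before the schema ships, rather than after compaction starts struggling.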
No. A bigint is just a 64-bit signed long; no size or space limit can be specified. https://cassandra.apache.org/doc/latest/cql/types.html
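For reference, the range is fixed by the 64-bit two's-complement representation itself, which a short Python sketch makes explicit:

```python
# A Cassandra bigint is a 64-bit signed integer; its range comes from the
# type itself and cannot be narrowed or widened in the schema.
BIGINT_MIN = -2**63
BIGINT_MAX = 2**63 - 1

print(BIGINT_MIN, BIGINT_MAX)
```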
As was pointed out in the comment, it's easy to add a user-defined function and use it to retrieve the length of the text field, but the catch is that you can't use a user-defined function in the WHERE condition (see CASSANDRA-8488).
Even if it were possible, a query with only this condition would be a bad query for Cassandra, as it would need to go through all the data in the database and filter it out. For such tasks, tools like Spark are usually used: you can read the data via the Spark Cassandra Connector and apply the necessary filtering conditions. This still involves reading all of the data from the database and then filtering it, so it's quite a bit slower than normal CQL queries, but at least it's automatically parallelized.
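To make the cost concrete, here is a plain-Python sketch (not actual Spark or driver code) of what such filtering amounts to: every row must be read, and the length predicate is applied client-side. The sample rows and the length threshold are made up for illustration:

```python
# Hypothetical rows, standing in for a full table scan.
rows = [
    {"user_name": "user1", "message_details": "short"},
    {"user_name": "user2", "message_details": "a much longer message body"},
    {"user_name": "user3", "message_details": "mid-size text"},
]

# Full scan + client-side filter: conceptually what Spark parallelizes
# across workers when Cassandra cannot push the predicate down.
long_messages = [r for r in rows if len(r["message_details"]) > 20]
print(long_messages)
```

The work is proportional to the total number of rows, not the number of matches, which is why Spark's parallelism helps but the approach stays slower than a keyed CQL read.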