
What is the fastest way to load a wide row from Cassandra to C#?

What is the most efficient way to load a single wide row (or a few of them) from Cassandra into C#? My wide rows have 10,000-100,000 columns. The primary key consists of several values, but the column key is a single string and the column value is a single counter (see the schema below).

Using "tracing on" in cqlsh, I can see that Cassandra can select a wide row with 17,000 columns in 44 ms, but loading this data all the way into C# using the DataStax driver takes 700 ms. Is there a faster way? I need to load the full wide row in 50-100 ms. (Is there a more native way? A way to minimize the network traffic? A faster driver? Another configuration of the driver? Or something else?)

I do not actually need all 17,000 columns. I just need the columns where 'support' >= 2, or the top 1000 columns sorted in descending order by 'support'. But since 'support' is my column value, I don't know of any way to express such a query in CQL.

This is my table:

CREATE TABLE real_time.grouped_feature_support (
    algorithm_id int,
    group_by_feature_id int,
    select_feature_id int,
    group_by_feature_value text,
    select_feature_value text,
    support counter,
    PRIMARY KEY ((algorithm_id, group_by_feature_id, select_feature_id, group_by_feature_value), select_feature_value)
);

This is how I access the data using the DataStax driver:

var table = session.GetTable<GroupedFeatureSupportDataEntry>();
var query = table.Where(x => x.CustomerAlgorithmId == customerAlgorithmId
    && x.GroupByFeatureId == groupedFeatureId
    && myGroupedFeatureValues.Contains(x.GroupByFeatureValue)
    && x.GroupByFeatureValue == groupedFeatureValue
    && x.SelectFeatureId == selectFeatureId)
    .Select(x => new
    {
        x.GroupByFeatureValue,
        x.SelectFeatureValue,
        x.Support,
    })
    .Take(1000000);
var result = query.Execute();

If you are looking for the best performance when retrieving a large result set, you should not use a mapping component such as Linq-to-CQL.

You can retrieve the rows using the technique documented in the driver readme. In your case it would be something like:

var query = "SELECT * from grouped_feature_support WHERE" + 
            " algorithm_id = ? AND group_by_feature_id = ? " +
            " AND select_feature_id = ? AND group_by_feature_value = ?";
//Prepare the query once in your application lifetime
var ps = session.Prepare(query);
//Reuse the prepared statement by binding different parameters to it
var rs = session.Execute(ps.Bind(parameters));
foreach (var row in rs)
{
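  //The enumerator yields all the rows from Cassandra,
  //fetching them in the background in blocks of 5000 (determined by the page size).
}
//Alternatively, instead of the loop above, you can filter on the client
//with the IEnumerable<T> LINQ extensions (requires "using System.Linq;").
//The question asks for support >= 2, hence ">=" here.
var filteredRows = rs.Where(r => r.GetValue<long>("support") >= 2);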
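
Since CQL cannot filter or sort on the counter value, the "support >= 2, top 1000 by support" selection from the question has to happen on the client after the rows are read. Below is a minimal sketch of that step using plain LINQ. It assumes each row has already been turned into a key/value pair, for example with row.GetValue<string>("select_feature_value") and row.GetValue<long>("support"). The class name SupportFilter and the method TopBySupport are hypothetical names for illustration, not part of the driver.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SupportFilter
{
    // Hypothetical helper (not part of the DataStax driver).
    // Keeps entries whose support is at least minSupport and returns
    // at most `limit` of them, ordered by support, highest first.
    public static List<KeyValuePair<string, long>> TopBySupport(
        IEnumerable<KeyValuePair<string, long>> entries,
        long minSupport,
        int limit)
    {
        return entries
            .Where(e => e.Value >= minSupport)   // drop low-support columns
            .OrderByDescending(e => e.Value)     // highest support first
            .Take(limit)                         // cap the result size
            .ToList();
    }
}
```

With the thresholds from the question, the call would be SupportFilter.TopBySupport(entries, 2, 1000). This only reduces the work done after the transfer. All columns of the wide row still travel over the network, so it does not by itself address the 700 ms load time.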
