Cassandra: document databases can be seen as...

htsjk.com (和通数据库), 2023-03-25 13:04. Source: unknown.

Tags: Database, NoSQL


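1. Introduction
2. Ports and configuration files
3. Basic concepts
3.1 Data model
3.2 Data types
4. Commands
5. CQL
6. Regular indexes
7. Inserting data
8. Querying data
9. Updating
10. Java client
10.1 DataStax java-driver
11. Spring Data Cassandra
12. Cluster setup
13. Data storage format
14. Key points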

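1. Introduction

Column-store databases:

Built for massive volumes of data in distributed storage. Keys still exist, but each key points to multiple columns.

Document databases:

The data model is a versioned document. Semi-structured documents are stored in a specific format such as JSON. A document database can be seen as an upgraded key-value database.

Official site: cassandra.apache.org

When to use it:

Write-heavy workloads, data that is rarely modified, lookups by primary key, and data that needs to be partitioned.

Typical scenarios:

Log storage, high-volume data such as IoT, and tracking data.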

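Set the environment variable CASSANDRA_HOME.

Cassandra keeps three kinds of data, and the location of each can be changed in the configuration file:

data directory: holds the actual data files. If the server has several disks, you can list several directories, one per disk, so that Cassandra can use more disk space.

commitlog: holds data that has not yet been written to an SSTable. Every write is first recorded in this log so that it can be recovered if the node goes down.

cache: holds the system's cached data.

Configuration:

In cassandra.yaml under the conf directory, set data_file_directories.

Set the commit log directory (commitlog_directory).

Set the saved caches directory (saved_caches_directory).

The cqlsh client connects to a Cassandra server. It needs Python installed first (Python 2.x for older releases).

From the bin directory: cqlsh.bat xx.xx.xx.xx 9042

Linux setup: same as on Windows; create the directories and edit the configuration.

Starting Cassandra as root on Linux is not recommended; to do it anyway, add -R.

nodetool status shows the node status.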

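2. Ports and configuration files

7199: JMX

7000: inter-node communication (not used if TLS is enabled)

7001: TLS inter-node communication (used when TLS is enabled)

9160: Thrift client API

9042: CQL native transport port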

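3. Basic concepts

3.1 Data model

Column

The most basic data unit in Cassandra. It has three parts: a name, a value, and a timestamp.

Column family

Equivalent to a table in a relational database; it contains many rows.

Keyspace

Equivalent to a database. When you create one you can set properties such as the replication factor, the replication strategy, and durable_writes (whether the commit log is used).

The replication factor determines how many copies of the data are kept. All replicas are equally important; there is no master or slave. The replication factor should normally be greater than 1, but must not exceed the number of nodes in the cluster.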
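As a rough illustration of what the replication factor means, the sketch below picks the nodes that hold copies of one row by walking a list of nodes in ring order. The node names and the `pick_replicas` helper are hypothetical, and the placement rule is a simplification in the spirit of the simple strategy, not Cassandra's actual implementation.

```python
def pick_replicas(nodes, primary_index, replication_factor):
    """Return the nodes holding copies of a row, walking the ring clockwise."""
    if not 1 <= replication_factor <= len(nodes):
        # More copies than nodes cannot be placed on distinct machines.
        raise ValueError("replication factor must be between 1 and the node count")
    return [nodes[(primary_index + i) % len(nodes)]
            for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical 4-node cluster
replicas = pick_replicas(nodes, primary_index=3, replication_factor=2)
print(replicas)  # ['node-d', 'node-a']
```

Every node in the returned list is an equal copy; none of them is treated as a master.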

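3.2 Data types

Numeric types: for example varint, equivalent to Java's BigInteger

Text types: ascii, text, varchar

Time types: timestamp, date, time

Identifier types: uuid (a 128-bit value), timeuuid

Collection types: set, list, map

A list holds ordered data.

Note on collections: each item in a collection can be at most 64 KB. Keep collections small so that queries do not become slow.

Other types: boolean, blob, inet, counter

User-defined types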

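4. Commands

help: shows help

capture: captures command output and appends it to a file

consistency: shows the current consistency level, or sets a new one

copy: copies data into and out of Cassandra

describe: describes the current cluster and its objects

expand: prints each row of query output vertically

exit: quits cqlsh

paging: enables or disables query paging

show: shows details of the current cqlsh session

source: executes a file containing CQL statements

tracing: enables or disables request tracing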

5. CQL

Data definition commands:

create keyspace
use
alter keyspace
drop keyspace
create table
alter table
drop table
truncate
create index
drop index

Data manipulation commands:

insert
update
delete
batch

Query commands:

select
where
order by

create keyspace keyspaceName with replication = {'class': 'strategy_name', 'replication_factor': 2};

strategy_name is the replica placement strategy: SimpleStrategy or NetworkTopologyStrategy.

List all keyspaces: describe keyspaces;

Switch to a keyspace: use keyspaceName;

Create a table:

create table student(
    id int primary key,
    name text,
    age int,
    gender tinyint,
    address text,
    interest set<text>,
    phone list<text>,
    education map<text,text>
);

describe table student;

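Do not use an index in these cases:

The column has a very large number of distinct values, because the query then effectively searches through many records to return a very small result.

The table has a counter column.

The column is updated or deleted frequently.

You are looking for a single row inside a very large partition.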

The five kinds of key in Cassandra:

primary key
partition key
composite key
compound key
clustering key

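composite primary key:

create table testtab(
    key_one int,
    key_two int,
    name text,
    primary key(key_one, key_two)
);

partition key:

With a composite primary key, the first part is the partition key and the second part is the clustering key.

Cassandra hashes the partition key and uses the result to decide which node the row is stored on.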
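The idea of hashing the partition key to choose a node can be sketched in a few lines. This is a minimal illustration under stated assumptions: MD5 stands in for Cassandra's real partitioner, each node gets a single token derived from its name, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a string to a 64-bit token (MD5 is only a stand-in hash here)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, node_names):
        # Each node sits at one point on the ring and owns the range ending there.
        self._entries = sorted((token(name), name) for name in node_names)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, partition_key: str) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._entries[i % len(self._entries)][1]


ring = Ring(["node-a", "node-b", "node-c"])
for key in ("1011", "1012", "1013"):
    print(key, "->", ring.owner(key))
```

The same key always maps to the same node, which is why a query that supplies the full partition key can go straight to the machine that holds the row.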

create table testTab(
    key_part_one int,
    key_part_two int,
    key_clust_one int,
    key_clust_two int,
    key_clust_three uuid,
    name text,
    primary key((key_part_one, key_part_two), key_clust_one, key_clust_two, key_clust_three)
);

clustering key:

Determines how rows that share a partition key are sorted within the partition. The default order is ascending.
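To make the role of the clustering key concrete, here is a small in-memory sketch in which each partition keeps its rows sorted by clustering key. The column names follow the testtab example (key_one, key_two, name); the `insert` and `select_range` helpers are hypothetical, and integer clustering keys are assumed.

```python
import bisect
from collections import defaultdict

# partition key -> list of (clustering key, name), kept sorted by clustering key
partitions = defaultdict(list)


def insert(key_one, key_two, name):
    rows = partitions[key_one]
    i = bisect.bisect_left(rows, (key_two,))
    if i < len(rows) and rows[i][0] == key_two:
        rows[i] = (key_two, name)        # same primary key: overwrite the row
    else:
        rows.insert(i, (key_two, name))  # keep ascending clustering order


def select_range(key_one, low, high):
    """Rows of one partition with low <= key_two <= high (integer keys assumed)."""
    rows = partitions.get(key_one, [])
    start = bisect.bisect_left(rows, (low,))
    end = bisect.bisect_left(rows, (high + 1,))
    return rows[start:end]


insert(12, 30, "c")
insert(12, 10, "a")
insert(12, 20, "b")
insert(7, 5, "x")

print(partitions[12])            # [(10, 'a'), (20, 'b'), (30, 'c')]
print(select_range(12, 15, 30))  # [(20, 'b'), (30, 'c')]
```

Because rows are already stored in order, a range condition on the clustering key is just a slice of one partition.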

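6. Regular indexes

create index indexName on tableName(column); If you do not name the index, Cassandra generates a name for it.

How it works: compared with MySQL, Cassandra's index implementation is much simpler and blunter. Cassandra automatically creates a new hidden table, uses the indexed column as the primary key of that index table, and stores the primary key of the original row as the value.

Indexes on collections:

create index on student(interest); -- index on a set

create index mymap on student(keys(education)); -- index on the keys of a map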
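The index mechanism described in this section, a hidden table keyed by the indexed value that stores the original primary keys, can be sketched with two dictionaries. This is only a model of the idea under simple assumptions (a single indexed column named age, everything in memory); the `upsert` and `find_by_age` helpers are hypothetical.

```python
from collections import defaultdict

table = {}                    # primary key -> row
age_index = defaultdict(set)  # indexed value -> primary keys of matching rows


def upsert(pk, row):
    old = table.get(pk)
    if old is not None:
        age_index[old["age"]].discard(pk)  # drop the stale index entry
    table[pk] = row
    age_index[row["age"]].add(pk)


def find_by_age(age):
    # Look up primary keys in the index "table", then fetch the rows.
    return [table[pk] for pk in sorted(age_index.get(age, ()))]


upsert(1011, {"name": "tom", "age": 16})
upsert(1012, {"name": "amy", "age": 17})
upsert(1013, {"name": "bob", "age": 16})
print([row["name"] for row in find_by_age(16)])  # ['tom', 'bob']
```

Every write to an indexed column also has to maintain the index entries, which is one reason frequently updated columns make poor index candidates.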

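7. Inserting data

The insert statement looks the same as in SQL:

insert into student (id, address, age, gender, name, interest, phone, education) values (1011, 'Zhongshan', 16, 1, 'tom', {'swimming', 'running'}, ['11111', '2222'], {'primary school': 'xxx primary school', 'middle school': 'xxx middle school'});

Add a TTL to set an expiry time in seconds:

insert into student (id, address, age, gender, name, interest, phone, education) values (1011, 'Zhongshan', 16, 1, 'tom', {'swimming', 'running'}, ['11111', '2222'], {'primary school': 'xxx primary school', 'middle school': 'xxx middle school'}) using ttl 60;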
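The effect of a TTL can be illustrated with a tiny store that stamps each row with an expiry time and hides it once that time has passed. This is a sketch of the behaviour only, not of how Cassandra implements expiry; the `TtlStore` class and the injected clock are hypothetical.

```python
class TtlStore:
    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._rows = {}      # key -> (value, expires_at or None)

    def insert(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._rows[key] = (value, expires_at)

    def select(self, key):
        entry = self._rows.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            return None      # expired rows are no longer visible
        return value


now = [0]                    # fake clock so the example needs no sleeping
store = TtlStore(clock=lambda: now[0])
store.insert(1011, "tom", ttl=60)
print(store.select(1011))    # tom
now[0] = 61
print(store.select(1011))    # None
```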

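8. Querying data

select * from student where id=1002;

Querying with keys and indexes:

There are rules about which conditions a query may use:

The partition key (the first part of the primary key) supports only equality (= or in).

A clustering key (the second part of the primary key) supports =, >, <, >=, and <=.

An indexed column supports only =.

To filter on a column that is neither a key nor indexed, use allow filtering.

Do not query on a clustering key alone. If you really must, add allow filtering, but this is not recommended because too much data has to be searched.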

create table testTb(
    key_one int,
    key_two int,
    name text,
    age int,
    primary key(key_one, key_two)
);

create index idxb_age on testTb(age);

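Collection columns:

Once a collection column is indexed, you can filter on a given value with the contains condition in the where clause.

select * from student where interest contains 'movies';

select * from student where education contains key 'primary school'; -- match on a map key

select * from student where education contains 'xx middle school' allow filtering; -- match on a map value

allow filtering is a very resource-intensive way to query.

If a table has 1,000,000 rows and 95% of them satisfy the condition, the query is still relatively efficient, and allow filtering is a reasonable choice.

If the table has 1,000,000 rows and only 2 of them satisfy the condition, the query is extremely inefficient: Cassandra loads 999,998 rows for nothing. If you run the query often, it is better to add an index on the column.

allow filtering is fine while the table is small, but queries become slow as the data grows.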
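The two cases above can be put into numbers with a small simulation. It models a filtering query as a full scan and counts how many rows are read versus returned; the `filter_scan` helper and the age values are hypothetical, and real costs also depend on how data is spread across nodes.

```python
def filter_scan(ages, wanted):
    """Scan every row, as a filtering query does; return (matches, rows_read)."""
    rows_read = 0
    matches = 0
    for age in ages:
        rows_read += 1
        if age == wanted:
            matches += 1
    return matches, rows_read


total = 1_000_000
mostly_matching = [19] * 950_000 + [30] * 50_000
rarely_matching = [19] * 2 + [30] * (total - 2)

for label, ages in (("95% match", mostly_matching), ("2 rows match", rarely_matching)):
    matches, rows_read = filter_scan(ages, 19)
    print(f"{label}: read {rows_read}, returned {matches}, wasted {rows_read - matches}")
```

Both scans read the same million rows; what changes is how much of that work is thrown away, which is why the second case is the one that calls for an index.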

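Sorting query results:

Cassandra supports sorting with order by, but only under certain conditions.

The partition key decides which machine a row lives on, and Cassandra can sort only the rows held on a single machine.

You can sort only by the clustering columns (the second, third, fourth, ... parts of the primary key), following their declared order, either all in the defined direction or all reversed.

The query must not use an index:

Within a partition, the result of any query is already ordered, because that is how the rows are stored.

select * from testTb where key_one=12 order by key_two; -- valid

select * from testTb where key_one=12 and age=19 order by key_two; -- invalid: cannot be combined with an index lookup

Indexed columns support like (this needs a SASI index).

Primary key columns support group by.

Paging: limit.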

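9. Updating

Updating simple types works the same as in MySQL.

Updating a set:

update student set interest = interest + {'games'} where id = 1012;

update student set interest = interest - {'movies'} where id = 1012;

update student set interest = {} where id = 1012;

In general a set, list, or map should contain at least one element; Cassandra treats an empty collection as null.

Updating a list:

update student set phone = ['2222', '222'] where id = 1012;

update student set phone = ['2222', '222'] + phone where id = 1012;

Set a value by list index, overwriting the existing value:

update student set phone[2] = '444' where id = 1012;

This kind of update gets very slow once the list holds a lot of data.

Not recommended: deleting the value at a specific position with delete and an index, which is not thread-safe.

Prefer - or + instead.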

Updating a map:

update student set education = {'middle school': 'xxx', 'primary school': 'ddd'} where id = 1001;

update student set education['middle school'] = 'Aimu Middle School' where id = 1001;

update student set education = education + {'kindergarten': 'xxx'} where id = 1001;

delete education['kindergarten'] from student where id = 1012;

update student set education = education - {'middle school', 'primary school'} where id = 1001;
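For readers who find the collection operators easier to follow as ordinary data-structure operations, the sketch below mirrors the set, list, and map updates with Python's built-in types. It is an analogy only: the values are sample data, and Python containers do not model Cassandra's storage or concurrency behaviour.

```python
# set: interest + {...} and interest - {...}
interest = {"swimming", "running"}
interest = interest | {"games"}
interest = interest - {"movies"}      # removing an absent element changes nothing

# list: replace, prepend, then overwrite by position
phone = ["11111", "2222"]
phone = ["2222", "222"] + phone       # ['2222', '222'] + phone
phone[2] = "444"                      # phone[2] = '444'

# map: set one key, add entries, remove a set of keys
education = {"primary school": "xxx", "middle school": "ddd"}
education["middle school"] = "Aimu Middle School"
education = {**education, "kindergarten": "xxx"}   # education + {...}
for key in ("middle school", "primary school"):    # education - {keys}
    education.pop(key, None)

print(sorted(interest))  # ['games', 'running', 'swimming']
print(phone)             # ['2222', '222', '444', '2222']
print(education)         # {'kindergarten': 'xxx'}
```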

Deletes use the same syntax as MySQL.

Batch operations:

begin batch
    insert ...;
    update ...;
    delete ...;
apply batch;

10. Java client

10.1 DataStax java-driver

pom: cassandra-driver-core, cassandra-driver-mapping

Code:

Get a connection: Cluster cluster = Cluster.builder().addContactPoint("xx.xx.xx.xx").withPort(9042).build();

Session session = cluster.connect();

Prepared statements:

Cassandra offers precompiled statements with placeholders, similar to JDBC.

11. Spring Data Cassandra

pom: spring-data-cassandra

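12. Cluster setup

Seed nodes:

When a new node joins a cluster, it uses the seed nodes to discover the other nodes, so at least one live seed node must be reachable. Once the node has joined and learned about the other nodes, it no longer needs the seed nodes on later restarts. Seed nodes have no special requirements; any node can be configured as a seed.

Configuration:

cassandra.yaml

cluster_name: the cluster name; it must be the same on every node

seeds: the IPs of two nodes to act as seeds, "192.168.155.12,192.168.155.13"

listen_address: the IP address of the machine this node runs on

rpc_address: the IP address of the machine this node runs on

nodetool status shows the nodes in the cluster.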

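13. Data storage format

Cassandra's data lives both in memory and on disk.

It falls into three main kinds:

CommitLog: records the data and operations submitted by clients. It is persisted to disk so that writes that have not yet been flushed to disk as SSTables can be recovered.

Memtable: the in-memory form of written data. Each column family has one memtable, that is, one per table.

SSTable: data persisted to disk, split into three file formats: data, index, and filter.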
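How the three kinds of data fit together can be sketched as a toy write path: log the write, apply it to the memtable, and flush to an immutable SSTable when the memtable fills up. This is a simplified model under assumptions (everything in memory, a tiny flush threshold, no compaction); the `Table` class and its method names are hypothetical.

```python
class Table:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory writes, one memtable per table
        self.sstables = []     # immutable flushed tables, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, for crash recovery
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable wins
            if key in sstable:
                return sstable[key]
        return None

    def recover(self):
        """Rebuild the memtable from the commit log after a crash."""
        self.memtable = dict(self.commit_log)


t = Table(flush_threshold=3)
for key, value in (("a", 1), ("b", 2), ("c", 3)):
    t.write(key, value)      # the third write triggers a flush
t.write("a", 10)             # lives only in the commit log and memtable
t.memtable = {}              # simulate losing memory in a crash
t.recover()
print(t.read("a"), t.read("b"))  # 10 2
```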

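A column family maps to multiple SSTables. When a user looks up data, Cassandra uses a Bloom filter: several hash functions map the key into a bitmap, which makes it quick to tell which SSTables may contain the key.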
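A minimal Bloom filter shows how several hash functions and a bitmap answer "might this SSTable contain the key?". The sizes, the SHA-256 based hashing, and the `BloomFilter` class are assumptions chosen for the sketch, not Cassandra's actual parameters.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0            # the bitmap, stored in one Python int

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# One filter per SSTable; only SSTables whose filter says "maybe" are read.
sstables = []
for keys in (["1011", "1012"], ["1013"]):
    bloom = BloomFilter()
    for k in keys:
        bloom.add(k)
    sstables.append((bloom, set(keys)))

candidates = [i for i, (bloom, _) in enumerate(sstables) if bloom.might_contain("1013")]
print(candidates)
```

A Bloom filter never misses a key that was added, but it can occasionally report a key that was not, so a positive answer still has to be confirmed by reading the SSTable.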

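14. Key points

Every machine in the cluster is a peer; there is no distinction between master and slave nodes. If any one machine fails, the cluster as a whole keeps working.

Consistent hashing is the foundation of a Cassandra cluster. It reduces the impact of redistributing data in a distributed system.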
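The claim about redistribution can be checked with a small experiment that adds a fourth node and counts how many keys change owner, once with a consistent hash ring and once with plain modulo hashing. The node names, the number of virtual nodes, and MD5 as the hash are all assumptions made for this sketch.

```python
import bisect
import hashlib


def h(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def assign_ring(nodes, keys, vnodes=64):
    """Consistent hashing: each node sits at several points on a hash ring."""
    ring = sorted((h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
    tokens = [t for t, _ in ring]
    return {k: ring[bisect.bisect_left(tokens, h(k)) % len(ring)][1] for k in keys}


def assign_modulo(nodes, keys):
    """Naive placement: hash modulo the number of nodes."""
    return {k: nodes[h(k) % len(nodes)] for k in keys}


def moved(before, after):
    return sum(before[k] != after[k] for k in before) / len(before)


keys = [f"key-{i}" for i in range(10_000)]
old_nodes = ["node-a", "node-b", "node-c"]
new_nodes = old_nodes + ["node-d"]

ring_before, ring_after = assign_ring(old_nodes, keys), assign_ring(new_nodes, keys)
ring_moved = moved(ring_before, ring_after)
mod_moved = moved(assign_modulo(old_nodes, keys), assign_modulo(new_nodes, keys))
print(f"consistent hashing: {ring_moved:.0%} moved, modulo: {mod_moved:.0%} moved")
```

With the ring, the only keys that move are the ones the new node takes over; with modulo placement, most keys end up on a different node.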
