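2020 Java Microservice Architecture, Part 5: Elasticsearch Full-Text Search Tutorial
I. Introduction to ElasticSearch
1.1 Motivation
1. Searching massive data sets with MySQL is too slow.
2. Users should still find the data they want even when the keyword is not typed exactly.
3. Matched keywords should be displayed in red in the results.
1.2 What ES is
ES is a search engine framework written in Java on top of Lucene. It provides distributed full-text search through a uniform RESTful web interface, and the official clients expose the API in many languages.
Lucene: the low-level search engine library itself.
Distributed: ES emphasizes horizontal scalability.
Full-text search: text is split into terms and the terms are stored in a term dictionary; at query time the keyword is looked up in that dictionary to find the matching content (an inverted index).
RESTful web interface: working with ES only requires sending an HTTP request; the HTTP method and the parameters determine which operation runs.
Widely used: GitHub, Wikipedia and Goldman Sachs use ES to maintain close to 10 TB of data per day.
1.3 Origin of ES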

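1.4 ES and Solr
1. Solr is somewhat faster than ES when querying static data, but its query speed drops considerably when the data changes in real time, while ES query performance stays largely unchanged.
2. A Solr cluster depends on ZooKeeper for coordination. ES supports clustering natively and needs no third-party component.
3. Solr's community was very active early on, but little documentation was aimed at Chinese users. After ES appeared, its community grew rapidly, and the ES documentation is comprehensive.
4. ES has particularly good support for cloud computing and big data.
1.5 Inverted index
The stored data is split into terms in some way, and the terms are placed in a separate term dictionary.
When a user runs a query, the query keyword is split into terms as well.
Those terms are matched against the term dictionary, which yields the ids of the matching documents.
The ids are then used to fetch the actual documents from where the data is stored.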
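The four steps above can be sketched in plain Java. This is only an illustration of the idea, not how Lucene or ES is implemented: the class name `InvertedIndexSketch` and the whitespace-based `analyze` method are assumptions made for the example (ES would use a real analyzer such as ik_max_word).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // id -> original document text
    private final Map<Integer, String> store = new HashMap<>();

    // Stand-in analyzer (assumption): lowercase and split on whitespace.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Step 1: analyze the document and record each term in the dictionary.
    void index(int id, String text) {
        store.put(id, text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Steps 2 and 3: analyze the query and collect the ids of documents
    // that contain any of its terms (ascending id order).
    List<Integer> search(String query) {
        Set<Integer> ids = new TreeSet<>();
        for (String term : analyze(query)) {
            ids.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return new ArrayList<>(ids);
    }

    // Step 4: fetch the stored document by id.
    String fetch(int id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.index(1, "Elasticsearch is a search engine");
        idx.index(2, "Lucene is a library");
        for (int id : idx.search("search library")) {
            System.out.println(id + " -> " + idx.fetch(id));
        }
    }
}
```

The lookup cost depends on the number of query terms rather than on scanning every document, which is the reason full-text engines keep such a dictionary.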

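II. Installing ES
2.1 Install ES
1. Create a docker_es folder under /opt.
2. Enter the docker_es folder and create docker-compose.yml.
3. Start the services with docker-compose up -d.
4. Follow the logs with docker-compose logs -f.
Contents of docker-compose.yml: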
version: '3.1'
services:
  elasticsearch:
    restart: always
    image: daocloud.io/library/elasticsearch:6.5.4
    container_name: elasticsearch
    ports:
      - 9200:9200
  kibana:
    restart: always
    image: daocloud.io/library/kibana:6.5.4
    container_name: kibana
    ports:
      - 5601:5601
    environment:
      - elasticsearch_url=http://192.168.2.123:9200
    depends_on:
      - elasticsearch
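If startup fails, append the following line to /etc/sysctl.conf:
vm.max_map_count=262144
Then apply the setting (for example with sysctl -p) and restart the services.

Then open Kibana in a browser (port 5601 in the compose file above).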

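2.2 Install the IK analyzer
Enter the ES container, change to the bin directory, and run:
./elasticsearch-plugin install http://tomcat01.qfjava.cn:81/elasticsearch-analysis-ik-6.5.4.zip
Enter y when prompted to continue the installation.

Restart the ES container and check the startup log, then open the Kibana console -> Dev Tools and test the analyzer:
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "舒大少博客"
}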


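III. Basic ElasticSearch Operations
3.1 Structure of ES
3.1.1 Index, shards and replicas
An ES service can hold multiple indices.
Each index is split into 5 primary shards by default (in ES 6.x).
By default each primary shard has one replica shard; the number is configurable with number_of_replicas.
Replica shards are not only standby copies: ES can route search requests to either a primary or one of its replicas, and a replica takes over when its primary is lost.
A replica is never placed on the same node as its primary.

3.1.2 Type
An index can contain types.
Note: this depends on the version. ES 5.x allows several types per index, ES 6.x allows only one type per index, and types are deprecated from ES 7.x on.

3.1.3 Document (Doc)
A type can hold many documents, comparable to the rows of a MySQL table.

3.1.4 Field
A document can contain many fields, comparable to the columns of one row in a MySQL table.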
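3.2 RESTful syntax for working with ES
GET requests:
http://ip:port/index : get information about an index
http://ip:port/index/type/doc_id : get the specified document
POST requests:
http://ip:port/index/type/_search : search documents; the query conditions are passed as JSON in the request body
http://ip:port/index/type/doc_id/_update : update a document; the changes are passed as JSON in the request body
PUT requests:
http://ip:port/index : create an index; the request body specifies its settings, type and structure
http://ip:port/index/type/_mappings : when creating an index, specifies the fields stored by the document type
DELETE requests:
http://ip:port/index : delete the index
http://ip:port/index/type/doc_id : delete the specified document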
3.3 Index operations
3.3.1 Create an index
# Create an index
PUT /person
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 5
  }
}

3.3.2 View index information
# View the index
GET /person


3.3.3 Delete an index
# Delete the index
DELETE /person
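3.4 Field types available in ES
1. String types:
text: generally used for full-text search; the field value is analyzed into terms
keyword: the field value is not analyzed
2. Numeric types:
long: 8 bytes
integer: 4 bytes
short: 2 bytes
byte: 1 byte
double: 8 bytes
float: 4 bytes
half_float: half-precision floating point (2 bytes)
scaled_float: a floating-point number stored as a long together with a scaling_factor, e.g. long 345 with scaling_factor 100 represents 3.45
3. Date type: date; a specific format can be given for the value
4. Boolean type: boolean, either true or false
5. Binary type: binary, which currently accepts a Base64-encoded string
6. Range types:
The value is a range rather than a single value, specified with gt, lt, gte, lte
long_range, integer_range, double_range, float_range, date_range, ip_range
7. Geo type:
geo_point: stores a latitude/longitude pair
8. IP type: ip, which can store IPv4 or IPv6 addresses
For other data types, see the official documentation.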
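The scaled_float arithmetic mentioned above (345 with a factor of 100 representing 3.45) can be sketched in a few lines of Java. The class and method names are hypothetical and only illustrate the encoding idea, not ES internals.

```java
class ScaledFloatSketch {
    // Encode: multiply by the scaling factor and round to the nearest long.
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // Decode: divide the stored long by the same scaling factor.
    static double decode(long stored, double scalingFactor) {
        return stored / scalingFactor;
    }

    public static void main(String[] args) {
        long stored = encode(3.45, 100);
        System.out.println(stored);              // the long that is stored
        System.out.println(decode(stored, 100)); // the value read back
    }
}
```

Any precision beyond the scaling factor is lost on encoding, so the factor should be chosen to match the precision the data actually needs (100 for two decimal places, for instance).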
3.5 Create an index with a mapping
# Create an index and specify its data structure
PUT /book
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 5
  },
  "mappings": {
    "novel": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "ik_max_word",
          "index": true,
          "store": false
        },
        "author": {
          "type": "keyword"
        },
        "count": {
          "type": "long"
        },
        "on-sale": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "desc": {
          "type": "text",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}


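3.6 Document operations
3.6.1 Add a document
With an auto-generated id:
# Add a document, id generated automatically
POST /book/novel
{
  "name": "盘龙",
  "author": "番茄",
  "count": "100000",
  "on-sale": "2020-01-01",
  "desc": "盘龙小说"
}

With an explicit id:
# Add a document with a manually specified id
POST /book/novel/1
{
  "name": "红楼梦",
  "author": "曹雪芹",
  "count": "50000",
  "on-sale": "1988-03-03",
  "desc": "红楼梦小说"
}

3.6.2 Update a document
Overwrite the whole document (indexing again under the same id replaces it):
# Update by overwriting the document with id 1
POST /book/novel/1
{
  "name": "红楼梦",
  "author": "曹雪芹",
  "count": "50000",
  "on-sale": "1988-03-03",
  "desc": "红楼梦小说"
}
Partial update with doc:
# Update only the fields listed under doc
POST /book/novel/1/_update
{
  "doc": {
    "author": "舒大少"
  }
}

3.6.3 Delete a document
# Delete a document by id (replace 1 with the id of the document)
DELETE /book/novel/1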
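IV. Working with ES from Java
4.1 Connecting to ES from Java
1. Create a Maven project and add the dependencies below, together with JUnit and Lombok.
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.5.4</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>6.5.4</version>
</dependency>
2. Create a test class and connect to ES.
Create the connection:
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

public class ESClient {
    public static RestHighLevelClient getClient() {
        // 1. Create the HttpHost (address and port of the ES node)
        HttpHost host = new HttpHost("192.168.2.123", 9200);
        // 2. Create the RestClientBuilder
        RestClientBuilder rb = RestClient.builder(host);
        // 3. Create the RestHighLevelClient
        return new RestHighLevelClient(rb);
    }
}
Test the connection:
public class Demo {
    @Test
    public void testConnect() throws IOException {
        RestHighLevelClient client = ESClient.getClient();
        // Building the client sends nothing; ping() actually contacts the node
        System.out.println(client.ping(RequestOptions.DEFAULT) ? "OK!" : "ES is not reachable");
    }
}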
4.2 Index operations from Java
4.2.1 Create an index
public class Demo2 {
    RestHighLevelClient client = ESClient.getClient();

    @Test
    public void createIndex() throws IOException {
        String index = "person2";
        String type = "man";
        // 1. Prepare the index settings: number_of_replicas=1, number_of_shards=3
        Settings.Builder settings = Settings.builder()
                .put("number_of_replicas", 1)
                .put("number_of_shards", 3);
        // 2. Prepare the index structure (mappings)
        XContentBuilder mappings = JsonXContent.contentBuilder()
                .startObject()
                    .startObject("properties")
                        .startObject("name")
                            .field("type", "text")
                        .endObject()
                        .startObject("age")
                            .field("type", "integer")
                        .endObject()
                        .startObject("birthday")
                            .field("type", "date")
                            .field("format", "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis")
                        .endObject()
                    .endObject()
                .endObject();
        // 3. Wrap settings and mappings in a request object
        CreateIndexRequest request = new CreateIndexRequest(index)
                .settings(settings)
                .mapping(type, mappings);
        // 4. Use the client to connect to ES and create the index
        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
        // 5. Print the result
        System.out.println("response:" + response.toString());
    }
}
4.2.2 Check whether an index exists
@Test
public void isExists() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    // 1. Prepare the request object
    GetIndexRequest request = new GetIndexRequest();
    request.indices("person");
    // 2. Execute it through the client
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}
4.2.3 Delete an index
@Test
public void del() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    // 1. Prepare the request object
    DeleteIndexRequest request = new DeleteIndexRequest();
    request.indices("person");
    // 2. Execute it and print whether ES acknowledged the deletion
    AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.isAcknowledged());
}
4.3 Document operations from Java
4.3.1 Add a document
The Person class is not shown in this tutorial; the code below assumes a simple POJO (for example generated with Lombok) with id, name, age and birthday fields and an all-arguments constructor.
public class Demo4 {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();

    @Test
    public void createDoc() throws Exception {
        // 1. Prepare the JSON data
        Person person = new Person(1, "张三", 23, new Date());
        String json = mapper.writeValueAsString(person);
        // 2. Prepare the request object (id specified manually)
        IndexRequest request = new IndexRequest("person", "man", person.getId().toString());
        request.source(json, XContentType.JSON);
        // 3. Execute the insert through the client
        IndexResponse response = client.index(request, RequestOptions.DEFAULT);
        System.out.println(response.getResult().toString());
    }
}
4.3.2 Update a document
@Test
public void updateDoc() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    // 1. Create a Map holding the fields to change
    Map<String, Object> doc = new HashMap<>();
    doc.put("name", "李四");
    String docId = "1";
    // 2. Create the request object and attach the data
    UpdateRequest request = new UpdateRequest("person", "man", docId);
    request.doc(doc);
    // 3. Execute it through the client
    UpdateResponse update = client.update(request, RequestOptions.DEFAULT);
    // 4. Print the result
    System.out.println(update.getResult().toString());
}
4.3.3 Delete a document
@Test
public void deleteDoc() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    // Delete the document with id 1
    DeleteRequest request = new DeleteRequest("person", "man", "1");
    DeleteResponse delete = client.delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.getResult().toString());
}
4.4 Bulk document operations from Java
4.4.1 Bulk insert
@Test
public void bulkCreateDoc() throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "person";
    String type = "man";
    // 1. Prepare the JSON data
    Person p1 = new Person(1, "张三", 23, new Date());
    Person p2 = new Person(2, "李四", 24, new Date());
    Person p3 = new Person(3, "王五", 25, new Date());
    String json1 = mapper.writeValueAsString(p1);
    String json2 = mapper.writeValueAsString(p2);
    String json3 = mapper.writeValueAsString(p3);
    // 2. Create the bulk request and add the prepared data
    BulkRequest request = new BulkRequest();
    request.add(new IndexRequest(index, type, p1.getId().toString()).source(json1, XContentType.JSON));
    request.add(new IndexRequest(index, type, p2.getId().toString()).source(json2, XContentType.JSON));
    request.add(new IndexRequest(index, type, p3.getId().toString()).source(json3, XContentType.JSON));
    // 3. Execute it through the client
    BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
    // 4. Print the result
    System.out.println(response.toString());
}
4.4.2 Bulk delete
@Test
public void bulkDeleteDoc() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "person";
    String type = "man";
    // 1. Create the bulk request
    BulkRequest request = new BulkRequest();
    request.add(new DeleteRequest(index, type, "1"));
    request.add(new DeleteRequest(index, type, "2"));
    request.add(new DeleteRequest(index, type, "3"));
    // 2. Execute it through the client
    BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.out.println(response.toString());
}
V. ElasticSearch Queries
5.1 term and terms queries
5.1.1 term query
A term query is an exact match: the search keyword is not analyzed before searching, and is matched as-is against the terms in the index.
A terms query is used when one field may match any of several values.
term: where province = 北京;
terms: where province = 北京 or province = ? or province = ?
# term query
# from and size work like SQL "limit from, size"
POST /sms-logs-index/sms-logs-type/_search
{
  "from": 0,
  "size": 5,
  "query": {
    "term": {
      "province": {
        "value": "北京"
      }
    }
  }
}
Java version:
@Test
public void term() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the request object
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.from(0);
    builder.size(5);
    builder.query(QueryBuilders.termQuery("province", "北京"));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Read the _source of each hit and print it
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
}
5.1.2 terms query
# terms query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "terms": {
      "province": [
        "北京",
        "上海"
      ]
    }
  }
}
Java version:
@Test
public void terms() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the request object
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.termsQuery("province", "北京", "上海"));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Read the _source of each hit and print it
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
}
5.2 match queries [important]
match is a high-level query: it chooses how to search based on the type of the field being queried.
If the field is a date or a number, the query string is converted to a date or a number.
If the field cannot be analyzed (keyword), match does not analyze the query keyword.
If the field can be analyzed (text), match analyzes the query text in the configured way and matches the resulting terms against the index.
Under the hood, a match query is a set of term queries whose results are combined.
5.2.1 match_all query
Returns everything; no query condition is specified.
# match_all query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "match_all": {}
  }
}
Java version:
@Test
public void match_all() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the request object
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchAllQuery());
    builder.size(20); // ES returns only 10 hits by default; set size to get more
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println("Hits returned: " + response.getHits().getHits().length);
}
5.2.2 match query
Uses a single field as the search condition.
# match query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "match": {
      "smsContent": "收货安装"
    }
  }
}
Java version:
@Test
public void matchQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the request object
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchQuery("smsContent", "收货安装"));
    request.source(builder);
    // 3. Execute the query and print the results
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println("Hits returned: " + response.getHits().getHits().length);
}
5.2.3 Boolean match query
Matches one field against several terms, combined with and or or.
# boolean match query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "match": {
      "smsContent": {
        "query": "中国 健康",
        "operator": "or"
      }
    }
  }
}
Java version:
@Test
public void booleanMatchQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the request object
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchQuery("smsContent", "中国 健康").operator(Operator.OR));
    request.source(builder);
    // 3. Execute the query and print the results
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println("Hits returned: " + response.getHits().getHits().length);
}
5.2.4 multi_match query
match searches a single field; multi_match searches several fields with the same query text.
# multi_match query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "multi_match": {
      "query": "北京",
      "fields": ["province", "smsContent"]
    }
  }
}
Java version:
@Test
public void multiMatchQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the request object
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.multiMatchQuery("北京", "province", "smsContent"));
    request.source(builder);
    // 3. Execute the query and print the results
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println("Hits returned: " + response.getHits().getHits().length);
}
5.3 Other queries
5.3.1 id query
Looks up a document by id, similar to where id=? in MySQL.
# id query
GET /sms-logs-index/sms-logs-type/24
Java version:
@Test
public void getId() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the GetRequest
    GetRequest request = new GetRequest(index, type, "24");
    // 2. Execute the query
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.out.println(response.getSourceAsMap());
}
5.3.2 ids query
Looks up documents by several ids, similar to where id in (id1, id2, id3...) in MySQL.
# ids query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "ids": {
      "values": ["24", "25", "27"]
    }
  }
}
Java version:
@Test
public void getIds() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.idsQuery().addIds("24", "25", "27"));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println(response.getHits().getHits().length);
}
5.3.3 prefix query
Prefix query: finds documents whose field value starts with the given keyword.
# prefix query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "prefix": {
      "corpName": {
        "value": "滴滴"
      }
    }
  }
}
Java version:
@Test
public void getIdsByPrefix() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.prefixQuery("corpName", "滴滴"));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println(response.getHits().getHits().length);
}
5.3.4 fuzzy query
Fuzzy query: ES returns results that approximately match the input, so small typos are tolerated.
prefix_length sets how many leading characters must match exactly.
# fuzzy query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "fuzzy": {
      "corpName": {
        "value": "滴滴",
        "prefix_length": 2
      }
    }
  }
}
Java version:
@Test
public void getIdsByFuzzy() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.fuzzyQuery("corpName", "滴滴").prefixLength(2));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println(response.getHits().getHits().length);
}
5.3.5 wildcard query
Wildcard query, similar to like in MySQL: the query string may contain the wildcard * (any number of characters) and the placeholder ? (exactly one character).
# wildcard query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "wildcard": {
      "corpName": {
        "value": "中国??"
      }
    }
  }
}
Java version:
@Test
public void getIdsByWildCard() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.wildcardQuery("corpName", "中国*"));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println(response.getHits().getHits().length);
}
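To make the difference between the two wildcard characters concrete, here is a small stand-alone Java sketch that translates a wildcard pattern into a regular expression and matches it against a single term. It only illustrates the semantics of * and ?; the class `WildcardSketch` is hypothetical and is not how ES evaluates wildcard queries internally.

```java
import java.util.regex.Pattern;

class WildcardSketch {
    // Translate a wildcard pattern: * -> any sequence, ? -> exactly one character.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wildcard.length(); ) {
            int cp = wildcard.codePointAt(i);
            if (cp == '*') {
                sb.append(".*");
            } else if (cp == '?') {
                sb.append('.');
            } else {
                // Quote everything else so it is matched literally
                sb.append(Pattern.quote(new String(Character.toChars(cp))));
            }
            i += Character.charCount(cp);
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // The whole term must match the pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("中国??", "中国移动"));     // two characters after 中国
        System.out.println(matches("中国??", "中国平安保险")); // four characters after 中国
        System.out.println(matches("中国*", "中国平安保险"));
    }
}
```

Note that the REST example above uses 中国?? while the Java example uses 中国*, so the two do not return the same documents: the first only matches values with exactly two characters after 中国.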
5.3.6 range query
Range query: restricts a field to values greater than or less than given bounds. It is used here on a numeric field; range queries also work on date fields.
# range query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "range": {
      "replyTotal": {
        "gte": 20,
        "lte": 50
      }
    }
  }
}
Java version:
@Test
public void getIdsByRange() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.rangeQuery("replyTotal").gte(20).lte(50));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println(response.getHits().getHits().length);
}
5.3.7 regexp query
# regexp query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "regexp": {
      "mobile": "180[0-9]{8}"
    }
  }
}
Java version:
@Test
public void getIdsByRegexp() throws IOException {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.regexpQuery("mobile", "180[0-9]{8}"));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println(response.getHits().getHits().length);
}
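5.4 Deep paging with Scroll
ES limits from+size: by default the sum of from and size must not exceed 10,000.
How ES serves a from+size query:
1. Analyze the keyword given by the user.
2. Look the terms up in the term dictionary, which yields a set of document ids.
3. Fetch the corresponding data from each shard (slow).
4. Sort the data by score (slow).
5. Discard part of the results according to from.
6. Return the result.
How ES serves a scroll+size query:
1. Analyze the keyword given by the user.
2. Look the terms up in the term dictionary, which yields a set of document ids.
3. Store the document ids in an ES context.
4. Fetch size documents from ES; the ids of the documents already fetched are removed from the context.
5. For the next page, continue directly from what is left in the ES context.
6. Repeat steps 4 and 5.
Scroll is not suitable for real-time queries, because it pages through the results as they were when the context was created.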
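The scroll steps described above can be modeled with a toy Java class. This is a simplified sketch of the description in this section, not of ES internals: `ScrollSketch` is a hypothetical name, and the "context" is just an in-memory queue of already matched document ids.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class ScrollSketch {
    // Ids found by the initial search, kept in the "scroll context"
    private final Deque<Integer> remainingIds;
    private final int size;

    ScrollSketch(List<Integer> matchedIds, int size) {
        this.remainingIds = new ArrayDeque<>(matchedIds);
        this.size = size;
    }

    // Return up to `size` ids and remove them from the context.
    // An empty list means the scroll is exhausted.
    List<Integer> nextPage() {
        List<Integer> page = new ArrayList<>();
        while (page.size() < size && !remainingIds.isEmpty()) {
            page.add(remainingIds.poll());
        }
        return page;
    }

    public static void main(String[] args) {
        ScrollSketch scroll = new ScrollSketch(Arrays.asList(1, 2, 3, 4, 5, 6, 7), 3);
        List<Integer> page;
        // Keep asking for the next page until nothing is left
        while (!(page = scroll.nextPage()).isEmpty()) {
            System.out.println(page);
        }
    }
}
```

Because the id list is fixed when the context is created, documents added afterwards never show up in later pages, which matches the remark that scroll is not meant for real-time queries.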
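Run the scroll query: it returns the first page and stores the document ids in the ES context with a time-to-live of 1m.
# scroll
POST /sms-logs-index/sms-logs-type/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "size": 5,
  "sort": [
    {
      "fee": {
        "order": "desc"
      }
    }
  ]
}
# Fetch the next page (replace id with the _scroll_id returned by the previous request)
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "id"
}
# Delete the scroll data from the ES context
DELETE /_search/scroll/id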
Java version:
@Test
public void scrollQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Set the scroll time-to-live
    request.scroll(TimeValue.timeValueMinutes(1L));
    // 3. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.size(5);
    builder.sort("fee", SortOrder.DESC);
    builder.query(QueryBuilders.matchAllQuery());
    request.source(builder);
    // 4. Get the scrollId and the first page of _source data
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    String scrollId = response.getScrollId();
    System.out.println("First page");
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    while (true) {
        // 5. Loop: create a SearchScrollRequest for the current scrollId
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
        // 6. Renew the scroll time-to-live
        scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
        // 7. Execute the query; the scroll id may change between pages, so keep the latest one
        SearchResponse searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
        scrollId = searchResponse.getScrollId();
        // 8. If data came back, print it
        SearchHit[] hits = searchResponse.getHits().getHits();
        if (hits != null && hits.length > 0) {
            System.out.println("Next page");
            for (SearchHit hit : hits) {
                System.out.println(hit.getSourceAsMap());
            }
        } else {
            // 9. No more data: leave the loop
            System.out.println("Done");
            break;
        }
    }
    // 10. Create the ClearScrollRequest
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    // 11. Set the scrollId
    clearScrollRequest.addScrollId(scrollId);
    // 12. Delete the scroll context
    ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
    System.out.println("Scroll cleared: " + clearScrollResponse.isSucceeded());
}
5.5 delete-by-query
Deletes a large number of documents selected by a term, match or other query.
# delete-by-query
POST /sms-logs-index/sms-logs-type/_delete_by_query
{
  "query": {
    "range": {
      "fee": {
        "lt": 4
      }
    }
  }
}
Java version:
@Test
public void deleteByQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the DeleteByQueryRequest
    DeleteByQueryRequest request = new DeleteByQueryRequest(index);
    request.types(type);
    // 2. Specify the query that selects the documents to delete
    request.setQuery(QueryBuilders.rangeQuery("fee").lt(4));
    // 3. Execute the deletion
    BulkByScrollResponse response = client.deleteByQuery(request, RequestOptions.DEFAULT);
    // 4. Print the result
    System.out.println(response.toString());
}
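5.6 Compound queries
5.6.1 bool query
A compound query that combines several query conditions with boolean logic.
must: all conditions under must have to match (and)
must_not: none of the conditions under must_not may match (not)
should: conditions under should are alternatives (or)
Note: when a bool query also contains must or filter clauses, the should clauses become optional and only affect the score, unless minimum_should_match is set.
# province is 武汉 or 北京
# operator is not China Unicom
# smsContent contains both 中国 and 平安
# bool query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "province": {
              "value": "武汉"
            }
          }
        },
        {
          "term": {
            "province": {
              "value": "北京"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "operatorId": {
              "value": "2"
            }
          }
        }
      ],
      "must": [
        {
          "match": {
            "smsContent": {
              "query": "中国 平安",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}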
Java version:
@Test
public void boolQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    BoolQueryBuilder bool = QueryBuilders.boolQuery();
    // province is 武汉 or 北京
    bool.should(QueryBuilders.termQuery("province", "武汉"));
    bool.should(QueryBuilders.termQuery("province", "北京"));
    // operator is not China Unicom
    bool.mustNot(QueryBuilders.termQuery("operatorId", 2));
    // smsContent contains both 中国 and 平安
    bool.must(QueryBuilders.matchQuery("smsContent", "中国 平安").operator(Operator.AND));
    request.source(builder.query(bool));
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println(response.getHits().getHits().length);
}
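5.6.2 boosting query
A boosting query lets us influence the score of the results.
positive: only documents matching the positive query are included in the result set
negative: documents that match both positive and negative have their score reduced
negative_boost: the factor applied to those documents; it must be less than 1.0
How the score is calculated at query time:
The more often the search keyword appears in a document, the higher the score.
The shorter the document content, the higher the score.
The search keyword is analyzed too; the more of its terms match the term dictionary, the higher the score.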
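The effect of negative_boost can be written as a one-line rule. The sketch below is a minimal illustration under the assumption that the positive query has already produced a score; `BoostingSketch` and its parameter names are hypothetical, and the actual relevance score in ES comes from its similarity model, which is not reproduced here.

```java
class BoostingSketch {
    // A document is only scored if it matched the positive query.
    // If it also matches the negative query, its score is multiplied by negativeBoost.
    static double score(double positiveScore, boolean matchesNegative, double negativeBoost) {
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }

    public static void main(String[] args) {
        double negativeBoost = 0.5;
        // Same positive score, with and without a negative match
        System.out.println(score(2.0, false, negativeBoost));
        System.out.println(score(2.0, true, negativeBoost));
    }
}
```

Unlike must_not, the negative query does not remove documents from the results; it only pushes them further down the ranking.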
# boosting query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "smsContent": "收货安装"
        }
      },
      "negative": {
        "match": {
          "smsContent": "张三"
        }
      },
      "negative_boost": 0.5
    }
  }
}
Java version:
@Test
public void boostingQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions: positive query first, negative query second
    SearchSourceBuilder builder = new SearchSourceBuilder();
    BoostingQueryBuilder queryBuilder = QueryBuilders.boostingQuery(
            QueryBuilders.matchQuery("smsContent", "收货安装"),
            QueryBuilders.matchQuery("smsContent", "张三")
    );
    queryBuilder.negativeBoost(0.5F);
    request.source(builder.query(queryBuilder));
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println(response.getHits().getHits().length);
}
5.7 filter query
query: computes how well each document matches the conditions, producing a score, and sorts by that score; results are not cached
filter: selects documents by the conditions without computing a score, and ES caches frequently used filters
# filter query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "corpName": "招商银行"
          }
        },
        {
          "range": {
            "fee": {
              "lte": 9
            }
          }
        }
      ]
    }
  }
}
Java version:
@Test
public void filterQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the filter conditions (same values as the REST example)
    SearchSourceBuilder builder = new SearchSourceBuilder();
    BoolQueryBuilder bool = QueryBuilders.boolQuery();
    bool.filter(QueryBuilders.termQuery("corpName", "招商银行"));
    bool.filter(QueryBuilders.rangeQuery("fee").lte(9));
    request.source(builder.query(bool));
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    System.out.println(response.getHits().getHits().length);
}
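5.8 Highlight query
A highlight query shows the keyword entered by the user in a special style, so the user can see why a result was returned.
The highlighted data is itself a field of the document; that field is returned separately in highlight form.
ES provides the highlight property, at the same level as query:
fragment_size: how many characters of highlighted text to return
pre_tags: the opening tag, e.g. <font color="red">
post_tags: the closing tag, e.g. </font>
fields: which fields to return in highlighted form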
# highlight query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "match": {
      "smsContent": "盒马"
    }
  },
  "highlight": {
    "fields": {
      "smsContent": {}
    },
    "pre_tags": "<font color='red'>",
    "post_tags": "</font>",
    "fragment_size": 10
  }
}
Java version:
@Test
public void highlightQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query conditions and the highlighting
    SearchSourceBuilder builder = new SearchSourceBuilder();
    // 2.1 Query condition
    builder.query(QueryBuilders.matchQuery("smsContent", "盒马"));
    // 2.2 Highlighting: field smsContent, fragment size 10
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("smsContent", 10)
            .preTags("<font color='red'>")
            .postTags("</font>");
    request.source(builder.highlighter(highlightBuilder));
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Read the highlighted data and print it
    SearchHit[] hits = response.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getHighlightFields().get("smsContent"));
    }
    System.out.println(hits.length);
}
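5.9 Aggregation queries
ES aggregations are similar to aggregate queries in MySQL but considerably more powerful; ES offers many different ways of computing statistics.
# RESTful syntax of an ES aggregation query
POST /index/type/_search
{
  "aggs": {
    "name (agg)": {
      "agg_type": {
        "property": "value"
      }
    }
  }
}
5.9.1 Distinct count
Distinct count, i.e. cardinality: the values of one field in the matching documents are de-duplicated, and the number of distinct values is returned.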
# Distinct count, e.g. 北京 武汉 上海 山西
POST /sms-logs-index/sms-logs-type/_search
{
  "aggs": {
    "agg": {
      "cardinality": {
        "field": "province"
      }
    }
  }
}
Java version:
@Test
public void aggQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the aggregation to use
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.cardinality("agg").field("province"));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Read the result
    Cardinality agg = response.getAggregations().get("agg");
    System.out.println(agg.getValue());
}
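5.9.2 Range aggregation
Counts the documents that fall into given ranges, for example how many documents have a field value in 0~100, 100~200 and 200~300.
Ranges can be computed over plain numbers, dates and IP addresses. In each range, from is inclusive and to is exclusive.
range, date_range, ip_range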
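The bucket boundaries of a numeric range aggregation are easy to get wrong, so here is a small Java sketch of the counting rule: the lower bound is inclusive and the upper bound exclusive. `RangeAggSketch` and the sample fee values are made up for the illustration.

```java
class RangeAggSketch {
    // Count values in [from, to); a null bound means the range is open on that side.
    static long count(double[] values, Double from, Double to) {
        long n = 0;
        for (double v : values) {
            boolean aboveFrom = from == null || v >= from;
            boolean belowTo = to == null || v < to;
            if (aboveFrom && belowTo) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        double[] fees = {3, 5, 7, 10, 12}; // hypothetical fee values
        System.out.println("*-5:   " + count(fees, null, 5.0));
        System.out.println("5-10:  " + count(fees, 5.0, 10.0));
        System.out.println("10-*:  " + count(fees, 10.0, null));
    }
}
```

With these bounds a value of exactly 5 lands in the 5-10 bucket and a value of exactly 10 in the last bucket, so every value is counted once.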
# Range aggregation on a numeric field
POST /sms-logs-index/sms-logs-type/_search
{
  "aggs": {
    "agg": {
      "range": {
        "field": "fee",
        "ranges": [
          {
            "to": 5
          },
          {
            "from": 5,
            "to": 10
          },
          {
            "from": 10
          }
        ]
      }
    }
  }
}
# Range aggregation on a date field
POST /sms-logs-index/sms-logs-type/_search
{
  "aggs": {
    "agg": {
      "date_range": {
        "field": "createDate",
        "format": "yyyy",
        "ranges": [
          {
            "to": 2000
          },
          {
            "from": 2000
          }
        ]
      }
    }
  }
}
# Range aggregation on an IP field
POST /sms-logs-index/sms-logs-type/_search
{
  "aggs": {
    "agg": {
      "ip_range": {
        "field": "ipAddr",
        "ranges": [
          {
            "to": "10.126.2.9"
          },
          {
            "from": "10.126.2.9"
          }
        ]
      }
    }
  }
}
Java version:
@Test
public void rangeAggQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the aggregation to use
    SearchSourceBuilder builder = new SearchSourceBuilder();
    // Date variant
    /*request.source(builder.aggregation(
            AggregationBuilders.dateRange("agg").format("yyyy").field("createDate")
                    .addUnboundedTo(2000).addUnboundedFrom(2000)
    ));*/
    // Numeric variant
    request.source(builder.aggregation(AggregationBuilders.range("agg").field("fee")
            .addUnboundedTo(5).addRange(5, 10).addUnboundedFrom(10)));
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Read the result: one bucket per range
    Range agg = response.getAggregations().get("agg");
    for (Range.Bucket bucket : agg.getBuckets()) {
        String key = bucket.getKeyAsString();
        long docCount = bucket.getDocCount();
        System.out.println(key + "/" + docCount);
    }
}
5.9.3 Statistics aggregation
Returns the maximum, minimum, average, sum of squares and similar statistics of a given field.
Uses extended_stats.
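For reference, the quantities named above can be computed by hand as in the following Java sketch. It is a plain in-memory calculation over a hypothetical array of values, meant only to show what each statistic is; `ExtendedStatsSketch` is not an ES class.

```java
class ExtendedStatsSketch {
    final long count;
    final double min;
    final double max;
    final double sum;
    final double sumOfSquares;

    ExtendedStatsSketch(double[] values) {
        double mn = Double.POSITIVE_INFINITY;
        double mx = Double.NEGATIVE_INFINITY;
        double s = 0;
        double sq = 0;
        for (double v : values) {
            mn = Math.min(mn, v);
            mx = Math.max(mx, v);
            s += v;       // running sum
            sq += v * v;  // running sum of squares
        }
        this.count = values.length;
        this.min = mn;
        this.max = mx;
        this.sum = s;
        this.sumOfSquares = sq;
    }

    // Average is undefined (NaN) for an empty input.
    double avg() {
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        ExtendedStatsSketch stats = new ExtendedStatsSketch(new double[] {3, 4, 5});
        System.out.println(stats.min + "\t" + stats.max + "\t" + stats.avg()
                + "\t" + stats.sum + "\t" + stats.sumOfSquares);
    }
}
```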
# Statistics aggregation
POST /sms-logs-index/sms-logs-type/_search
{
  "aggs": {
    "agg": {
      "extended_stats": {
        "field": "fee"
      }
    }
  }
}
Java version:
@Test
public void extendedStatsQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the aggregation to use
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.extendedStats("agg").field("fee"));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Read the result: max, min and sum
    ExtendedStats agg = response.getAggregations().get("agg");
    System.out.println(agg.getMax() + "\t" + agg.getMin() + "\t" + agg.getSum());
}
5.10 Geo (latitude/longitude) search
ES provides the geo_point data type for storing latitude/longitude pairs.
# Create an index with a name field and a location field
PUT /map-maps
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 5
  },
  "mappings": {
    "map": {
      "properties": {
        "name": {
          "type": "text"
        },
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
# Add test data
PUT /map-maps/map/1
{
  "name": "春熙路步行街",
  "location": {
    "lon": 104.084227,
    "lat": 30.661516
  }
}
PUT /map-maps/map/2
{
  "name": "成都图书馆",
  "location": {
    "lon": 104.067196,
    "lat": 30.658928
  }
}
PUT /map-maps/map/3
{
  "name": "宽窄巷子",
  "location": {
    "lon": 104.059763,
    "lat": 30.669938
  }
}
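5.10.1 Geo search methods in ES
geo_distance: search by straight-line distance from a point
geo_bounding_box: two points define a rectangle; returns all data inside the rectangle
geo_polygon: several points define a polygon; returns all data inside the polygon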
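To give a feel for what a distance search computes, the following Java sketch calculates the great-circle distance between two latitude/longitude points with the haversine formula and checks it against a radius. This is an illustration only: the class `GeoDistanceSketch`, the Earth-radius constant and the choice of the haversine formula are assumptions of the example, not a statement about the formula ES uses.

```java
class GeoDistanceSketch {
    // Mean Earth radius in meters (assumed constant for this sketch)
    static final double EARTH_RADIUS_M = 6371008.8;

    // Great-circle distance in meters between two points given in degrees.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // True if the point lies within radiusMeters of the center.
    static boolean within(double centerLat, double centerLon,
                          double lat, double lon, double radiusMeters) {
        return haversineMeters(centerLat, centerLon, lat, lon) <= radiusMeters;
    }

    public static void main(String[] args) {
        // Center point and one of the test documents from this section (成都图书馆)
        double centerLat = 30.669567, centerLon = 104.073598;
        double d = haversineMeters(centerLat, centerLon, 30.658928, 104.067196);
        System.out.println("distance in meters: " + d);
        System.out.println("within 3000 m: " + within(centerLat, centerLon, 30.658928, 104.067196, 3000));
    }
}
```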
5.10.2 Geo search over REST
# geo_distance
# location: the center point
# distance: the radius (meters when no unit is given)
# distance_type: how the distance is calculated (arc is the accurate default)
POST /map-maps/map/_search
{
  "query": {
    "geo_distance": {
      "location": {
        "lon": 104.073598,
        "lat": 30.669567
      },
      "distance": 3000,
      "distance_type": "arc"
    }
  }
}
# geo_bounding_box
POST /map-maps/map/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": {
          "lon": 104.058147,
          "lat": 30.670825
        },
        "bottom_right": {
          "lon": 104.061075,
          "lat": 30.667377
        }
      }
    }
  }
}

# geo_polygon
POST /map-maps/map/_search
{
  "query": {
    "geo_polygon": {
      "location": {
        "points": [
          {
            "lon": 104.058129,
            "lat": 30.670825
          },
          {
            "lon": 104.056458,
            "lat": 30.668247
          },
          {
            "lon": 104.060994,
            "lat": 30.669443
          }
        ]
      }
    }
  }
}

5.10.3 geo_polygon from Java
@Test
public void geoPolygonQuery() throws Exception {
    RestHighLevelClient client = ESClient.getClient();
    String index = "map-maps";
    String type = "map";
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // 2. Specify the query: the polygon vertices as GeoPoint(lat, lon)
    SearchSourceBuilder builder = new SearchSourceBuilder();
    List<GeoPoint> list = new ArrayList<>();
    list.add(new GeoPoint(30.670825, 104.058129));
    list.add(new GeoPoint(30.668247, 104.056458));
    list.add(new GeoPoint(30.669443, 104.060994));
    builder.query(QueryBuilders.geoPolygonQuery("location", list));
    request.source(builder);
    // 3. Execute the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    SearchHit[] hits = response.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsMap());
    }
}


