Elasticsearch 1.X
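This document was written around 2014; I have only updated the version numbers since. It is very brief, and I will post more when I have time. Both elasticsearch and solr are well suited to distributed full-text indexing and handle very large data sets with ease. I have hardly posted anything over the past year; I had a lot of drafts saved up, but they all got lost...
elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine exposed through a RESTful web interface. With elasticsearch you can quickly build a full-text search cluster for near-real-time search.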
I. Download and install
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.7.3.zip
unzip elasticsearch-1.7.3.zip
cp -r elasticsearch-1.7.3 elasticsearch-1.7.3-2
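Single-node setup:
Start in the foreground:
./bin/elasticsearch
(On Windows, double-click elasticsearch.bat.)
Start in the background:
./bin/elasticsearch -d
Once started, the node listens on 9200 (HTTP web API), 9300 (socket transport API), and 54328 (zen discovery UDP multicast).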
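To confirm that the node really came up, one option is to probe the two TCP ports from a small program. The following is a minimal sketch using only the JDK; the class name PortCheck and the one-second timeout are my own choices, and the UDP multicast port 54328 cannot be checked this way.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 9200 = HTTP API, 9300 = transport API (both TCP).
        int[] ports = {9200, 9300};
        for (int port : ports) {
            System.out.println(port + " listening: " + isListening("localhost", port, 1000));
        }
    }
}
```

If either port reports false shortly after startup, the logs directory of the installation is the first place to look.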
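II. Chinese word segmentation
Out of the box, elasticsearch does not segment Chinese text into words, so you need to install an analyzer yourself before Chinese text can be searched properly. This guide uses elasticsearch-analysis-ik plus mmseg for Chinese segmentation and indexing.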
1. Download and install Maven
wget http://apache.communilink.net/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.zip
Configure the environment variables:
vim ~/.bash_profile
Append at the end:
export M2_HOME=/data/apache-maven-3.2.5
export PATH=$PATH:$JAVA_HOME/bin:$M2_HOME/bin
2. Download the ik and mmseg plugins
Unzip elasticsearch-analysis-ik-master.zip.
Edit elasticsearch-analysis-ik-master/pom.xml and set the elasticsearch version to 1.4.2 (the ES version you installed).
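3. Install elasticsearch-analysis-ik
Build the elasticsearch-analysis-ik jar:
cd elasticsearch-analysis-ik-master
mvn clean package
When the build finishes, copy the generated elasticsearch-analysis-ik-1.2.9.jar from the target directory into the lib folder of the elasticsearch installation. Then copy the elasticsearch-analysis-ik-master/config/ik folder into the config folder of the elasticsearch installation.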
Edit elasticsearch-1.4.2/config/elasticsearch.yml
Add the following configuration:
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
Or:
index.analysis.analyzer.ik.type : "ik"
4. Install HttpClient
wget http://apache.01link.hk//httpcomponents/httpclient/binary/httpcomponents-client-4.3.6-bin.zip
After unzipping, copy fluent-hc-4.3.6.jar, httpclient-4.3.6.jar, httpclient-cache-4.3.6.jar, httpcore-4.3.3.jar, and httpmime-4.3.6.jar from httpcomponents-client-4.3.6/lib into the lib folder of the elasticsearch installation.
5. Install elasticsearch-analysis-mmseg
- Unzip elasticsearch-analysis-mmseg-master.
- Build elasticsearch-analysis-mmseg with mvn.
- Create a plugins directory in the elasticsearch installation directory, then create analysis-mmseg inside plugins.
- Copy the built elasticsearch-analysis-mmseg-1.2.2.jar into the analysis-mmseg directory.
- Copy the mmseg folder under elasticsearch-analysis-mmseg-master/config/ into the config folder of the elasticsearch installation.
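After restarting the node, a quick sanity check is to ask the analyze API how a piece of Chinese text gets tokenized. The sketch below only builds the request URL; it assumes that an index (here called "index") already exists, that the analyzer is registered under the name "ik", and that the 1.x _analyze endpoint accepts the analyzer and text query parameters. The class name AnalyzeUrl is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class AnalyzeUrl {

    /** Builds an index-scoped _analyze URL with percent-encoded UTF-8 parameters. */
    static String build(String baseUrl, String index, String analyzer, String text)
            throws UnsupportedEncodingException {
        return baseUrl + "/" + index + "/_analyze"
                + "?analyzer=" + URLEncoder.encode(analyzer, "UTF-8")
                + "&text=" + URLEncoder.encode(text, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // "\u4e2d\u56fd" is the two-character word for "China", written with
        // escapes so the source file stays ASCII.
        System.out.println(build("http://localhost:9200", "index", "ik", "\u4e2d\u56fd"));
    }
}
```

Fetching the printed URL with curl or a browser should return a token list; if the word comes back split into single characters, the plugin is probably not being picked up.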
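III. Nginx HTTP Basic authentication
After startup, Elasticsearch listens on 9200 (the netty-based web port) and 9300 (socket transport) by default. Port 9200 serves RESTful queries, which is convenient. Here nginx is set up as a reverse proxy in front of the ES web interface to add load balancing and basic authentication.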
1. Install nginx
yum -y install nginx
service nginx start
chkconfig nginx on
2. Access configuration
vim /etc/nginx/conf.d/es.conf
Add:
server {
    server_name es.xxx.com;
    access_log logs/es.access.log main;
    listen 80;
    location / {
        proxy_pass http://localhost:9200;
        auth_basic "secret";
        auth_basic_user_file /etc/nginx/conf.d/es.db;
    }
    location /status {
        stub_status on;
        auth_basic "NginxStatus";
    }
}
Download the htpasswd script and follow its prompts to generate the db file:
wget http://p2j.cn/tools/htpasswd.sh
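If the script is unavailable, an entry for the db file can also be produced by hand. The sketch below is an alternative I am swapping in, not what the script does: it emits a line in the {SHA} format (Base64 of the SHA-1 digest), which reasonably recent nginx versions accept in auth_basic_user_file. The scheme is unsalted and weak, so treat it as a stopgap; the class name HtpasswdLine and the sample credentials are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class HtpasswdLine {

    /** Returns "user:{SHA}<base64 of SHA-1(password)>" for an nginx user file. */
    static String shaEntry(String user, String password) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest(password.getBytes(StandardCharsets.UTF_8));
        return user + ":{SHA}" + Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        // Sample credentials only; append the printed line to /etc/nginx/conf.d/es.db.
        System.out.println(shaEntry("admin", "password"));
    }
}
```

For anything beyond a test box, a salted format generated by the htpasswd tool is the safer choice.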
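This setup only protects access to the web API on 9200 that goes through nginx; port 9300 may still be exposed. Add iptables rules or install an ES security plugin to cover it. If the cluster only needs to be reachable from the internal network, you can set network.bind_host to the internal IP.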
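IV. Cluster
Elasticsearch nodes automatically join the cluster whose name matches their own, so it is enough to give every node the same cluster name.
1. Cluster configuration
Core configuration file: config/elasticsearch.yml
node.name: "es-01"
cluster.name: "es-doc"
node.master: true
node.master controls whether the node may act as master; here es-01 is set as the master node. If it is not set, a master is still elected automatically.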

Copy the configured elasticsearch-1.4.2 directory to elasticsearch-1.4.2-2 (or copy it to another server on the internal network).
Edit elasticsearch-1.4.2-2/config/elasticsearch.yml
Change the node settings:
node.name: "es-02"
cluster.name: "es-doc"
Comment out:
#node.master: true
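Since a mistyped cluster.name is the usual reason a second node fails to join, it can help to compare the two config files before starting anything. The sketch below is a rough check under stated assumptions: it only understands flat "key: value" lines (it is not a YAML parser), and requiring distinct node names is a convention I am adding, not something Elasticsearch enforces. Matching names are necessary but not sufficient, because discovery also has to work on the network.

```java
import java.util.HashMap;
import java.util.Map;

class ClusterConfigCheck {

    /** Parses flat "key: value" lines, skipping blank lines and # comments. */
    static Map<String, String> parseFlat(String yml) {
        Map<String, String> settings = new HashMap<String, String>();
        for (String line : yml.split("\n")) {
            String trimmed = line.trim();
            int colon = trimmed.indexOf(':');
            if (trimmed.isEmpty() || trimmed.startsWith("#") || colon < 0) {
                continue;
            }
            String key = trimmed.substring(0, colon).trim();
            // Drop surrounding double quotes such as "es-doc".
            String value = trimmed.substring(colon + 1).trim().replace("\"", "");
            settings.put(key, value);
        }
        return settings;
    }

    /** True when both configs share cluster.name and use different node.name values. */
    static boolean sameClusterDistinctNodes(String ymlA, String ymlB) {
        Map<String, String> a = parseFlat(ymlA);
        Map<String, String> b = parseFlat(ymlB);
        String clusterA = a.get("cluster.name");
        String nameA = a.get("node.name");
        return clusterA != null && clusterA.equals(b.get("cluster.name"))
                && nameA != null && !nameA.equals(b.get("node.name"));
    }

    public static void main(String[] args) {
        String first = "node.name: \"es-01\"\ncluster.name: \"es-doc\"\nnode.master: true\n";
        String second = "node.name: \"es-02\"\ncluster.name: \"es-doc\"\n#node.master: true\n";
        System.out.println(sameClusterDistinctNodes(first, second));
    }
}
```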
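V. Testing
Start both elasticsearch nodes. Because basic authentication is configured, include the username and password when accessing the cluster.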
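When the requests come from code rather than a browser, the credentials travel in an Authorization header. Here is a minimal sketch of building that header with the JDK alone; the class name BasicAuthHeader and the sample user and password are placeholders, not values from this setup.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthHeader {

    /** Returns the value for an HTTP "Authorization" header using the Basic scheme. */
    static String build(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Prints: Basic dXNlcjpwYXNz
        System.out.println(build("user", "pass"));
    }
}
```

With java.net.HttpURLConnection the value would be attached via setRequestProperty("Authorization", BasicAuthHeader.build(user, password)) before the request is sent.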
1. Check cluster status
http://username:password@localhost:6200/_cluster/health?pretty
You should see:
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
2. Create the index
curl -XPUT http://localhost:9200/index
3. Create the mapping
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
    "fulltext": {
        "_all": {
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}'
4. Index documents
curl -XPOST http://localhost:9200/index/fulltext/1 -d' {"content":"美国留给伊拉克的是个烂摊子吗"}'
curl -XPOST http://localhost:9200/index/fulltext/2 -d' {"content":"公安部:各地校车将享最高路权"}'
curl -XPOST http://localhost:9200/index/fulltext/3 -d' {"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}'
curl -XPOST http://localhost:9200/index/fulltext/4 -d' {"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}'
5. Search with highlighted results
curl -XPOST http://localhost:9200/index/fulltext/_search -d'
{
    "query" : { "term" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}'
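The same search body can be assembled in code when the field or term varies. The sketch below builds the JSON by hand with the JDK only; the class name SearchBody is hypothetical, the em tags in the example are placeholders of my choosing, and the escaping covers only backslashes and double quotes, so a real JSON library is the better option for arbitrary input.

```java
class SearchBody {

    /** Escapes backslashes and double quotes so a value can sit inside a JSON string. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a term query on one field with highlighting enabled for that field. */
    static String termWithHighlight(String field, String term, String preTag, String postTag) {
        String f = escape(field);
        return "{\"query\":{\"term\":{\"" + f + "\":\"" + escape(term) + "\"}},"
                + "\"highlight\":{\"pre_tags\":[\"" + escape(preTag) + "\"],"
                + "\"post_tags\":[\"" + escape(postTag) + "\"],"
                + "\"fields\":{\"" + f + "\":{}}}}";
    }

    public static void main(String[] args) {
        // "\u4e2d\u56fd" is the term used in the curl example, written with escapes.
        System.out.println(termWithHighlight("content", "\u4e2d\u56fd", "<em>", "</em>"));
    }
}
```

The printed string can be sent as the body of a POST to /index/fulltext/_search. Note that a term query is not analyzed, so it only matches if the analyzer emitted exactly that token at index time.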
VI. Java client
1. Bulk import
Add elasticsearch-1.4.2.jar and lucene-core-4.10.2.jar to the classpath.
Test class HtmlDoc.java:
import java.util.Map;
import java.util.Set;

import net.sf.json.JSONObject;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class HtmlDoc {

    // Bulk-indexes each map as one document into index "domain", type "documents".
    public static void addIndex(Set<Map<String, String>> ls) {
        try {
            // cluster.name must match the value in elasticsearch.yml.
            Settings settings = ImmutableSettings.settingsBuilder()
                    .put("cluster.name", "es-doc")
                    .put("client.transport.sniff", true)
                    .build();
            // The transport client talks to the socket port (9300), not the HTTP port.
            Client client = new TransportClient(settings)
                    .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
            BulkRequestBuilder bulkRequest = client.prepareBulk();
            for (Map<String, String> doc : ls) {
                bulkRequest.add(client.prepareIndex("domain", "documents")
                        .setSource(JSONObject.fromObject(doc).toString()));
            }
            BulkResponse bulkResponse = bulkRequest.execute().actionGet();
            if (bulkResponse.hasFailures()) {
                System.out.println("Import failed...");
            } else {
                System.out.println("Import succeeded...");
            }
            client.close();
        } catch (Exception e) {
            System.out.println(e.toString() + ", import error.");
        }
    }
}
2. Mapping
curl -XPUT 'http://localhost:9200/web'
curl -XPOST 'http://localhost:9200/web/_close'
curl -XPUT http://localhost:9200/web/_settings -d'
{
    "analysis": {
        "analyzer": {
            "uniqueTokenfilter": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": "unique"
            }
        }
    }
}'
curl -XPOST 'http://localhost:9200/web/_open'
curl -XPOST http://localhost:9200/web/documents/_mapping -d'
{
    "documents": {
        "dynamic": true,
        "_all": { "enabled": false },
        "_source": { "enabled": true },
        "properties": {
            "domain": { "type": "string", "index": "not_analyzed" },
            "location": {
                "type": "nested",
                "include_in_parent": true,
                "properties": {
                    "ip": { "type": "string", "index": "not_analyzed" },
                    "country_code": { "type": "string", "index": "not_analyzed" },
                    "country_name": { "type": "string", "index": "not_analyzed" },
                    "region_name": { "type": "string", "index": "not_analyzed" },
                    "city": { "type": "string", "index": "not_analyzed" },
                    "latitude": { "type": "double" },
                    "longitude": { "type": "double" }
                }
            },
            "port": { "type": "integer" },
            "header": { "type": "string", "index": "analyzed" },
            "header_info": {
                "type": "nested",
                "include_in_parent": true,
                "properties": {
                    "response_code": { "type": "integer" },
                    "response_content_type": { "type": "string", "index": "analyzed" },
                    "response_message": { "type": "string", "index": "analyzed" },
                    "server": { "type": "string", "store": "yes", "index": "analyzed" },
                    "x_powered_by": { "type": "string", "store": "yes", "index": "analyzed" }
                }
            },
            "title": { "type": "string", "index": "analyzed", "boost": 8 },
            "body": { "type": "string", "index": "analyzed" },
            "url": { "type": "string", "index": "not_analyzed" },
            "encoding": { "type": "string", "index": "not_analyzed" },
            "file_type": { "type": "string", "index": "not_analyzed" },
            "ctime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss", "index": "not_analyzed" },
            "mtime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss", "index": "not_analyzed" },
            "md5": { "type": "string", "index": "not_analyzed" }
        }
    }
}'
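With the mapping in place, documents can also be loaded over HTTP through the _bulk endpoint instead of the transport client. The sketch below covers two details that are easy to get wrong: timestamps must match the yyyy-MM-dd HH:mm:ss format declared for ctime and mtime, and the bulk body is newline-delimited, with an action line before every document and a trailing newline. The class name BulkBody is hypothetical, writing times in UTC is my assumption, and each document is expected to be single-line JSON already.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

class BulkBody {

    /** Formats epoch milliseconds as "yyyy-MM-dd HH:mm:ss" in UTC (assumed time zone). */
    static String formatTime(long epochMillis) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(epochMillis));
    }

    /** Builds a _bulk request body: one action line plus one source line per document. */
    static String build(String index, String type, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(type).append("\"}}\n");
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // example.com is a placeholder domain.
        String doc = "{\"domain\":\"example.com\",\"ctime\":\"" + formatTime(0L) + "\"}";
        System.out.print(build("web", "documents", Arrays.asList(doc)));
    }
}
```

The resulting text would be sent as the body of a POST to http://localhost:9200/_bulk. Going through HTTP keeps the client free of the elasticsearch and lucene jars, at the cost of building the JSON yourself.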