Solr Chinese Word Segmentation

htsjk.Com, 2019-11-13 23:17

1. Solr's default tokenization does not handle Chinese text well.

2. Add a jar that provides better Chinese word segmentation. Two common choices are mmseg4j and IKAnalyzer.

mmseg4j-solr-2.3.0 supports Solr 5.3. Copy the two jar files into the I:\SolrServer\solr5.3.1\webapps\solr\WEB-INF\lib folder.

3. Edit the schema.xml file in the I:\SolrServer\solr5.3.1\solr\mysolr\conf folder and add the following fieldType definitions:

<fieldType name="textComplex" class="solr.TextField"
	positionIncrementGap="100">
	<analyzer>
		<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
			mode="complex" dicPath="dic" />
	</analyzer>
</fieldType>
<fieldType name="textMaxWord" class="solr.TextField"
	positionIncrementGap="100">
	<analyzer>
		<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
			mode="max-word" />
	</analyzer>
</fieldType>
<fieldType name="textSimple" class="solr.TextField"
	positionIncrementGap="100">
	<analyzer>
		<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
			mode="simple" dicPath="n:/custom/path/to/my_dic" />
	</analyzer>
</fieldType>

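4. Restart Tomcat and test the segmentation, for example on the Analysis page of the Solr admin UI, selecting the textMaxWord type just defined.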

5. Add the field that should be indexed with mmseg4j segmentation, content_test, and set its type to the textMaxWord fieldType defined above.
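
As a minimal sketch of step 5, the field could be declared in the same schema.xml roughly as follows. Only the field name content_test and the type textMaxWord come from the steps above; the indexed and stored values are assumptions chosen for illustration and should be adjusted to the actual use case.

```xml
<!-- Hypothetical field declaration: indexed/stored values are assumptions -->
<field name="content_test" type="textMaxWord" indexed="true" stored="true" />
```

After a schema change like this, documents that were already indexed need to be reindexed before the new analyzer affects search results.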