Solr Incremental Indexing

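Note: the configuration files for full and incremental (delta) indexing, data-config.xml and delta-data-config.xml, are placed by default in the same directory as solrconfig.xml.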

solrconfig.xml is configured as follows:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">delta-data-config.xml</str>
    </lst>
</requestHandler>
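Once the handler is registered, a delta import is started by requesting the handler URL with command=delta-import. The sketch below uses only the JDK; the host, port, webapp name solr and core name collection1 are assumptions borrowed from the scheduler settings later in this article, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class DeltaImportTrigger {

    // Builds the handler URL, e.g.
    // http://localhost:8080/solr/collection1/dataimport?command=delta-import&clean=false&commit=true
    static String buildUrl(String host, int port, String webapp, String core) {
        return "http://" + host + ":" + port + "/" + webapp + "/" + core
                + "/dataimport?command=delta-import&clean=false&commit=true";
    }

    // Sends a GET request to the handler and returns the HTTP status code.
    static int trigger(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = buildUrl("localhost", 8080, "solr", "collection1");
        System.out.println("GET " + url);
        // Only contact Solr when explicitly asked, so the sketch runs without a server.
        if (args.length > 0 && args[0].equals("--send")) {
            System.out.println("HTTP " + trigger(url));
        }
    }
}
```

Opening the same URL in a browser has the same effect; command=status on the same handler reports the progress of the running import.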


delta-data-config.xml


<dataConfig>
    <dataSource name="jdbc" driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://localhost:3306/test?zeroDateTimeBehavior=convertToNull"
        user="root" password="shyh"/>
    <document name="st_data">
        <entity name="solrtext" pk="id"
                query="select * from solrtext"
                deltaImportQuery="select * from solrtext where id='${dih.delta.id}'"
                deltaQuery="select id from solrtext where addon > '${dih.last_index_time}'"
                transformer="RegexTransformer">
            <field column="id" name="id" />
            <field column="url" name="url" />
            <field column="title" name="title" />
            <field column="author" name="author" />
            <field column="addon" name="addon" />
            <field column="path" name="path" />
        </entity>
    </document>
</dataConfig>
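To make the two queries concrete, here is an in-memory Java sketch of what a delta import does with this entity: deltaQuery collects the ids of rows whose addon is later than the last index time, then deltaImportQuery is run once per id with ${dih.delta.id} substituted. The Row class and the sample data are hypothetical; the sketch models the order of the queries, not DIH's actual implementation.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DeltaImportSimulation {

    // Hypothetical stand-in for one row of the solrtext table.
    static final class Row {
        final String id;
        final String title;
        final LocalDateTime addon;

        Row(String id, String title, LocalDateTime addon) {
            this.id = id;
            this.title = title;
            this.addon = addon;
        }
    }

    // Mirrors deltaQuery: select id from solrtext where addon > '${dih.last_index_time}'
    static List<String> deltaQuery(List<Row> table, LocalDateTime lastIndexTime) {
        List<String> ids = new ArrayList<>();
        for (Row r : table) {
            if (r.addon.isAfter(lastIndexTime)) {
                ids.add(r.id);
            }
        }
        return ids;
    }

    // Mirrors deltaImportQuery: select * from solrtext where id='${dih.delta.id}'
    static Row deltaImportQuery(List<Row> table, String deltaId) {
        for (Row r : table) {
            if (r.id.equals(deltaId)) {
                return r;
            }
        }
        return null;
    }

    // Runs both steps and returns the rows that would be (re)indexed, keyed by id.
    static Map<String, Row> deltaImport(List<Row> table, LocalDateTime lastIndexTime) {
        Map<String, Row> docs = new LinkedHashMap<>();
        for (String id : deltaQuery(table, lastIndexTime)) {
            Row row = deltaImportQuery(table, id);
            if (row != null) {
                // In Solr, a document with an existing uniqueKey (id) replaces the old one.
                docs.put(id, row);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Row> table = Arrays.asList(
                new Row("1", "old post", LocalDateTime.of(2015, 1, 26, 10, 0, 0)),
                new Row("2", "new post", LocalDateTime.of(2015, 1, 26, 11, 20, 0)));
        LocalDateTime last = LocalDateTime.of(2015, 1, 26, 11, 12, 35);
        System.out.println(deltaImport(table, last).keySet()); // [2]
    }
}
```

Note that this pair of queries only picks up new or updated rows; rows deleted from MySQL are not removed from the index this way (DIH has a separate deletedPkQuery attribute for that case).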


schema.xml

   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
   <field name="url" type="text_general" indexed="true" stored="true" />
   <field name="title" type="text_general" indexed="true" stored="true" />
   <field name="author" type="text_general" indexed="true" stored="true" />
   <field name="addon" type="string" indexed="true" stored="true" />
   <field name="path" type="string" indexed="false" stored="true" />

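The configuration above relies on two built-in variables: ${dih.delta.id}, the id of each changed row returned by deltaQuery, and ${dih.last_index_time}, the time of the last index run. The last index time is saved in the deltaimport.properties file, for example: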

#Mon Jan 26 11:13:07 CST 2015
solrtext.last_index_time=2015-01-26 11\:12\:35
last_index_time=2015-01-26 11\:12\:35
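The file is an ordinary Java properties file, so the colons escaped as \: become plain colons once loaded, and the first line is a comment. A minimal sketch that reads it with java.util.Properties and parses the timestamp; the yyyy-MM-dd HH:mm:ss pattern is inferred from the sample above, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

final class LastIndexTimeReader {

    // Pattern inferred from the sample values, e.g. 2015-01-26 11:12:35
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns <entity>.last_index_time if present, otherwise the global last_index_time (or null).
    static LocalDateTime lastIndexTime(Properties props, String entity) {
        String value = props.getProperty(entity + ".last_index_time",
                props.getProperty("last_index_time"));
        return value == null ? null : LocalDateTime.parse(value, FORMAT);
    }

    public static void main(String[] args) throws IOException {
        String sample = "#Mon Jan 26 11:13:07 CST 2015\n"
                + "solrtext.last_index_time=2015-01-26 11\\:12\\:35\n"
                + "last_index_time=2015-01-26 11\\:12\\:35\n";
        Properties props = new Properties();
        props.load(new StringReader(sample)); // '#' lines are comments, '\:' unescapes to ':'
        System.out.println(lastIndexTime(props, "solrtext")); // 2015-01-26T11:12:35
    }
}
```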


Configuring the scheduled task:

  • Put apache-solr-dataimportscheduler-1.0.jar, together with Solr's bundled apache-solr-dataimporthandler-.jar and apache-solr-dataimporthandler-extras-.jar, into the lib directory under tomcat/webapps/solr/WEB-INF.
  • Edit WEB-INF/web.xml in the Solr webapp and add the listener:

    <listener>
        <listener-class>
            org.apache.solr.handler.dataimport.scheduler.ApplicationListener
        </listener-class>
    </listener>

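  • Extract dataimport.properties from apache-solr-dataimportscheduler-.jar, adjust it to your environment, and put it in solr.home/conf (not solr.home/core/conf). Mine is F:\solr\solrhome\conf; create the conf folder manually if it does not exist. In other words, dataimport.properties belongs in the conf folder under the solr/home path configured in Tomcat's solr.xml.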
  • Tomcat's solr.xml is configured as follows:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Context docBase="F:\solr\apache-tomcat-7.0.53\webapps\solr.war" debug="0" crossContext="true" >
        <Environment name="solr/home" type="java.lang.String" value="F:\solr\solrhome" override="true" />
    </Context>


  • dataimport.properties settings explained:

    #################################################
    #                                               #
    #       dataimport scheduler properties         #
    #                                               #
    #################################################
     
    #  to sync or not to sync
    #  1 - active; anything else - inactive
    syncEnabled=1
     
    #  which cores to schedule
    #  in a multi-core environment you can decide which cores you want synchronized
    #  leave empty or comment it out if using single-core deployment
    syncCores=collection1
     
    #  solr server name or IP address
    #  [defaults to localhost if empty]
    server=localhost
     
    #  solr server port
    #  [defaults to 80 if empty]
    port=8080
     
    #  application name/context
    #  [defaults to current ServletContextListener's context (app) name]
    webapp=solr
     
    #  URL params [mandatory]
    #  remainder of URL
    params=/dataimport?command=delta-import&clean=false&commit=true
     
    #  schedule interval
    #  number of minutes between two runs
    #  [defaults to 30 if empty]
    interval=1
     
    #  interval between full index rebuilds, in minutes; default 7200, i.e. 5 days
    #  empty, 0, or commented out: never rebuild the index
    reBuildIndexInterval=7200
     
    #  URL params for the index rebuild
    reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true
     
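    #  start time from which the rebuild interval is counted; first actual run = reBuildIndexBeginTime + reBuildIndexInterval*60*1000 (ms)
    #  two formats: 2012-04-11 03:10:00 or 03:10:00; with the latter, the date part is filled in with the date the service started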
    reBuildIndexBeginTime=03:10:00
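To sanity-check these values, here is a small Java sketch. It assumes (this is not the scheduler's source code) that the delta-import URL is composed as http://server:port/webapp/core followed by params, and it applies the formula from the comments above: first rebuild = reBuildIndexBeginTime + reBuildIndexInterval*60*1000 ms, where a bare HH:mm:ss begin time takes the service start-up date. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

final class SchedulerSettings {

    private static final DateTimeFormatter FULL = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Assumed composition: http://<server>:<port>/<webapp>/<core> followed by params.
    static String syncUrl(String server, int port, String webapp, String core, String params) {
        return "http://" + server + ":" + port + "/" + webapp + "/" + core + params;
    }

    // Accepts both documented formats; a bare HH:mm:ss is completed with the start-up date.
    static LocalDateTime parseBegin(LocalDate startupDate, String beginTime) {
        if (beginTime.contains(" ")) {
            return LocalDateTime.parse(beginTime, FULL);
        }
        return LocalDateTime.of(startupDate, LocalTime.parse(beginTime));
    }

    // first run = reBuildIndexBeginTime + reBuildIndexInterval * 60 * 1000 ms
    static LocalDateTime firstRebuild(LocalDate startupDate, String beginTime, int intervalMinutes) {
        long millis = intervalMinutes * 60L * 1000L;
        return parseBegin(startupDate, beginTime).plus(Duration.ofMillis(millis));
    }

    public static void main(String[] args) {
        System.out.println(syncUrl("localhost", 8080, "solr", "collection1",
                "/dataimport?command=delta-import&clean=false&commit=true"));
        // 7200 minutes = 120 hours = 5 days after 2015-01-26T03:10
        System.out.println(firstRebuild(LocalDate.of(2015, 1, 26), "03:10:00", 7200)); // 2015-01-31T03:10
    }
}
```

With interval=1, the delta import itself would be requested every minute, while the full rebuild with clean=true runs only every 5 days.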




