Clustering EHCache with JGroups over TCP

ygp8, 10 years ago

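A recent project of mine uses ehcache for caching. To handle the load we run two servers behind a load balancer, so the caches have to be clustered. After weighing the options I settled on JGroups. Three days of torment followed, and today it finally works. I'm writing the process down in the hope that anyone with the same need can avoid the detours I took.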

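1. Don't start by searching for how to configure it. Get the base environment right first, which is exactly what 90% of the articles online on this topic leave out. An article like http://blog.csdn.net/kindy1022/article/details/6681299 is actually useful, but still not detailed enough. Here are the details: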

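(1) I use nginx + tomcat7 + jdk7;

(2) ehcache is version 2.10. I recommend using ehcache-2.10.jar directly rather than ehcache-core-xxx.jar + ehcache-terracotta-xxx.jar;

(3) For jgroups, use the latest jgroups-3.6.4.Final.jar. This is easy to overlook and rarely mentioned online: because ehcache-jgroupsreplication-xxx.jar is already on the classpath, you assume it is enough, and the worst part is that startup doesn't even report an error. There is also no need to downgrade to an older version;

(4) ehcache-jgroupsreplication-1.7.jar (it was while reading the source in this jar that I found JGroupsCacheReceiver needs the jgroups jar).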
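The three jars above can also be pinned in a build file. A minimal sketch, assuming a Maven build (the post itself only names jar files) and assuming these Maven Central coordinates correspond to the jars listed; check them against your repository before relying on them:

```xml
<!-- Assumed Maven coordinates; the original post only lists jar file names -->
<dependencies>
  <!-- The single ehcache jar, instead of ehcache-core + ehcache-terracotta -->
  <dependency>
    <groupId>net.sf.ehcache</groupId>
    <artifactId>ehcache</artifactId>
    <version>2.10.0</version>
  </dependency>
  <!-- The replication bridge between ehcache and JGroups -->
  <dependency>
    <groupId>net.sf.ehcache</groupId>
    <artifactId>ehcache-jgroupsreplication</artifactId>
    <version>1.7</version>
  </dependency>
  <!-- JGroups itself, declared explicitly so it is not forgotten -->
  <dependency>
    <groupId>org.jgroups</groupId>
    <artifactId>jgroups</artifactId>
    <version>3.6.4.Final</version>
  </dependency>
</dependencies>
```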

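2. Now the configuration files. This is where the most misinformation circulates online: posts paste a config without stating versions, and the configs themselves contain errors. I suggest reading the relevant configuration on the official jgroups and ehcache sites, and I do mean the relevant parts: the jgroups configuration shown on the ehcache site is incomplete and doesn't state a version either, so be careful here. I also suggest keeping the jgroups configuration in a separate file, which is cleaner.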

(1) In the ehcache configuration file ehcache.xml, first add the peer provider:

<cacheManagerPeerProviderFactory
    class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
    properties="jgroups_tcp.xml" />
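If the separate file does not seem to be picked up, the format of the properties attribute is worth checking. A hedged variant, assuming the factory in ehcache-jgroupsreplication 1.7 looks up a `file` key whose value is a classpath resource name (and otherwise falls back to a `connect` string or a default stack). This is based on a reading of that jar's source rather than on the original post, so confirm it against the version you use:

```xml
<!-- Assumption: the factory reads the key "file" and resolves it on the classpath -->
<cacheManagerPeerProviderFactory
    class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
    properties="file=jgroups_tcp.xml" />
```

With this form, jgroups_tcp.xml would need to sit at the root of the classpath, for example next to ehcache.xml in WEB-INF/classes.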

(2) Configure a listener for every cache that needs to be synchronized. asynchronousReplicationIntervalMillis is optional (the default is 1000), and bootstrapCacheLoaderFactory can also be left out:

<cache name="mybatis_common" overflowToDisk="true" eternal="true"
       timeToIdleSeconds="300" timeToLiveSeconds="600" maxElementsInMemory="10000"
       maxElementsOnDisk="100" diskPersistent="true" diskExpiryThreadIntervalSeconds="300"
       diskSpoolBufferSizeMB="50" memoryStoreEvictionPolicy="LRU">
    <!-- All replication settings belong inside the single properties attribute -->
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
        properties="replicateAsynchronously=true, replicatePuts=true,
                    replicateUpdates=true, replicateUpdatesViaCopy=false,
                    replicateRemovals=true, asynchronousReplicationIntervalMillis=500"/>
    <!-- Optional: load existing entries from the other node at startup -->
    <bootstrapCacheLoaderFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsBootstrapCacheLoaderFactory"
        properties="bootstrapAsynchronously=false"/>
</cache>

(3) jgroups_tcp.xml is as follows; see http://www.jgroups.org/manual/index.html#_tcp

<TCP bind_port="7800" />
<TCPPING timeout="3000"
         initial_hosts="app1_IP[7800],app2_IP[7800]"
         port_range="10"
         num_initial_members="3"/>
<VERIFY_SUSPECT timeout="1500" />
<pbcast.NAKACK2 use_mcast_xmit="false" gc_lag="100"
                retransmit_timeout="300,600,1200,2400,4800"
                discard_delivered_msgs="true"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
               max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000" shun="false"
            view_bundling="true"/>

As for whether it should be bind_port or start_port: the official site uses bind_port.
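For comparison, here is a sketch of what a complete, standalone jgroups_tcp.xml might look like. It assumes JGroups 3.6.x, wraps the protocols in the `<config>` root element that a JGroups XML file expects, and adds the failure-detection, merge, unicast, flow-control and fragmentation protocols that the sample TCP stack shipped with JGroups normally contains. The protocol list and values are written from memory of that sample and are not from the original post, so compare them with the tcp.xml inside your jgroups jar. Attributes such as gc_lag, retransmit_timeout, shun and num_initial_members appear to date from older releases and are left out here; app1_IP and app2_IP are the same placeholders as above.

```xml
<!-- Sketch only: protocol list assumed from the stock JGroups 3.6.x TCP sample -->
<config xmlns="urn:org:jgroups">
    <!-- Transport: each node listens on 7800 -->
    <TCP bind_port="7800"/>
    <!-- Static discovery: list every cluster member explicitly -->
    <TCPPING initial_hosts="app1_IP[7800],app2_IP[7800]"
             port_range="10"/>
    <!-- Rejoin sub-clusters after a network split -->
    <MERGE3/>
    <!-- Failure detection, then double-checking of suspected members -->
    <FD_SOCK/>
    <FD timeout="3000" max_tries="3"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <!-- Reliable delivery to the group and to single members -->
    <pbcast.NAKACK2 use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
    <UNICAST3/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="400000"/>
    <!-- Group membership; prints the local address at startup -->
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                view_bundling="true"/>
    <!-- Flow control and fragmentation of large messages -->
    <MFC/>
    <FRAG2/>
</config>
```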

3. Normally this is enough, but there are exceptions. If it still doesn't work, check the following:

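(1) Can the clustered servers reach each other? Is a firewall or something similar in the way?

(2) Does every server have a complete, unique hostname? If your hostname contains Chinese characters, change it to an English one. If you happen to develop and test on a Mac, note that the computer name and the hostname are two different things: the default hostname is localhost, which won't work, so change it to a proper one.

(3) Current versions of Eclipse can decompile class files and let you set breakpoints on those classes for debugging. In ehcache-jgroupsreplication-xxx.jar, find the listener class and JGroupsCacheReceiver, add breakpoints, and check whether both sending and receiving of messages are triggered.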
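As a lighter alternative to breakpoints, turning up logging for the replication package may show whether the peer provider was created and whether messages flow. A minimal sketch, assuming ehcache's SLF4J logging is bound to log4j 1.x and configured through a log4j.xml file; neither is stated in the post, so adapt the fragment to whatever logging backend you actually use:

```xml
<!-- Fragment for log4j.xml (log4j 1.x assumed); place inside <log4j:configuration> -->
<logger name="net.sf.ehcache.distribution.jgroups">
    <!-- DEBUG output from the JGroups peer provider, replicator and receiver classes -->
    <level value="DEBUG"/>
</logger>
```

Independently of logging, print_local_addr="true" on pbcast.GMS makes each node print its own address at startup, which is a quick way to see whether the TCP stack was actually loaded.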

Good luck.

Source: http://my.oschina.net/u/866380/blog/501082
