
Troubleshooting notes: Chinese tokenization returns an empty result
Category: elasticsearch
// Call the _analyze API with the ik_smart analyzer on a Chinese sentence
String url = "http://127.0.0.1:9200/_analyze";
Map<String, String> map = new HashMap<>();
map.put("text", "中国人民站起来了");
map.put("analyzer", "ik_smart");
String method = "POST";
String json = JSON.toJSONString(map);
String result = EsClientUtils.doExecute(url, method, json, null);
System.out.println(result);

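Tokenizing Chinese text returns an empty result, while English text is tokenized correctly.
The suspicion was that the Chinese characters were being garbled on the way to the server.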

Use tcpdump to watch the traffic on Elasticsearch port 9200:

sudo tcpdump  -i lo0   -vvlX   port 9200 
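The -i lo0 option is required; without it, requests to the local machine are not captured. (lo0 is the loopback interface on macOS/BSD; on Linux it is lo.)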

12:50:22.090957 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 322, bad cksum 0 (->3bb4)!)
    localhost.58609 > localhost.wap-wsp: Flags [P.], cksum 0xff36 (incorrect -> 0x3806), seq 1:271, ack 1, win 6379, options [nop,nop,TS val 893349230 ecr 893349225], length 270
	0x0000:  4500 0142 0000 4000 4006 0000 7f00 0001  E..B..@.@.......
	0x0010:  7f00 0001 e4f1 23f0 db3a 483f 05b5 b887  ......#..:H?....
	0x0020:  8018 18eb ff36 0000 0101 080a 353f 6d6e  .....6......5?mn
	0x0030:  353f 6d69 504f 5354 202f 5f61 6e61 6c79  5?miPOST./_analy
	0x0040:  7a65 2048 5454 502f 312e 310d 0a43 6f6e  ze.HTTP/1.1..Con
	0x0050:  7465 6e74 2d4c 656e 6774 683a 2034 310d  tent-Length:.41.
	0x0060:  0a43 6f6e 7465 6e74 2d54 7970 653a 2061  .Content-Type:.a
	0x0070:  7070 6c69 6361 7469 6f6e 2f6a 736f 6e0d  pplication/json.
	0x0080:  0a43 6f6e 7465 6e74 2d45 6e63 6f64 696e  .Content-Encodin
	0x0090:  673a 2055 5446 2d38 0d0a 486f 7374 3a20  g:.UTF-8..Host:.
	0x00a0:  3132 372e 302e 302e 313a 3932 3030 0d0a  127.0.0.1:9200..
	0x00b0:  436f 6e6e 6563 7469 6f6e 3a20 4b65 6570  Connection:.Keep
	0x00c0:  2d41 6c69 7665 0d0a 5573 6572 2d41 6765  -Alive..User-Age
	0x00d0:  6e74 3a20 4170 6163 6865 2d48 7474 7043  nt:.Apache-HttpC
	0x00e0:  6c69 656e 742f 342e 332e 3120 286a 6176  lient/4.3.1.(jav
	0x00f0:  6120 312e 3529 0d0a 4163 6365 7074 2d45  a.1.5)..Accept-E
	0x0100:  6e63 6f64 696e 673a 2067 7a69 702c 6465  ncoding:.gzip,de
	0x0110:  666c 6174 650d 0a0d 0a7b 2261 6e61 6c79  flate....{"analy
	0x0120:  7a65 7222 3a22 696b 5f73 6d61 7274 222c  zer":"ik_smart",
	0x0130:  2274 6578 7422 3a22 3f3f 3f3f 3f3f 3f3f  "text":"????????
	0x0140:  227d                                     "}
	
	
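As suspected, the text field is garbled: all 8 Chinese characters reached the wire as ? (0x3f), and Content-Length is only 41.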
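The capture can be reproduced without any HTTP client. The following is a minimal JDK-only sketch, assuming the body was encoded with ISO-8859-1 somewhere on the client side; the class and method names are hypothetical. The sentence 中国人民站起来了 is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetLossDemo {

    // Number of bytes the string occupies when encoded with the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Space-separated hex dump of the encoded bytes.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The 8-character sentence from the test request, as Unicode escapes.
        String text = "\u4e2d\u56fd\u4eba\u6c11\u7ad9\u8d77\u6765\u4e86";
        String json = "{\"analyzer\":\"ik_smart\",\"text\":\"" + text + "\"}";

        // ISO-8859-1 cannot represent these characters, so each becomes '?' (0x3f).
        System.out.println(hex(text, StandardCharsets.ISO_8859_1));

        // 41 bytes, matching the Content-Length in the capture.
        System.out.println(encodedLength(json, StandardCharsets.ISO_8859_1));

        // 57 bytes: each of the 8 characters takes 3 bytes in UTF-8.
        System.out.println(encodedLength(json, StandardCharsets.UTF_8));
    }
}
```

A Content-Length of 41 instead of 57 is therefore a quick sign, even before reading the hex dump, that the characters were lost before the request left the client.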

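The fault is in EsClientUtils. The old code built the body with new StringEntity(requestBody.toString()), which in HttpClient 4.x encodes the string as ISO-8859-1 by default, so the Chinese characters could not be represented. Calling setContentEncoding("UTF-8") does not help: it only sets the Content-Encoding header (visible in the capture) and does not change how the string is encoded. Passing ContentType.APPLICATION_JSON, which carries UTF-8 as its charset, fixes the problem: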


	private static HttpUriRequest buildRequestInfo(String url, String method, String requestBody)throws Exception{
		if(METHOD_POST.equals(method)){
			HttpPost request = new HttpPost(url);		
			if(!StringUtils.isBlank(requestBody)){
				// Previously: new StringEntity(requestBody.toString())
				// Bug fix for Chinese tokenization: pass a ContentType so the body is encoded as UTF-8
				StringEntity s = new StringEntity(requestBody.toString(),ContentType.APPLICATION_JSON);
				// s.setContentEncoding("UTF-8");
				// s.setContentType("application/json");		
				request.setEntity(s);
			}	
			return request;
		}
		if(METHOD_PUT.equals(method)){
			HttpPut request = new HttpPut(url);		
			if(!StringUtils.isBlank(requestBody)){
				StringEntity s = new StringEntity(requestBody.toString(),ContentType.APPLICATION_JSON);
				// s.setContentEncoding("UTF-8");
				// s.setContentType("application/json");		
				request.setEntity(s);
			}	
			return request;
		}
		
		if(METHOD_GET.equals(method)){
			HttpGet request = new HttpGet(url);		
			
			return request;
		}
		
		
		if(METHOD_DELETE.equals(method)){
			HttpDelete request = new HttpDelete(url);		
			
			return request;
		}
		
		return null;
	}
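For comparison, the same idea (state the charset explicitly when building the body) can be sketched with the JDK's built-in java.net.http API (Java 11+) instead of Apache HttpClient. This is not the author's code: the class and method names are hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class AnalyzeRequestSketch {

    // Builds a POST request whose body is explicitly encoded as UTF-8.
    static HttpRequest buildAnalyzeRequest(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Same sentence as in the original test, written as Unicode escapes.
        String text = "\u4e2d\u56fd\u4eba\u6c11\u7ad9\u8d77\u6765\u4e86";
        String json = "{\"analyzer\":\"ik_smart\",\"text\":\"" + text + "\"}";

        HttpRequest request = buildAnalyzeRequest("http://127.0.0.1:9200/_analyze", json);

        System.out.println(request.method() + " " + request.uri());
        // Body length in bytes as it would be sent on the wire.
        request.bodyPublisher()
               .ifPresent(p -> System.out.println("Content-Length: " + p.contentLength()));
    }
}
```

Whichever client is used, the charset that encodes the body and the charset declared in Content-Type should agree.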



Also specify the charset when converting the HTTP response to a String:
HttpEntity entity = res.getEntity();
// EntityUtils.toString(entity)
String result = EntityUtils.toString(entity,CommUtil.UTF8);
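The reason this matters: in HttpClient 4.x versions such as the 4.3.1 seen in the capture, EntityUtils.toString(entity) falls back to ISO-8859-1 when the response does not declare a charset. The JDK-only sketch below shows what that fallback does to UTF-8 bytes; the class and method names are hypothetical and the byte array stands in for a response body.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResponseDecodingDemo {

    // Decodes raw response bytes with the given charset.
    static String decode(byte[] body, Charset cs) {
        return new String(body, cs);
    }

    public static void main(String[] args) {
        // The sentence from the test request, as Unicode escapes.
        String text = "\u4e2d\u56fd\u4eba\u6c11\u7ad9\u8d77\u6765\u4e86";

        // Stand-in for a response body that the server encoded as UTF-8.
        byte[] body = text.getBytes(StandardCharsets.UTF_8);

        // Decoding with the right charset restores the original 8 characters.
        String good = decode(body, StandardCharsets.UTF_8);
        System.out.println(good.equals(text) + " " + good.length());

        // Decoding as ISO-8859-1 maps every byte to its own character,
        // producing 24 unrelated characters instead of 8.
        String bad = decode(body, StandardCharsets.ISO_8859_1);
        System.out.println(bad.equals(text) + " " + bad.length());
    }
}
```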
		

Full code
EsClientUtils.java
https://gitee.com/dyyx/iview/blob/master/src/main/java/dyyx/util/EsClientUtils.java
EsAnalyzeTest.java
https://gitee.com/dyyx/iview/blob/master/src/test/java/dyyx/EsAnalyzeTest.java


A brief introduction to using tcpdump
