Notes on an empty result from Chinese tokenization
Category: elasticsearch
// Ask the _analyze API to tokenize a Chinese sentence with the ik_smart analyzer
String url = "http://127.0.0.1:9200/_analyze";
Map<String, String> map = new HashMap<>();
map.put("text", "中国人民站起来了");
map.put("analyzer", "ik_smart");
String method = "POST";
String json = JSON.toJSONString(map);
String result = EsClientUtils.doExecute(url, method, json, null);
System.out.println(result);
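Chinese text comes back with an empty token list, while English text is tokenized correctly.
The suspicion is that the Chinese characters are garbled before they reach Elasticsearch.
Use tcpdump to watch the Elasticsearch port, 9200: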
sudo tcpdump -i lo0 -vvlX port 9200
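The -i lo0 option is required; without it, requests sent from the local machine are not captured. (lo0 is the loopback interface on macOS; on Linux it is lo.)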
12:50:22.090957 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 322, bad cksum 0 (->3bb4)!)
localhost.58609 > localhost.wap-wsp: Flags [P.], cksum 0xff36 (incorrect -> 0x3806), seq 1:271, ack 1, win 6379, options [nop,nop,TS val 893349230 ecr 893349225], length 270
0x0000: 4500 0142 0000 4000 4006 0000 7f00 0001 E..B..@.@.......
0x0010: 7f00 0001 e4f1 23f0 db3a 483f 05b5 b887 ......#..:H?....
0x0020: 8018 18eb ff36 0000 0101 080a 353f 6d6e .....6......5?mn
0x0030: 353f 6d69 504f 5354 202f 5f61 6e61 6c79 5?miPOST./_analy
0x0040: 7a65 2048 5454 502f 312e 310d 0a43 6f6e ze.HTTP/1.1..Con
0x0050: 7465 6e74 2d4c 656e 6774 683a 2034 310d tent-Length:.41.
0x0060: 0a43 6f6e 7465 6e74 2d54 7970 653a 2061 .Content-Type:.a
0x0070: 7070 6c69 6361 7469 6f6e 2f6a 736f 6e0d pplication/json.
0x0080: 0a43 6f6e 7465 6e74 2d45 6e63 6f64 696e .Content-Encodin
0x0090: 673a 2055 5446 2d38 0d0a 486f 7374 3a20 g:.UTF-8..Host:.
0x00a0: 3132 372e 302e 302e 313a 3932 3030 0d0a 127.0.0.1:9200..
0x00b0: 436f 6e6e 6563 7469 6f6e 3a20 4b65 6570 Connection:.Keep
0x00c0: 2d41 6c69 7665 0d0a 5573 6572 2d41 6765 -Alive..User-Age
0x00d0: 6e74 3a20 4170 6163 6865 2d48 7474 7043 nt:.Apache-HttpC
0x00e0: 6c69 656e 742f 342e 332e 3120 286a 6176 lient/4.3.1.(jav
0x00f0: 6120 312e 3529 0d0a 4163 6365 7074 2d45 a.1.5)..Accept-E
0x0100: 6e63 6f64 696e 673a 2067 7a69 702c 6465 ncoding:.gzip,de
0x0110: 666c 6174 650d 0a0d 0a7b 2261 6e61 6c79 flate....{"analy
0x0120: 7a65 7222 3a22 696b 5f73 6d61 7274 222c zer":"ik_smart",
0x0130: 2274 6578 7422 3a22 3f3f 3f3f 3f3f 3f3f "text":"????????
0x0140: 227d "}
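As suspected, the text field is garbled: each of the 8 Chinese characters went over the wire as a single 0x3f byte ("?"), which is also why Content-Length is only 41.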
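The question marks can be reproduced with the JDK alone. The following is a minimal sketch, not the code from EsClientUtils: the class and method names (CharsetDemo, json, encode) are made up for illustration, and it assumes the body was encoded as ISO-8859-1, which is the default of HttpClient 4.x StringEntity when no charset is given. The sample sentence is written as Unicode escapes so that the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // The sample sentence from the request above, as Unicode escapes (8 characters).
    static final String TEXT = "\u4e2d\u56fd\u4eba\u6c11\u7ad9\u8d77\u6765\u4e86";

    // Builds the same JSON body that appears in the capture (hypothetical helper).
    static String json(String text) {
        return "{\"analyzer\":\"ik_smart\",\"text\":\"" + text + "\"}";
    }

    // Encodes the body the way an HTTP client would before writing it to the socket.
    static byte[] encode(String body, Charset charset) {
        return body.getBytes(charset);
    }

    public static void main(String[] args) {
        String body = json(TEXT);

        // ISO-8859-1 cannot represent Chinese characters, so each one becomes '?' (0x3f).
        byte[] latin1 = encode(body, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
        System.out.println(latin1.length); // 41, matching Content-Length in the capture

        // UTF-8 keeps the characters, using 3 bytes for each of them.
        byte[] utf8 = encode(body, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 57
    }
}
```

If a capture of the fixed client shows a Content-Length of 57 for this request, the body is being sent as UTF-8.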
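The cause is in EsClientUtils. The request body was created with new StringEntity(requestBody.toString()), which specifies no charset, so HttpClient 4.x encodes it as ISO-8859-1. Calling setContentEncoding("UTF-8") does not help: it only sets the Content-Encoding header seen in the capture and does not change how the body is encoded. The fix is to pass ContentType.APPLICATION_JSON, which carries the UTF-8 charset: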
private static HttpUriRequest buildRequestInfo(String url, String method, String requestBody) throws Exception {
    if (METHOD_POST.equals(method)) {
        HttpPost request = new HttpPost(url);
        if (!StringUtils.isBlank(requestBody)) {
            // Before: new StringEntity(requestBody.toString())
            // Fix for Chinese tokenization: APPLICATION_JSON sets both the content type and the UTF-8 charset
            StringEntity s = new StringEntity(requestBody, ContentType.APPLICATION_JSON);
            // s.setContentEncoding("UTF-8");
            // s.setContentType("application/json");
            request.setEntity(s);
        }
        return request;
    }
    if (METHOD_PUT.equals(method)) {
        HttpPut request = new HttpPut(url);
        if (!StringUtils.isBlank(requestBody)) {
            StringEntity s = new StringEntity(requestBody, ContentType.APPLICATION_JSON);
            // s.setContentEncoding("UTF-8");
            // s.setContentType("application/json");
            request.setEntity(s);
        }
        return request;
    }
    if (METHOD_GET.equals(method)) {
        return new HttpGet(url);
    }
    if (METHOD_DELETE.equals(method)) {
        return new HttpDelete(url);
    }
    // Unsupported HTTP method
    return null;
}
In addition, specify the charset when converting the HTTP response to a String:
HttpEntity entity = res.getEntity();
// EntityUtils.toString(entity)
String result = EntityUtils.toString(entity,CommUtil.UTF8);
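The same mismatch can occur on the way back. In HttpClient 4.x, EntityUtils.toString(entity) without a charset uses the charset declared by the response and falls back to ISO-8859-1 when none is declared. The following JDK-only sketch shows what that fallback would do to UTF-8 bytes. DecodeDemo and decode are hypothetical names, and the two-character token is just sample data, not actual Elasticsearch output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes raw response bytes with an explicit charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // A two-character Chinese token, written as Unicode escapes.
        String token = "\u4e2d\u56fd";
        // The bytes a server would send for it in a UTF-8 response (3 bytes per character).
        byte[] wire = token.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the server used restores the original text.
        String right = decode(wire, StandardCharsets.UTF_8);
        System.out.println(right.equals(token)); // true

        // Decoding as ISO-8859-1 turns every byte into its own character.
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(token)); // false
        System.out.println(wrong.length());      // 6
    }
}
```

Passing the charset explicitly, as in the line above, removes the dependency on what the response happens to declare.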
Full code
EsClientUtils.java
https://gitee.com/dyyx/iview/blob/master/src/main/java/dyyx/util/EsClientUtils.java
EsAnalyzeTest.java
https://gitee.com/dyyx/iview/blob/master/src/test/java/dyyx/EsAnalyzeTest.java
Introduction to using tcpdump