What does "ft" mean in MySQL?


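In MySQL, "ft" stands for FullText, that is, the full-text index. A full-text index serves queries based on relevance (similarity) rather than exact value comparison. On large data sets it can be many times faster than like, often by more than an order of magnitude.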

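MySQL Full-Text Index (FullText)

1. Introduction

Basic concepts

A full-text index serves queries based on relevance (similarity) rather than exact value comparison.

Fuzzy matching can also be done with like + %, but for searching large amounts of text that approach is impractical. On large data sets a full-text index can be many times faster than like, often by more than an order of magnitude.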

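Version support

Before MySQL 5.6, only the MyISAM storage engine supports full-text indexes

From MySQL 5.6 on, both the MyISAM and InnoDB storage engines support full-text indexes

MySQL 5.7.6 provides a built-in full-text ngram parser that supports Chinese, Japanese, and Korean (CJK), as well as an installable MeCab full-text parser plugin for Japanese

Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only on CHAR, VARCHAR, and TEXT columns

For large data sets, loading the data into a table without a full-text index and then creating the index is much faster than loading the data into a table that already has a full-text index

RDS MySQL 5.6 also supports Chinese full-text search, but it has bugs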

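Limitations and drawbacks

Heavy disk usage. A full-text index is by nature a way of trading disk space for performance. It is large because the text is split into tokens according to the rules of a particular language, and every token is indexed

Full-text indexes are slow to create, and data modifications on tables that have them are slower as well

Full-text indexes are not transparent to the application. To benefit from one, the query must be rewritten: existing queries cannot use a full-text index and have to be changed to the full-text search syntax

Searches are not case-sensitive (with the default case-insensitive collations)

Partitioned tables do not support full-text search

All columns in a multi-column full-text index must use the same character set and collation

Full-text search can have precision issues: the rows found by a full-text search may differ from the rows found by like

The columns named in MATCH() must be exactly the columns defined in a FULLTEXT index, unless the search uses IN BOOLEAN MODE on a MyISAM table (which can search columns that have no index, but slowly)

When full-text indexes are created on individual columns separately, they are not used for a MATCH() over several of those columns together

Full-text indexes on different tables cannot be searched in a single MATCH(); use one MATCH() per table and combine the two conditions with OR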
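The column-list rule above is easy to trip over, so here is a minimal sketch of it. The table ft_demo and its index name are hypothetical and exist only for illustration; the behaviour described in the comments is what InnoDB is expected to do.

```sql
-- Hypothetical demo table with one composite full-text index.
CREATE TABLE ft_demo (
    id INT NOT NULL AUTO_INCREMENT,
    title VARCHAR(200),
    body TEXT,
    PRIMARY KEY (id),
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;

-- Accepted: the MATCH() column list is exactly the indexed column list.
SELECT id FROM ft_demo
WHERE MATCH(title, body) AGAINST('database' IN NATURAL LANGUAGE MODE);

-- Expected to be rejected with error 1191
-- ("Can't find FULLTEXT index matching the column list"),
-- because no full-text index is defined on (title) alone.
SELECT id FROM ft_demo
WHERE MATCH(title) AGAINST('database' IN NATURAL LANGUAGE MODE);
```

If searches on title alone are also needed, a separate full-text index on (title) would have to be added, at the cost of extra disk space and slower writes.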

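2. Working with Full-Text Indexes

2.1 Configuring the minimum search length

The currently configured minimum search length (token length) can be checked with an SQL command: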

SHOW VARIABLES LIKE 'ft%';

Variable_name	Value
ft_boolean_syntax	+ -><()~*:""&|
ft_max_word_len	84
ft_min_word_len	1
ft_query_expansion_limit	20
ft_stopword_file	(built-in)

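The word-length parameters of full-text search cannot be changed at runtime; they must be set in the MySQL configuration file. To set the minimum search length to 1, open the MySQL configuration file /etc/my.cnf and add the following under [mysqld]: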

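[mysqld]
# Minimum token length for InnoDB full-text indexes (default 3)
innodb_ft_min_token_size = 1
# Minimum word length for MyISAM full-text indexes (default 4)
ft_min_word_len = 1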

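After changing the configuration, restart the MySQL server, then repair or rebuild the full-text indexes for the change to take effect.
For a MyISAM table, the index can be repaired with the following command (REPAIR TABLE does not apply to InnoDB tables, whose full-text indexes must be dropped and re-created):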

repair table test quick;
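For InnoDB tables, and for indexes built with the ngram parser as in the rest of this article, the steps look slightly different. The sketch below assumes the fulltext_test table and the content_fulltext index that section 2.2 creates. Note that indexes using the ngram parser ignore innodb_ft_min_token_size and ft_min_word_len; their token length is controlled by ngram_token_size (default 2), which can only be set at server startup.

```sql
-- Check the values the server is actually running with.
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';
SHOW VARIABLES LIKE 'ngram_token_size';

-- Rebuild an InnoDB full-text index by dropping and re-creating it.
ALTER TABLE fulltext_test DROP INDEX content_fulltext;
ALTER TABLE fulltext_test ADD FULLTEXT INDEX content_fulltext(content) WITH PARSER ngram;
```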

2.2 Creating an index

Creating a full-text index when creating the table

CREATE TABLE fulltext_test (
    id int(11) NOT NULL AUTO_INCREMENT,
    content TEXT NOT NULL,
    tag VARCHAR(255),
    PRIMARY KEY (id),
    FULLTEXT KEY content_tag_fulltext(content, tag) WITH PARSER ngram
) ENGINE = InnoDB DEFAULT CHARSET=utf8mb4;

Creating a full-text index on an existing table

CREATE FULLTEXT INDEX content_fulltext ON fulltext_test(content) WITH PARSER ngram;

Creating a full-text index with ALTER TABLE

ALTER TABLE fulltext_test ADD FULLTEXT INDEX content_fulltext(content) WITH PARSER ngram;

2.3 Dropping an index

Dropping a full-text index with DROP INDEX

DROP INDEX content_fulltext ON fulltext_test;

Dropping a full-text index with ALTER TABLE

ALTER TABLE fulltext_test DROP INDEX content_fulltext;

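3. Searching Data

3.1 Natural language full-text search

By default, or when the in natural language mode modifier is given, the match() function performs a natural language search against the text collection.

SELECT * FROM table_name WHERE MATCH(col1, col2) AGAINST('term1 term2');

The search terms are separated by spaces, not commas!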

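The natural language search engine computes the relevance of every document (row) to the query. Relevance is based on the number of matching keywords and on how often each keyword occurs in the document. The rarer a word is across the whole index, the more it contributes to relevance when it matches. Conversely, very common words are not searched at all: on MyISAM tables, a word that appears in more than 50% of the rows is ignored by a natural language search.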
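As a concrete illustration of the relevance score, the following sketch assumes the fulltext_test table and its (content, tag) full-text index from section 2.2; the search words are arbitrary sample terms. Using MATCH() in the select list returns the relevance value, and repeating the same expression in WHERE restricts the result to matching rows.

```sql
-- Return matching rows together with their relevance, best match first.
SELECT id, content, tag,
       MATCH(content, tag) AGAINST('葡萄 苹果' IN NATURAL LANGUAGE MODE) AS score
FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('葡萄 苹果' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;
```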

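3.2 Boolean full-text search

In a boolean search, the relevance of individual search words can be customized in the query itself: when writing a boolean search query, prefix operators tailor how each word is treated.

+ The word must be present

- The word must not be present

(no operator, the default) The word is optional, but rows that contain it rank higher

> The word increases relevance when present, so matching rows rank higher

< The word decreases relevance when present, so matching rows rank lower

"" Double quotes mark a phrase, which must match exactly as typed without being split into words; the effect is similar to like '%keyword%'

() Parentheses group words into a subexpression, for example:

+aaa +(>bbb <ccc)

This finds rows that contain aaa and bbb, or aaa and ccc, and ranks rows containing bbb higher than rows containing ccc. Another example, against a table named tommy:

select * from tommy where match(...) against('>李秀琴 <练习册 <不是人>是个鬼' in boolean mode);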
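To make the operators concrete, here is a small sketch that assumes the fulltext_test table and its (content, tag) full-text index from section 2.2. The search words are arbitrary sample terms, not data from the original tests.

```sql
-- Rows must contain 葡萄 and must not contain 苹果.
SELECT id, content
FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('+葡萄 -苹果' IN BOOLEAN MODE);

-- Phrase search: the quoted text must appear as a whole.
SELECT id, content
FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('"葡萄干"' IN BOOLEAN MODE);
```

The phrase form is the one used in the test queries later in this article, where the double quotes are added with concat().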

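4. Test Results

Test environment: local machine with 4 cores and 16 GB RAM, Windows 10, MySQL 8.0
Test data volume: salebilldetail table … × 10,000 rows, salebill table … × 10,000 rows, customer table … × 10,000 rows, goods table … × 10,000 rows.

For the SQL statements under test, the following full-text indexes were added: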

CREATE FULLTEXT INDEX billno_fulltext ON salebill(billno) WITH PARSER ngram;
CREATE FULLTEXT INDEX remarks_fulltext ON salebill(remarks) WITH PARSER ngram;
CREATE FULLTEXT INDEX remarks_fulltext ON salebilldetail(remarks) WITH PARSER ngram;
CREATE FULLTEXT INDEX goodsremarks_fulltext ON salebilldetail(goodsremarks) WITH PARSER ngram;
CREATE FULLTEXT INDEX remarks_goodsremarks_fulltext ON salebilldetail(remarks, goodsremarks) WITH PARSER ngram;
CREATE FULLTEXT INDEX custname_fulltext ON customer(custname) WITH PARSER ngram;
CREATE FULLTEXT INDEX goodsname_fulltext ON goods(goodsname) WITH PARSER ngram;
CREATE FULLTEXT INDEX goodscode_fulltext ON goods(goodscode) WITH PARSER ngram;

Overall, the test results were rather baffling.
To see why, consider the following statements:

test_1

-- Test 1: original like query, 0.765s
select 1 from salebilldetail d where d.tid=260434 and ((d.remarks like concat('%','葡萄','%')) or (d.goodsremarks like concat('%','葡萄','%')));

test_2

-- Test 2: uses full-text indexes remarks_fulltext and goodsremarks_fulltext, 0.834s
select 1 from salebilldetail d where d.tid=260434 and ((match(d.remarks) Against(concat('"','葡萄','"') in boolean mode)) or (match(d.goodsremarks) Against(concat('"','葡萄','"')  in boolean mode)));

test_3

-- Test 3: uses full-text index remarks_goodsremarks_fulltext, 0.242s
select 1 from salebilldetail d where d.tid=260434 and ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

test_4

-- Test 4: original like query, no tid filter, 22.654s
select 1 from salebilldetail d where ((d.remarks like concat('%','葡萄','%')) or (d.goodsremarks like concat('%','葡萄','%')));

test_5

-- Test 5: uses full-text indexes remarks_fulltext and goodsremarks_fulltext, no tid filter, 24.855s
select 1 from salebilldetail d where ((match(d.remarks) Against(concat('"','葡萄','"') in boolean mode)) or (match(d.goodsremarks) Against(concat('"','葡萄','"')  in boolean mode)));

test_6

-- Test 6: uses full-text index remarks_goodsremarks_fulltext, no tid filter, 0.213s
select 1 from salebilldetail d where ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

test_7

-- Test 7: uses full-text index remarks_goodsremarks_fulltext, with count, 0.22s
select count(1) from salebilldetail d where d.tid=260434 and  ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

test_8

-- Test 8: uses full-text index remarks_goodsremarks_fulltext, no tid filter, with count, 0.007s
select count(1) from salebilldetail d where ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

These tests suggest that the larger the data set and the simpler the query, the more a full-text index helps.
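One way to investigate timings like these is to look at the execution plans. The sketch below reuses the salebilldetail table and the indexes created above. A plausible explanation for tests 2 and 5, which is an assumption and was not verified here, is that MySQL cannot combine two different full-text indexes for MATCH() conditions joined with OR and therefore falls back to examining rows one by one. In EXPLAIN output, the value fulltext in the type column indicates that a full-text index is used to access the table.

```sql
-- Single MATCH() over the composite index (as in test 6).
EXPLAIN
select 1 from salebilldetail d
where match(d.remarks, d.goodsremarks) against('"葡萄"' in boolean mode);

-- Two MATCH() conditions on separate indexes joined with OR (as in test 5).
EXPLAIN
select 1 from salebilldetail d
where match(d.remarks) against('"葡萄"' in boolean mode)
   or match(d.goodsremarks) against('"葡萄"' in boolean mode);
```

Comparing the type and rows columns of the two plans shows whether the second query can use a full-text index at all.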

Now let's look at our business test SQL:

test_9

-- Test 9
select 
    i.billid
    ,if(0,0,i.qty) as qty  
    ,if(0,0,i.goodstotal) as total          
    ,if(0,0,i.chktotal) as selfchktotal   
    ,if(0,0,i.distotal) as distotal 
    ,if(0,0,i.otherpay) as feetotal  
    ,if(0,0,ifnull(d.costtotal,0)) as costtotal  
    ,if(0,0,ifnull(d.maoli,0)) as maoli         
    ,i.billno
    ,from_unixtime(i.billdate,'%Y-%m-%d') as billdate /* bill date */
    ,from_unixtime(i.createdate,'%Y-%m-%d %H:%i:%s') as createdate /* creation date */
    ,if(i.sdate=0,'',from_unixtime(i.sdate,'%Y-%m-%d  %H:%i:%s')) as sdate /* posting date */
    ,from_unixtime(i.udate,'%Y-%m-%d %H:%i:%s') as udate /* last modified time */
    ,i.custid ,c.custname
    ,i.storeid ,k.storename
    ,i.empid ,e.empname
    ,i.userid ,u.username
    ,i.remarks                               /* bill remarks */
    ,i.effect,i.settle,i.redold,i.rednew     /* bill status flags */
    ,i.printtimes /* print count */
    ,(case  when i.rednew=1 then 1  when i.redold=1 then 2  when i.settle=1 then 3  when i.effect=1 then 4  else 9 end) as state /* bill status */
    ,(case  when i.rednew=1 then '红冲单'  when i.redold=1 then '已红冲'  when i.settle=1 then '已结算'  when i.effect=1 then '已过账'  else '草稿' end) as statetext
    ,'' as susername /* operator */
    ,'' as accname /* account */
from salebill i
left join coursecentersale d on d.tid=i.tid and d.billid=i.billid
left join customer c on c.tid=i.tid and c.custid=i.custid
left join store k on k.tid=i.tid and k.storeid=i.storeid
left join employee e on e.tid=i.tid and e.empid=i.empid
left join user u on u.tid=i.tid and u.userid=i.userid
where i.tid=260434 and (i.billtype = 5 or i.effect = 1)
    and ('_billdate_f_'!='')
    and ('_billdate_t_'!='')
    and ('_sdate_f_'!='')
    and ('_sdate_t_'!='')
    and ('_udate_f_'!='')
    and ('_udate_t_'!='')
    and ('_cdate_f_'!='')
    and ('_cdate_t_'!='')
    and ('_billid_'!='')      /* bill ID */
    and ('_custid_'!='')      /* customer ID */
    and ('_storeid_'!='')     /* store/warehouse ID */
    and ('_empid_'!='')       /* salesperson ID */
    and ('_custstop_'!='')       /* whether the customer is disabled */
    and (
        (i.billno like concat('%','葡萄','%'))
        or (i.remarks like concat('%','葡萄','%'))
        or exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((d.remarks like concat('%','葡萄','%')) or (d.goodsremarks like concat('%','葡萄','%'))))
        or exists(select 1 from customer c where c.tid=260434 and c.custid=i.custid and (c.custname like concat('%','葡萄','%')))
        or exists(select 1 from goods g join salebilldetail d on d.tid=g.tid and d.goodsid=g.goodsid where d.tid=260434 and d.billid=i.billid and ((g.goodsname like concat('%','葡萄','%')) or (g.goodscode like concat('%','葡萄','%'))))
    )
    and i.rednew=0 /* the bill list excludes reversal bills */
    and i.billid not in (select billid from coursecenter_del t where t.tid=260434)
    and ((i.settle=1 and i.effect=1 and i.redold=0 and i.rednew=0)) /* settled */
order by udate desc,billno desc
limit 0,100;

Execution time is about 1.6 seconds, using the like approach.

Rewritten to use full-text indexes:

test_10

-- Test 10
select 
    i.billid
    ,if(0,0,i.qty) as qty         
    ,if(0,0,i.goodstotal) as total   
    ,if(0,0,i.chktotal) as selfchktotal  
    ,if(0,0,i.distotal) as distotal 
    ,if(0,0,i.otherpay) as feetotal  
    ,if(0,0,ifnull(d.costtotal,0)) as costtotal 
    ,if(0,0,ifnull(d.maoli,0)) as maoli  
    ,i.billno
    ,from_unixtime(i.billdate,'%Y-%m-%d') as billdate /* bill date */
    ,from_unixtime(i.createdate,'%Y-%m-%d %H:%i:%s') as createdate /* creation date */
    ,if(i.sdate=0,'',from_unixtime(i.sdate,'%Y-%m-%d  %H:%i:%s')) as sdate /* posting date */
    ,from_unixtime(i.udate,'%Y-%m-%d %H:%i:%s') as udate /* last modified time */
    ,i.custid ,c.custname
    ,i.storeid ,k.storename
    ,i.empid ,e.empname
    ,i.userid ,u.username
    ,i.remarks                               /* bill remarks */
    ,i.effect,i.settle,i.redold,i.rednew     /* bill status flags */
    ,i.printtimes /* print count */
    ,(case  when i.rednew=1 then 1  when i.redold=1 then 2  when i.settle=1 then 3  when i.effect=1 then 4  else 9 end) as state /* bill status */
    ,(case  when i.rednew=1 then '红冲单'  when i.redold=1 then '已红冲'  when i.settle=1 then '已结算'  when i.effect=1 then '已过账'  else '草稿' end) as statetext
    ,'' as susername /* operator */
    ,'' as accname /* account */
from salebill i
left join coursecentersale d on d.tid=i.tid and d.billid=i.billid
left join customer c on c.tid=i.tid and c.custid=i.custid
left join store k on k.tid=i.tid and k.storeid=i.storeid
left join employee e on e.tid=i.tid and e.empid=i.empid
left join user u on u.tid=i.tid and u.userid=i.userid
where i.tid=260434 and (i.billtype = 5 or i.effect = 1)
    and ('_billdate_f_'!='')
    and ('_billdate_t_'!='')
    and ('_sdate_f_'!='')
    and ('_sdate_t_'!='')
    and ('_udate_f_'!='')
    and ('_udate_t_'!='')
    and ('_cdate_f_'!='')
    and ('_cdate_t_'!='')
    and ('_billid_'!='')      /* bill ID */
    and ('_custid_'!='')      /* customer ID */
    and ('_storeid_'!='')     /* store/warehouse ID */
    and ('_empid_'!='')       /* salesperson ID */
    and ('_custstop_'!='')       /* whether the customer is disabled */
    and (
        (match(i.billno) against(concat('"','葡萄','"') in boolean mode))
        or (match(i.remarks) against(concat('"','葡萄','"') in boolean mode))
        or exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((match(d.remarks) Against(concat('"','葡萄','"') in boolean mode)) or (match(d.goodsremarks) Against(concat('"','葡萄','"')  in boolean mode))))
        or exists(select 1 from customer c where c.tid=260434 and c.custid=i.custid and (match(c.custname) Against(concat('"','葡萄','"') in boolean mode)))
        or exists(select 1 from goods g join salebilldetail d on d.tid=g.tid and d.goodsid=g.goodsid where d.tid=260434 and d.billid=i.billid 
     and ((match(g.goodsname) Against(concat('"','葡萄','"') in boolean mode))
     or (match(g.goodscode) Against(concat('"','葡萄','"') in boolean mode))))
    )
    and i.rednew=0 /* the bill list excludes reversal bills */
    and i.billid not in (select billid from coursecenter_del t where t.tid=260434)
    and ((i.settle=1 and i.effect=1 and i.redold=0 and i.rednew=0)) /* settled */
order by udate desc,billno desc
limit 0,100;

Execution time is about 1.6 seconds, roughly the same as with the like approach.

Now for the most baffling part. Suppose that, in the SQL statement above, the condition where the salebilldetail table uses the full-text indexes remarks_fulltext and goodsremarks_fulltext:

exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((match(d.remarks) Against(concat('"','葡萄','"') in boolean mode)) or (match(d.goodsremarks) Against(concat('"','葡萄','"')  in boolean mode))))

test_11

is changed to use the full-text index remarks_goodsremarks_fulltext:

-- Test 11
exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode))))

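The execution time then becomes effectively unbounded (the query ran for a long time without finishing).
Analysis shows that this happens when a single condition in the where clause contains more than one match. That is: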

-- With only one full-text search inside the and (...) condition, it behaves normally: 0.2 seconds
select xxx from xxx
...
and (
    exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode))))
)
...

-- Written like this, it misbehaves and becomes hundreds of times slower: 160 seconds. With more match conditions it gets slower still
select xxx from xxx
...
and (
    exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode))))
    or match(i.billno) against(concat('"','葡萄','"') in boolean mode)
)
...

Summary of test results:

Query	Time (s)	Notes
test 1	0.765	original like query
test 2	0.834	full-text indexes remarks_fulltext and goodsremarks_fulltext
test 3	0.242	full-text index remarks_goodsremarks_fulltext
---
test 4	22.654	original like query, no tid filter
test 5	24.855	full-text indexes remarks_fulltext and goodsremarks_fulltext, no tid filter
test 6	0.213	full-text index remarks_goodsremarks_fulltext, no tid filter
---
test 7	0.22	full-text index remarks_goodsremarks_fulltext, count
test 8	0.007	full-text index remarks_goodsremarks_fulltext, no tid filter, count
---
test 9	1.6	business test SQL, original like query
test 10	1.6	business test SQL, full-text indexes remarks_fulltext and goodsremarks_fulltext
test 11	failed	business test SQL, full-text index remarks_goodsremarks_fulltext (did not finish)

5. Upgrading MySQL

Because the production system currently runs RDS MySQL 5.6, this section briefly describes upgrade-related issues.

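Group By: Since MySQL 5.7 the default sql_mode is stricter (ONLY_FULL_GROUP_BY is enabled by default), so some Group By statements that run on MySQL 5.6 raise an error on 5.7 and later. One option is to change the sql_mode of the new MySQL version.

Method 1: change it at runtime (the setting is lost when MySQL restarts)

-- Query the current sql_mode
select @@SESSION.sql_mode;
-- Set it globally (applies to connections opened afterwards)
SET GLOBAL sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';
-- Or set it for the current session only (lost when the session is closed)
SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';
-- Reload the grant tables (not required for the sql_mode change itself)
flush PRIVILEGES;

Method 2: add it to the configuration file

sql_mode = 'the modes you need'

sql_mode values:

ONLY_FULL_GROUP_BY: For a GROUP BY aggregate query, a column in the SELECT list that does not appear in the GROUP BY clause (and is not functionally dependent on the GROUP BY columns) makes the statement invalid

NO_AUTO_VALUE_ON_ZERO: Affects inserts into auto-increment columns. By default, inserting 0 or NULL generates the next auto-increment value; with this mode set, only NULL does, and 0 is stored as 0

STRICT_TRANS_TABLES: In this mode, if a value cannot be inserted into a transactional table as given, the statement is aborted; non-transactional tables are treated less strictly

NO_ZERO_IN_DATE: In strict mode, dates whose month or day part is zero are not allowed

NO_ZERO_DATE: With this mode set, MySQL does not allow the zero date to be inserted; in strict mode, inserting it raises an error instead of a warning

ERROR_FOR_DIVISION_BY_ZERO: During INSERT or UPDATE, division by zero produces a warning, or an error when strict mode is also enabled. Without this mode, division by zero silently returns NULL

NO_AUTO_CREATE_USER: Prevents GRANT from automatically creating a new user unless authentication information is specified

NO_ENGINE_SUBSTITUTION: If the requested storage engine is disabled or not compiled in, an error is raised. Without this mode, the default storage engine is substituted and a warning is issued

PIPES_AS_CONCAT: Treats "||" as the string concatenation operator rather than as a synonym for OR

ANSI_QUOTES: When enabled, double quotes can no longer be used to quote strings, because they are interpreted as identifier quotes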
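Replacing the whole sql_mode string also changes every other mode. As a hedged alternative, the sketch below removes only ONLY_FULL_GROUP_BY from the current session, or keeps the mode enabled and marks the non-grouped column explicitly with ANY_VALUE() (available since MySQL 5.7.5). The second query uses the salebill table from the tests above purely as an illustration.

```sql
-- Option A: drop only ONLY_FULL_GROUP_BY for this session, keep the other modes.
SET SESSION sql_mode = (SELECT REPLACE(@@SESSION.sql_mode, 'ONLY_FULL_GROUP_BY', ''));
SELECT @@SESSION.sql_mode;

-- Option B: keep ONLY_FULL_GROUP_BY and state that any value of billno is acceptable.
SELECT custid, ANY_VALUE(billno) AS sample_billno, COUNT(*) AS bills
FROM salebill
GROUP BY custid;
```

Option B leaves the server configuration untouched, which is usually easier to reason about than relaxing the mode for every query.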

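MySQL 8.0 changed the account password authentication policy (the default authentication plugin changed from mysql_native_password to caching_sha2_password).

Method 1: add the following to the configuration file so that MySQL uses the old password policy (requires restarting the MySQL service)

[mysqld]
default_authentication_plugin = mysql_native_password

Method 2: run statements to change the authentication plugin of a specific account

-- Set the password so that it never expires
ALTER USER 'root'@'localhost' IDENTIFIED BY 'password' PASSWORD EXPIRE NEVER;
-- Switch the account to mysql_native_password and update its password
ALTER USER 'account'@'%' IDENTIFIED WITH mysql_native_password BY 'password';
-- Reload privileges
FLUSH PRIVILEGES;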

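MySQL 8.0 changed the syntax for granting privileges to accounts: creating a user implicitly through grant is no longer supported, so the user must be created first.

-- Old workflow:
mysql> grant all on *.* to 'admin'@'%' identified by 'admin';
-- New, correct workflow:
mysql> create user 'admin'@'%' identified by 'admin';
mysql> grant all on *.* to 'admin'@'%' ;
mysql> flush privileges;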

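Database connection differences

The JDBC connection string changes as follows (the driver must first be upgraded to the one that matches MySQL 8.0):

jdbc:mysql://{ip}:{port}/{db}?characterEncoding=utf8&useSSL=false&serverTimezone=UTC
// useSSL: if it is not set to false, the project still starts normally but reports an SSL warning
// serverTimezone=UTC: must be configured (set it to the time zone that applies to you), otherwise the project reports an error

If the time zone problem still cannot be resolved:

show variables like '%time_zone%';
set global time_zone='+8:00';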

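MySQL 5.7 natively supports the JSON data type and introduces many JSON functions

MySQL 8.0 supports partial updates of JSON columns (JSON Partial Updates)

MySQL 8.0 changes the default character set from latin1 to utf8mb4

MySQL 8.0 improves regular expression support and adds four related functions: REGEXP_INSTR(), REGEXP_LIKE(), REGEXP_REPLACE(), REGEXP_SUBSTR()

In MySQL 8.0, GROUP BY no longer sorts implicitly, and the asc/desc qualifiers inside Group By are no longer supported; use an explicit ORDER BY when a sort order is required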
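For readers who have not used the four regular expression functions yet, here is a brief sketch of each on literal strings. It requires MySQL 8.0 and touches no tables.

```sql
SELECT REGEXP_LIKE('CamelCase', 'CAMELCASE');          -- 1 (case-insensitive by default)
SELECT REGEXP_INSTR('dog cat dog', 'dog', 2);          -- 9 (search starts at position 2)
SELECT REGEXP_REPLACE('a b c', 'b', 'X');              -- a X c
SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3);   -- ghi (third match)
```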
