Migration from 8 shards to 16 shards; the new environment uses hashed sharding, with the shard keys and related setup prepared in advance. Data was copied with mongosync, mongos -> mongos.

Data validation (count): taking one collection of roughly 470 million documents as an example, about 200,000+ documents were found to be missing (suspected mongosync bug).

Data backfill: mongoexport all the _id values (about 16 GB) and diff the old cluster against the new one:

```
awk 'NR==FNR{a[$1]}NR>FNR {if(!($1 in a))print $0}' file1 file2 >diff.txt
```

After a little cleanup of that output, the missing documents were re-synced from the source cluster one _id at a time:

```
cat diff2.txt | while read id
do
  mongosync -h ip:port -u admin -pxxxxxx -d exaitem_biginfo -c biginfo -q "{ _id:$id}" --to newIp:port --tu admin --tp 'xxxxxx'
done
```

After the backfill the counts were still about 80,000+ apart. The cause turned out to be that data exported by mongoexport through mongos contained duplicates (verified below). Because the data set is large and deduplicating it in shell used too much memory, the _id list was imported into MySQL and GROUP BY was used to pick out the duplicates (command sketches for the export and the MySQL load follow the first listing below). The verification process:

```
mysql> select * from t limit 10;
+--------------------------+
| oid                      |
+--------------------------+
| _id                      |
| 50861566fced7fc9d52fcc97 |
| 508617859ee06fc7b6856a0a |
| 50863f37ef7ce88761f12d49 |
| 5086157cfced7fc9d5303de4 |
| 50861581fced7fc9d5305072 |
| 508614dafced7fc9d52d2818 |
| 508614c1fced7fc9d52cc41d |
| 508628b6fced7fc9d54f4b51 |
| 50861566fced7fc9d52fcc9d |
+--------------------------+
10 rows in set (0.00 sec)
```
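For reference, the _id dumps mentioned above would typically be produced by running mongoexport against each cluster's mongos. This is only a sketch: the hosts, credentials, and output paths are placeholders, and older mongoexport releases use --csv rather than --type=csv.

```
# dump only the _id column from the source and the target cluster
# (hosts, credentials, and paths here are assumptions, not the original values)
mongoexport -h oldIp:port -u admin -p 'xxxxxx' --authenticationDatabase admin \
  -d exaitem_biginfo -c biginfo -f _id --type=csv -o /data/old_ids.csv

mongoexport -h newIp:port -u admin -p 'xxxxxx' --authenticationDatabase admin \
  -d exaitem_biginfo -c biginfo -f _id --type=csv -o /data/new_ids.csv
```

With the target's list as file1 and the source's as file2, the awk one-liner above prints exactly the ids that exist at the source but are missing from the target.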

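The single-column table t queried above would have been populated from one of those CSV files. A minimal sketch of the import, with a hypothetical database name, file path, and column length (LOCAL INFILE must be enabled on both client and server):

```
# create a one-column staging table and bulk-load the exported _id list into it
# (database name, credentials, and file path are assumptions)
mysql --local-infile=1 -u admin -p testdb <<'SQL'
CREATE TABLE t (oid VARCHAR(32));
-- without an IGNORE 1 LINES clause the CSV header is loaded as data too,
-- which is why a literal "_id" row shows up in the listing above
LOAD DATA LOCAL INFILE '/data/biginfo_ids.csv'
INTO TABLE t
LINES TERMINATED BY '\n'
(oid);
SQL
```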
Group by oid to pick out the duplicated ids:

```
mysql> create table tt as select oid, count(*) as n from t group by oid having count(*) > 1;
Query OK, 82509 rows affected (2 hours 30 min 27.22 sec)
Records: 82509  Duplicates: 0  Warnings: 0

mysql> select * from tt limit 20;
+--------------------------+---+
| oid                      | n |
+--------------------------+---+
| 5086740fef7ce88761f51c1b | 2 |
| 5086740fef7ce88761f51c1c | 2 |
| 5086740fef7ce88761f51c1d | 2 |
| 5086740fef7ce88761f51c1e | 2 |
| 5086740fef7ce88761f51c1f | 2 |
| 5086740fef7ce88761f51c20 | 2 |
| 5086740fef7ce88761f51c21 | 2 |
| 5086740fef7ce88761f51c22 | 2 |
| 5339409694190b61c3705313 | 2 |
| 5339409694190b61c3705315 | 2 |
| 5339409694190b61c3705319 | 2 |
| 5339409694190b61c370531a | 2 |
| 5339409694190b61c370531c | 2 |
| 5339409694190b61c370531d | 2 |
| 5339409694190b61c3705321 | 2 |
| 5339409694190b61c3705325 | 2 |
| 5339409694190b61c370532b | 2 |
| 5339409694190b61c370532c | 2 |
| 5339409694190b61c3705337 | 2 |
| 5339409694190b61c3705339 | 2 |
+--------------------------+---+
20 rows in set (0.00 sec)
```

Verify on mongos:

```
mongos> db.biginfo.count({"_id" : ObjectId("5086740fef7ce88761f51c1b")})
2
mongos> db.biginfo.find({"_id" : ObjectId("5086740fef7ce88761f51c1b")}).count()
2
```
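As a rough cross-check of the remaining gap, the surplus rows contributed by the duplicated ids can be summed up from tt: an oid that appears n times adds n-1 extra lines to the export. A sketch, using the same hypothetical database name as above:

```
# total number of surplus rows produced by duplicated _ids;
# if each of the 82,509 oids appears twice (as in the sample above),
# this lands around 82,509 - in line with the 80,000-odd documents
# that still appeared to be missing after the backfill
mysql -u admin -p testdb -e "SELECT SUM(n - 1) AS extra_rows FROM tt;"
```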
