Removing rows with a duplicate field from a large data file on Linux

Published: 2014-04-24 11:48:00  Author: anonymous

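A data collection program I wrote recently produced a file with more than 10 million rows, each made up of 4 fields. The requirement was to delete every row whose second field duplicates that of another row. I searched around on Linux without finding a suitable tool: stream processors such as sed and gawk handle input one line at a time, and I did not see how to use them to find rows with a duplicated field. It looked as though I would have to write a Python program myself, but then I remembered MySQL and took a detour through the database instead: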

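1. Import the data into a table with mysqlimport --local dbname data.txt. The table must already exist, and its name must match the file name (mysqlimport strips the extension, so data.txt is loaded into the table data).
2. Run the following SQL statements (uniqfield is the field that must be unique):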

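use dbname;
-- MySQL requires an AUTO_INCREMENT column to be a key, so rowid is declared unique.
alter table tablename add rowid int not null auto_increment unique key;
-- Keep the smallest rowid for each distinct value of uniqfield.
create table t select min(rowid) as rowid from tablename group by uniqfield;
-- Copy only the surviving rows into a new table (it still carries the rowid column).
create table t2 select tablename.* from tablename join t on tablename.rowid = t.rowid;
drop table tablename;
drop table t;
rename table t2 to tablename;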
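
To make the idea behind these statements easier to try out, here is a minimal sketch of the same "keep the row with the smallest row number per key" technique. It swaps in Python's built-in sqlite3 module for MySQL so that it runs without a server, and it is an illustration rather than the exact procedure above. The function name dedupe_by_field, the column names f1, f3, f4 and row_no, and the sample rows are all hypothetical.

```python
import sqlite3


def dedupe_by_field(rows):
    """Return the first row seen for each distinct value of the second field."""
    conn = sqlite3.connect(":memory:")
    # row_no plays the role of the auto-increment rowid column (assumed name).
    conn.execute(
        "CREATE TABLE tablename ("
        "row_no INTEGER PRIMARY KEY AUTOINCREMENT, "
        "f1 TEXT, uniqfield TEXT, f3 TEXT, f4 TEXT)"
    )
    conn.executemany(
        "INSERT INTO tablename (f1, uniqfield, f3, f4) VALUES (?, ?, ?, ?)",
        rows,
    )
    # Smallest row number for each distinct uniqfield value.
    conn.execute(
        "CREATE TABLE t AS "
        "SELECT MIN(row_no) AS row_no FROM tablename GROUP BY uniqfield"
    )
    # Join back to fetch the full surviving rows in their original order.
    kept = conn.execute(
        "SELECT f1, uniqfield, f3, f4 FROM tablename "
        "JOIN t ON tablename.row_no = t.row_no "
        "ORDER BY tablename.row_no"
    ).fetchall()
    conn.close()
    return kept


# Made-up sample data: the second field repeats for k1 and k2.
sample_rows = [
    ("a", "k1", "x", "1"),
    ("b", "k2", "y", "2"),
    ("c", "k1", "z", "3"),
    ("d", "k3", "w", "4"),
    ("e", "k2", "v", "5"),
]

for row in dedupe_by_field(sample_rows):
    print("\t".join(row))
```

With the sample data, the rows starting with c and e are dropped because their second field already appeared in an earlier row. For a file of 10 million rows an on-disk database would be a more realistic choice than ":memory:", depending on available RAM.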

Tags: big data, duplicate rows
