大數(shù)組元素差異removeAll與Map效率對比

更新時間：2023年03月09日 15:04:39 作者：變速風聲

這篇文章主要介紹了大數(shù)組元素差異removeAll與Map效率對比,有需要的朋友可以借鑒參考下，希望能夠有所幫助，祝大家多多進步，早日升職加薪

正文

考慮這樣一個場景，對兩個列表對象，listA 和 listB，比較二者差異，找出只在 listA 中出現(xiàn)的元素列表 onlyListA，找出只在 listB 中出現(xiàn)的元素列表 onlyListB。

removeAll實現(xiàn)

很容易想到借助 removeAll 實現(xiàn)，代碼如下。

List<String> listA = new ArrayList<>();
List<String> listB = new ArrayList<>();
//僅在數(shù)組A中出現(xiàn)的元素
List<String> onlyListA = new ArrayList<>(listA);
onlyListA.removeAll(listB);
//僅在數(shù)組B中出現(xiàn)的元素
List<String> onlyListB = new ArrayList<>(listB);
onlyListB.removeAll(listA);

當數(shù)組元素較少時，借助 removeAll 實現(xiàn)并沒有任何問題。不過在數(shù)組元素較大時，removeAll 方法耗時會較大。執(zhí)行如下測試方法，對數(shù)組元素個數(shù)為1000，1W，10W，100W 的場景進行測試。

public class ListDiffTest {
    public static void main(String[] args) {
        testRemoveAllCostTime(1000);
        testRemoveAllCostTime(10000);
        testRemoveAllCostTime(100000);
        testRemoveAllCostTime(1000000);
    }
    public static void testRemoveAllCostTime(int size) {
        List<String> listA = dataList(size);
        listA.add("onlyAElement");
        List<String> listB = dataList(size + 3);
        long startTime = System.currentTimeMillis();
        //僅在數(shù)組A中出現(xiàn)的元素
        List<String> onlyListA = new ArrayList<>(listA);
        onlyListA.removeAll(listB);
        //僅在數(shù)組B中出現(xiàn)的元素
        List<String> onlyListB = new ArrayList<>(listB);
        onlyListB.removeAll(listA);
        System.out.println("僅在集合A中出現(xiàn)的元素：" + onlyListA);
        System.out.println("僅在集合B中出現(xiàn)的元素：" + onlyListB);
        System.out.println("元素個數(shù) = " + size + "時，比對耗時：" +  (System.currentTimeMillis() - startTime) + " 毫秒");
    }
    private static List<String> dataList(int size) {
        List<String> dataList = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            dataList.add("" + i);
        }
        return dataList;
    }
}

測試結(jié)果如下

僅在集合A中出現(xiàn)的元素：[onlyAElement]
僅在集合B中出現(xiàn)的元素：[1000, 1001, 1002]
元素個數(shù) = 1000時，比對耗時：19 毫秒
元素個數(shù) = 10000時，比對耗時：299 毫秒 #1W
元素個數(shù) = 100000時，比對耗時：24848 毫秒 #10W
元素個數(shù) = 1000000時，比對耗時：3607607 毫秒 #100W 約60m

可以看到，當數(shù)組元素達到百萬級時，耗時將達60min上下。

借助Map實現(xiàn)

此處給出一種優(yōu)化方式，借助 Map 計數(shù)，將 List 集合中的元素作為 Map 的 key，元素出現(xiàn)的次數(shù)作為 Map 的 value。代碼實現(xiàn)如下。

import io.vavr.Tuple2;
public class ListDiffTest {
    public static void main(String[] args) {
        testDifferListByMapCostTime(1000);
        testDifferListByMapCostTime(10000);
        testDifferListByMapCostTime(100000);
        testDifferListByMapCostTime(1000000);
    }
    public static void testDifferListByMapCostTime(int size) {
        List<String> listA = dataList(size);
        listA.add("onlyAElement");
        List<String> listB = dataList(size + 3);
        long startTime = System.currentTimeMillis();
        //僅在數(shù)組A中出現(xiàn)的元素
        List<String> onlyListA = tuple2._1;;
        //僅在數(shù)組B中出現(xiàn)的元素
        List<String> onlyListB = tuple2._2;
        System.out.println("僅在集合A中出現(xiàn)的元素：" + onlyListA);
        System.out.println("僅在集合B中出現(xiàn)的元素：" + onlyListB);
        System.out.println("元素個數(shù) = " + size + "時，比對耗時：" +  (System.currentTimeMillis() - startTime) + " 毫秒"); 
    }
    /**
     * 通過Map計數(shù)方式 比較兩個數(shù)組之間的差異
     *
     * @param listA 數(shù)組A
     * @param listB 數(shù)組B
     * @param <E> 元素類型
     * @return Tuple2對象 onlyAList-只在數(shù)組A存在的元素  onlyBList-只在數(shù)組B存在的元素
     */
    public static <E> Tuple2<List<E>, List<E>> getDiffListBtMapCompare(List<E> listA, List<E> listB) {
        ValidateUtils.validateNotNull(listA, "listA");
        ValidateUtils.validateNotNull(listB, "listB");
        List<E> onlyAList = new ArrayList<>();
        List<E> onlyBList = new ArrayList<>();
        if (CollectionUtils.isEmpty(listA)) {
            return Tuple.of(onlyAList, listB);
        } else if (CollectionUtils.isEmpty(listB)) {
            return Tuple.of(listA, onlyBList);
        }
        /**
         * listA中元素 初始化計數(shù) = 1
         * listB中元素 初始化計數(shù) = -2
         * 遍歷累加后
         * 相同元素 計數(shù) = 2
         * 僅A中出現(xiàn)元素  計數(shù) = 1
         * 僅A中出現(xiàn)元素  計數(shù) = -1
         */
        Map<E, Integer> countMap = new HashMap<>(Math.max(listA.size(), listB.size()));
        for (E eleA : listA) {
            countMap.put(eleA, 1);
        }
        for (E eleB : listB) {
            countMap.put(eleB, 1 + countMap.getOrDefault(eleB, -2));
        }
        countMap.forEach((k, v) -> {
            //獲取不同元素集合
            if (v == 1) {
                onlyAList.add(k);
            } else if (v == -1) {
                onlyBList.add(k);
            }
        });
        return Tuple.of(onlyAList, onlyBList);
    }
}

測試結(jié)果如下

僅在集合A中出現(xiàn)的元素：[onlyAElement]
僅在集合B中出現(xiàn)的元素：[1000, 1002, 1001]
元素個數(shù) = 1000時，比對耗時：8 毫秒
元素個數(shù) = 10000時，比對耗時：19 毫秒 #1W
元素個數(shù) = 100000時，比對耗時：28 毫秒 #10W
元素個數(shù) = 1000000時，比對耗時：96 毫秒 #100W
元素個數(shù) = 10000000時，比對耗時：5320 毫秒 #1000W

removeAll耗時分析

最后，來分析下為什么在大數(shù)組元素比較時，removeAll 性能較差。

removeAll 方法中，先進行判空，然后調(diào)用 batchRemove() 方法

    public boolean removeAll(Collection<?> c) {
        Objects.requireNonNull(c);
        return batchRemove(c, false);
    }

batchRemove() 方法中，使用 for 循環(huán)對集合進行遍歷。第 1 層循環(huán)需要執(zhí)行 listA.size() 次。循環(huán)體中調(diào)用了 contains() 方法來確定集合 B 是否含有該元素。

    private boolean batchRemove(Collection<?> c, boolean complement) {
        final Object[] elementData = this.elementData;
        int r = 0, w = 0;
        boolean modified = false;
        try {
            for (; r < size; r++)
                if (c.contains(elementData[r]) == complement)
                    elementData[w++] = elementData[r];
        } finally {
            // Preserve behavioral compatibility with AbstractCollection,
            // even if c.contains() throws.
            if (r != size) {
                System.arraycopy(elementData, r,
                                 elementData, w,
                                 size - r);
                w += size - r;
            }
            if (w != size) {
                // clear to let GC do its work
                for (int i = w; i < size; i++)
                    elementData[i] = null;
                modCount += size - w;
                size = w;
                modified = true;
            }
        }
        return modified;
    }

contains() 方法的實現(xiàn)如下，內(nèi)部又調(diào)用了 indexOf() 方法。indexOf() 方法內(nèi)部又進行了一層 for 循環(huán)遍歷。

    public boolean contains(Object o) {
        return indexOf(o) >= 0;
    }
    public int indexOf(Object o) {
        if (o == null) {
            for (int i = 0; i < size; i++)
                if (elementData[i]==null)
                    return i;
        } else {
            for (int i = 0; i < size; i++)
                if (o.equals(elementData[i]))
                    return i;
        }
        return -1;
    }

至此，可以看到，按照平均每次遍歷要進行 list.size() / 2 次計算，假設(shè)集合 A 的元素個數(shù)為 m，集合 B 的元素個數(shù)為 n，則兩重 for 循環(huán)下，會執(zhí)行 m*n/2次。對于兩個千萬量級的數(shù)組，將執(zhí)行 100 億次計算！?。?/li>

由此給出一個結(jié)論，對于大數(shù)組元素差異比較，不建議使用 removeAll，可以借助 Map 實現(xiàn)。

參考 http://www.dbjr.com.cn/article/261737.htm

以上就是大數(shù)組元素差異removeAll與Map效率對比的詳細內(nèi)容，更多關(guān)于removeAll Map效率對比的資料請關(guān)注腳本之家其它相關(guān)文章！

您可能感興趣的文章: