Finding and Removing Emoji from Strings in Java
1. Background
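- Emoji are special characters in the Unicode character set (UTF-8 is just one of its encodings). Most basic emoji are assigned to two ranges in Unicode Plane 1, U+1F300–U+1F6FF and U+1F900–U+1FAFF. Because these code points lie above U+FFFF, each one occupies two char values (a surrogate pair) in a Java string.
- Skin tone modifiers: most human-related emoji are yellow by default, so five new code points were later introduced as modifiers: U+1F3FB, U+1F3FC, U+1F3FD, U+1F3FE, U+1F3FF. Appending a skin tone modifier to an existing emoji produces a new style:
  U+1F44B (👋) + U+1F3FD = 👋🏽
- Symbol variants and combinations: an ordinary character followed by one or more variation selectors or combining marks forms an emoji:
  U+25C0 + U+FE0F = ◀️
  U+27A1 + U+FE0F = ➡️
  1 + U+FE0F + U+20E3 = 1️⃣
- Flags: each national flag is a pair of regional indicator symbols in the range U+1F1E6–U+1F1FF, so a flag is equivalent to two ordinary emoji characters taken from that range:
  U+1F1E8 + U+1F1F3 = 🇨🇳
- Zero-width joiner (ZWJ): several base emoji joined by the zero-width joiner (U+200D) form one complex emoji:
  emoji + U+200D + emoji = one combined emoji
  emoji + U+200D + emoji + U+200D + emoji = one combined emoji
  emoji + U+200D + emoji + U+200D + emoji + U+200D + emoji = one combined emoji
- Tag sequences: a base emoji followed by several tag characters (U+E0020–U+E007E) and terminated by Cancel Tag (U+E007F) forms one complex emoji:
  U+1F3F4 (🏴) + U+E0067 + U+E0062 + U+E0065 + U+E006E + U+E0067 + U+E007F = the flag of England
- Special symbols: these occupy a single char, and some of them are rendered as emoji in certain environments.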
Unicode only defines the mapping from code points to emoji. It does not prescribe the glyphs, so every emoji font is free to draw them in its own way.
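To make the code-unit arithmetic above concrete, here is a minimal sketch that builds some of the listed sequences from their code points and compares the code point count with the UTF-16 length. The class name EmojiCodeUnits and the helper fromCodePoints are illustrative and not part of the original code.

```java
final class EmojiCodeUnits {

    /** Builds a string from a list of Unicode code points. */
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String wave = fromCodePoints(0x1F44B);                 // waving hand
        String waveMedium = fromCodePoints(0x1F44B, 0x1F3FD);  // waving hand + skin tone modifier
        String keycap = fromCodePoints('1', 0xFE0F, 0x20E3);   // digit + variation selector + keycap mark
        String flag = fromCodePoints(0x1F1E8, 0x1F1F3);        // two regional indicators

        for (String s : new String[]{wave, waveMedium, keycap, flag}) {
            System.out.println(s.codePointCount(0, s.length()) + " code point(s), "
                    + s.length() + " UTF-16 code unit(s)");
        }
        // prints:
        // 1 code point(s), 2 UTF-16 code unit(s)
        // 2 code point(s), 4 UTF-16 code unit(s)
        // 3 code point(s), 3 UTF-16 code unit(s)
        // 2 code point(s), 4 UTF-16 code unit(s)

        // The second code unit of a supplementary character is a low surrogate.
        System.out.println(Character.isLowSurrogate(wave.charAt(1))); // prints "true"
    }
}
```

Building the strings from code points rather than pasting emoji literals keeps the example independent of the source file encoding.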
2. Solution
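Apart from a few emoji that are single special symbols, every emoji occupies at least two char values, so the first test looks at the type of the second character. Character.UnicodeBlock.of and Character.getType are used to classify each character. Note that this test also matches other supplementary characters that are not emoji (for example CJK Extension B ideographs), because their second char is a low surrogate as well.
Once the second character identifies the current two characters as an emoji:
1) Check whether anything follows them. If not, they are recorded as an ordinary emoji.
2) Otherwise try the following in order: national flags; skin tone modifiers; tag sequences; the zero-width joiner; consecutive variation selectors and combining marks; and finally treat the pair as an ordinary emoji.
Single-character special symbols are handled last. Some characters of this type are emoji and some are not; for now all of them are simply treated as ordinary emoji.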
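The classification step described above can be sketched as follows. This is a small illustration of the two JDK calls the solution relies on, not the article's full algorithm; the class CharClassification and its method names are made up for the example.

```java
final class CharClassification {

    /** True when the char is the trailing half of a surrogate pair. */
    static boolean isLowSurrogateBlock(char c) {
        return Character.UnicodeBlock.LOW_SURROGATES.equals(Character.UnicodeBlock.of(c));
    }

    /** True when the char is a variation selector such as U+FE0F. */
    static boolean isVariationSelector(char c) {
        return Character.UnicodeBlock.VARIATION_SELECTORS.equals(Character.UnicodeBlock.of(c));
    }

    /** True when the char has the general category "Symbol, other" (So). */
    static boolean isOtherSymbol(char c) {
        return Character.getType(c) == Character.OTHER_SYMBOL;
    }

    public static void main(String[] args) {
        char[] wave = Character.toChars(0x1F44B);            // surrogate pair for the waving hand
        System.out.println(isLowSurrogateBlock(wave[1]));     // prints "true"
        System.out.println(isVariationSelector('\uFE0F'));    // prints "true"
        System.out.println(isOtherSymbol('\u2600'));          // U+2600 BLACK SUN WITH RAYS, prints "true"
        System.out.println(isOtherSymbol('A'));               // prints "false"
    }
}
```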
3. Complete code
package com.zpf.tool;

import java.util.List;

/** Utility for finding emoji in a string and removing them. */
public class EmojiUtil {

    /** Regional indicator symbols U+1F1E6..U+1F1FF; two of them form a flag. */
    public static boolean isEmojiNationalFlag(int codePoint) {
        return codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF;
    }

    // Example: String str = new String(new int[]{0x1F44B, 0x1F3FD}, 0, 2);
    /** Skin tone modifiers U+1F3FB..U+1F3FF. */
    public static boolean isEmojiSkinColor(int codePoint) {
        return codePoint >= 0x1F3FB && codePoint <= 0x1F3FF;
    }

    // Example: String str = new String(new int[]{0x1F3F4, 0xE0067, 0xE0062, 0xE0065, 0xE006E, 0xE0067, 0xE007F}, 0, 7);
    /** Cancel Tag U+E007F, which terminates a tag sequence. */
    public static boolean isEmojiTagEnd(int codePoint) {
        return codePoint == 0xE007F;
    }

    /** Tag characters U+E0020..U+E007E used inside a tag sequence. */
    public static boolean isEmojiTagSpec(int codePoint) {
        return codePoint >= 0xE0020 && codePoint <= 0xE007E;
    }

    /** True for blocks that hold variation selectors or combining marks. */
    public static boolean isEmojiDecorateBlock(Character.UnicodeBlock block) {
        if (block == null) {
            return false;
        }
        return block.equals(Character.UnicodeBlock.VARIATION_SELECTORS)
                || block.equals(Character.UnicodeBlock.VARIATION_SELECTORS_SUPPLEMENT)
                || block.equals(Character.UnicodeBlock.COMBINING_HALF_MARKS)
                || block.equals(Character.UnicodeBlock.COMBINING_MARKS_FOR_SYMBOLS)
                || block.equals(Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS)
                || block.equals(Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS_SUPPLEMENT);
    }

    /**
     * Scans data, collecting every emoji into emojiList and the remaining
     * text into removeResult. Either output may be null (but not both);
     * both are cleared before use.
     */
    public static void pickAllEmoji(CharSequence data, StringBuilder removeResult, List<String> emojiList) {
        if (removeResult == null && emojiList == null) {
            return;
        }
        if (removeResult != null) {
            removeResult.delete(0, removeResult.length());
        }
        if (emojiList != null) {
            emojiList.clear();
        }
        if (data == null || data.length() == 0) {
            return;
        }
        StringBuilder emojiBuilder = new StringBuilder(); // emoji currently being assembled
        int i = 0;
        int j;
        Character.UnicodeBlock block;
        while (i < data.length()) {
            if (i + 1 < data.length()) {
                // Decide by the type of the second char of the pair.
                block = Character.UnicodeBlock.of(data.charAt(i + 1));
                if (isEmojiDecorateBlock(block) || Character.UnicodeBlock.LOW_SURROGATES.equals(block)) {
                    if (i + 2 >= data.length()) {
                        // Nothing follows: the pair is the last emoji.
                        emojiBuilder.append(data, i, i + 2);
                        break;
                    }
                    // Each handler returns a new index when it consumed something.
                    j = handleNationalFlag(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    j = handleHumanSkin(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    j = handleTagSequence(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    // Ordinary emoji, possibly followed by a ZWJ or more marks.
                    emojiBuilder.append(data, i, i + 2);
                    i = handleNextChar(data, i + 2, emojiBuilder, emojiList);
                    continue;
                }
            }
            // Flush any emoji still pending before handling a single char.
            recordEmoji(emojiBuilder, emojiList);
            int type = Character.getType(data.charAt(i));
            if (type == Character.OTHER_SYMBOL) { // special symbols are always treated as emoji
                if (emojiList != null) {
                    emojiList.add(String.valueOf(data.charAt(i)));
                }
            } else if (removeResult != null) {
                removeResult.append(data.charAt(i));
            }
            i++;
        }
        recordEmoji(emojiBuilder, emojiList);
    }

    /**
     * Consumes what may follow an emoji: consecutive variation selectors and
     * combining marks, then an optional zero-width joiner. Returns the index
     * of the first char that was not consumed.
     */
    private static int handleNextChar(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        int j = i;
        while (j < data.length() && isEmojiDecorateBlock(Character.UnicodeBlock.of(data.charAt(j)))) {
            emojiBuilder.append(data.charAt(j));
            j++;
        }
        if (j < data.length() && data.charAt(j) == '\u200D') { // zero-width joiner
            // Keep the builder open so that the next emoji is joined to this one.
            emojiBuilder.append('\u200D');
            return j + 1;
        }
        // No joiner follows: the emoji is complete. Recording it here keeps
        // two adjacent emoji from being merged into one list entry.
        recordEmoji(emojiBuilder, emojiList);
        return j;
    }

    /** Handles flags, which are pairs of regional indicator symbols. */
    private static int handleNationalFlag(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        int codePoint = Character.codePointAt(data, i);
        if (!isEmojiNationalFlag(codePoint)) {
            return i;
        }
        recordEmoji(emojiBuilder, emojiList); // flush whatever is still pending
        if (i + 3 < data.length() && isEmojiNationalFlag(Character.codePointAt(data, i + 2))) {
            // Two regional indicators (4 chars) form one flag.
            emojiBuilder.append(data, i, i + 4);
            recordEmoji(emojiBuilder, emojiList);
            return i + 4;
        }
        // A lone regional indicator is not a valid flag; skip it.
        return i + 2;
    }

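    /** Handles a two-char emoji that is followed by a skin tone modifier. */
    private static int handleHumanSkin(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i + 3 >= data.length()) {
            return i;
        }
        int codePoint = Character.codePointAt(data, i + 2);
        if (isEmojiSkinColor(codePoint)) { // skin tone modifier
            emojiBuilder.append(data, i, i + 4);
            int next = i + 4;
            if (next < data.length() && data.charAt(next) == '\u200D') {
                // A joiner follows the modifier: keep the builder open.
                emojiBuilder.append('\u200D');
                return next + 1;
            }
            recordEmoji(emojiBuilder, emojiList);
            i = next;
        }
        return i;
    }
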
    /** Handles a base emoji followed by tag characters and a Cancel Tag. */
    private static int handleTagSequence(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i + 3 >= data.length()) {
            return i;
        }
        int codePoint = Character.codePointAt(data, i + 2);
        if (isEmojiTagSpec(codePoint)) {
            // Base emoji plus the first tag character.
            emojiBuilder.append(data, i, i + 4);
            i = i + 4;
            while (i < data.length()) {
                codePoint = Character.codePointAt(data, i);
                if (isEmojiTagSpec(codePoint)) {
                    emojiBuilder.append(data, i, i + 2);
                    i = i + 2;
                } else if (isEmojiTagEnd(codePoint)) {
                    emojiBuilder.append(data, i, i + 2);
                    recordEmoji(emojiBuilder, emojiList);
                    i = i + 2;
                    break;
                } else { // malformed sequence: no Cancel Tag
                    break;
                }
            }
            // Discard an unterminated sequence (a no-op after a successful record).
            emojiBuilder.delete(0, emojiBuilder.length());
        } else if (isEmojiTagEnd(codePoint)) {
            // Base emoji directly followed by a Cancel Tag.
            emojiBuilder.append(data, i, i + 4);
            recordEmoji(emojiBuilder, emojiList);
            i = i + 4;
        }
        return i;
    }

    /** Adds the pending emoji to the list (if any) and clears the builder. */
    private static void recordEmoji(StringBuilder builder, List<String> emojiList) {
        if (builder != null && builder.length() > 0) {
            if (emojiList != null) {
                emojiList.add(builder.toString());
            }
            builder.delete(0, builder.length());
        }
    }
}
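For comparison, the following is a much simpler sketch that works on whole code points instead of char pairs. It is not the algorithm above: it only drops code points in the ranges named in section 1 (plus the joiner, the emoji variation selector and the keycap mark), it does not collect the emoji, and it ignores single-character special symbols. The class SimpleEmojiStripper and its methods are hypothetical names chosen for this example.

```java
final class SimpleEmojiStripper {

    /** True for code points that belong to, or glue together, an emoji. */
    static boolean isEmojiRelated(int cp) {
        return (cp >= 0x1F300 && cp <= 0x1F6FF)   // basic emoji, including skin tone modifiers
                || (cp >= 0x1F900 && cp <= 0x1FAFF) // basic emoji, second range
                || (cp >= 0x1F1E6 && cp <= 0x1F1FF) // regional indicators (flags)
                || (cp >= 0xE0020 && cp <= 0xE007F) // tag characters and Cancel Tag
                || cp == 0x200D                     // zero-width joiner
                || cp == 0xFE0F                     // emoji variation selector
                || cp == 0x20E3;                    // combining enclosing keycap
    }

    /** Returns the input with every emoji-related code point removed. */
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
                .filter(cp -> !isEmojiRelated(cp))
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String waveMedium = new String(new int[]{0x1F44B, 0x1F3FD}, 0, 2);
        System.out.println(strip("Hi" + waveMedium + "!")); // prints "Hi!"
    }
}
```

Iterating with codePoints() avoids reasoning about surrogate pairs by hand, at the cost of the finer grouping that EmojiUtil performs. A keycap sequence, for instance, keeps its leading digit here because the digit itself is an ordinary character.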