
Python example: modifying strings that follow a fixed pattern

Updated: 2019-12-30 10:44:24 Author: 團長sama

This article shows how to modify strings that follow a fixed pattern in Python. Using worked examples, it covers the principle behind the approach, an implementation, and a few points to watch out for.

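Description

The string pattern is: optional leading spaces, then zero or more dots, then a number (which may be a decimal), then optional trailing spaces.

The task is to delete the dots at the start, for example:

" ...78 " becomes " 78 "
" ...7.889 " becomes " 7.889 "
".9.8" becomes "9.8"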
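The three cases above can be checked with a short sketch. It is not the approach used in the code example that follows (which rebuilds the string from named groups); it uses `re.sub` as a compact alternative, and the helper name `strip_leading_dots` is made up for illustration. The examples are written with plain ASCII dots.

```python
import re

# Hypothetical helper, not part of the original article.
# Matches: optional leading spaces (kept via group 1), then a run of dots
# that is immediately followed by a digit (the dots are dropped).
_LEADING_DOTS = re.compile(r'^( *)\.+(?=\d)')

def strip_leading_dots(text):
    """Remove the dots between the leading spaces and the number."""
    return _LEADING_DOTS.sub(r'\1', text)

for sample in (" ...78 ", " ...7.889 ", ".9.8"):
    print(repr(sample), "->", repr(strip_leading_dots(sample)))
```

The lookahead `(?=\d)` keeps the decimal point inside the number untouched, because only dots sitting directly before the first digit are matched.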

Code example

Note the regular expression pattern and how its named groups are used here.

import re

testStr = r"...7.88 "
# lblank: leading spaces, point: dots to drop, realcontent: the number, rblank: trailing spaces
pattern = re.compile(r'(?P<lblank> *)(?P<point>\.*)(?P<realcontent>\d+\.?\S*)(?P<rblank> *)')
finalStr = pattern.search(testStr)
print(finalStr)
# Rebuild the string from every group except "point"
result = finalStr.group("lblank") + finalStr.group("realcontent") + finalStr.group("rblank")
print("result is: {}".format(result))

Output:

<_sre.SRE_Match object; span=(0, 8), match='...7.88 '>
result is: 7.88
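To see what each named group captures, the match can be inspected with `groupdict()`. The sketch below reuses the pattern from the example; the wrapper `split_parts` is a hypothetical name, and returning `None` for a string without any digit is an assumption about how such input should be handled (the original example does not cover that case).

```python
import re

PATTERN = re.compile(
    r'(?P<lblank> *)(?P<point>\.*)(?P<realcontent>\d+\.?\S*)(?P<rblank> *)'
)

def split_parts(text):
    """Return the four named groups as a dict, or None when nothing matches."""
    match = PATTERN.search(text)
    # search() returns None when the string contains no digit at all,
    # so guard before touching the groups.
    return match.groupdict() if match else None

print(split_parts("...7.88 "))
print(split_parts("no digits here"))
```

Guarding against `None` matters once the pattern is applied to many files, since a single label without a number would otherwise raise `AttributeError`.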

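Extension

Description

This script is used to clean up training samples. Each label is a txt file holding the text shown in the corresponding image, and its content follows the pattern (spaces, zero or more) + (dots, zero or more) + (a decimal or an integer) + (spaces that pad the label to its full width).

The script does the following: it removes the dots in the second part (using regex groups) and rewrites the original label file, strips the padding spaces from both ends of the label to form a new label, and then places the new label file and the corresponding image in a folder named after the label length. Because there are more than 400,000 files, the work is spread across multiple processes.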
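The per-label steps described above (drop the dots, strip the padding, pick a folder by length) can be sketched without touching the file system. The helper names `clean_label` and `target_dir` are hypothetical and exist only for this illustration; the script itself folds the same logic into its file-handling functions.

```python
import os
import re

PATTERN = re.compile(
    r'(?P<lblank> *)(?P<point>\.*)(?P<realcontent>\d+\.?\S*)(?P<rblank> *)'
)

def clean_label(raw):
    """Return (label, was_fixed) for one raw label line.

    Assumption: a line without leading dots (or without any digit)
    only needs its surrounding whitespace stripped.
    """
    match = PATTERN.search(raw)
    if match is None or not match.group("point"):
        return raw.strip(), False
    # realcontent starts with a digit and ends with a non-space character,
    # so it is already the stripped label.
    return match.group("realcontent"), True

def target_dir(base_dir, label):
    """Folder named after the label length, e.g. <base_dir>/5 for '7.889'."""
    return os.path.join(base_dir, str(len(label)))

label, fixed = clean_label("  ...7.889   ")
print(label, fixed, target_dir("data", label))
```

Keeping the string logic separate from the file operations makes it easy to test on a handful of labels before running it over the full data set.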

Extension code

import os
import re
from multiprocessing import Pool
import shutil
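def getAllFilePath(pathFolder, filter=(".jpg", ".txt")):
  # Walk the folder and collect every cropped image together with its label file
  allCropPicPathList = []
  allTXTPathList = []
  # maindir: directory being visited; subdir: its sub-folders; file_name_list: its files
  for maindir, subdir, file_name_list in os.walk(pathFolder):
    for filename in file_name_list:
      apath = os.path.join(maindir, filename)
      ext = os.path.splitext(apath)[1]  # file extension
      if ext == filter[0] and ('_crop' in filename):
        allCropPicPathList.append(apath)
      elif ext == filter[1] and ('_crop' in filename):
        allTXTPathList.append(apath)
  # Sort both lists by path without extension so that zip pairs each image with
  # the label of the same base name (assumes every image has exactly one .txt)
  allCropPicPathList.sort(key=lambda p: os.path.splitext(p)[0])
  allTXTPathList.sort(key=lambda p: os.path.splitext(p)[0])
  return list(zip(allCropPicPathList, allTXTPathList))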
# Check one label; if dots sit between the leading spaces and the number,
# remove them, rewrite the label file, and log its path
def checkTxtContent(txtcontent, txtPath):
  pattern = re.compile(r'(?P<lblank> *)(?P<point>\.*)(?P<realcontent>\d+\.?\S*)(?P<rblank> *)')
  finalStr = pattern.search(txtcontent)
  # No match means the label contains no digit; return it unchanged
  if finalStr is None:
    return txtcontent
  if len(finalStr.group("point")) != 0:
    resultStr = finalStr.group("lblank") + finalStr.group("realcontent") + finalStr.group("rblank")
    with open(txtPath, 'w') as fw:
      fw.write(resultStr)
    with open(r'E:\Numberdata\wrong.txt', 'a') as fw:
      fw.write(txtPath + "\n")
    print(txtPath, "is wrong!")
    return resultStr
  else:
    return txtcontent
# Move each image into the folder named after its label length and write the corrected label there
def dealSampleList(samplePathList, saveBaseDir):
  for samplePath in samplePathList:
    txtPath = samplePath[1]
    picPath = samplePath[0]
    with open(txtPath, 'r') as fr:
      txtStr = fr.readline()
    # The label file is closed here, before checkTxtContent may reopen it for writing
    newtxtStr = checkTxtContent(txtStr, txtPath).strip()
    # Create the target folder; exist_ok avoids a race between worker processes
    saveDir = os.path.join(saveBaseDir, str(len(newtxtStr)))
    os.makedirs(saveDir, exist_ok=True)
    newTxtName = os.path.basename(txtPath)
    newPicName = os.path.basename(picPath)
    with open(os.path.join(saveDir, newTxtName), 'w') as fw:
      fw.write(newtxtStr)
    shutil.move(picPath, os.path.join(saveDir, newPicName))
    # print(newPicName, 'is done!')
if __name__ == '__main__':
  allFilePath = getAllFilePath(r'E:\Numberdata\4')
  # dealSampleList(allFilePath, r'E:\Numberdata\data')
  n_total = len(allFilePath)
  n_process = 4  # number of worker processes
  # Length of each sub-list
  length = float(n_total) / float(n_process)
  indices = [int(round(i * length)) for i in range(n_process + 1)]
  sublists = [allFilePath[indices[i]:indices[i + 1]] for i in range(n_process)]
  # Create the process pool
  p = Pool(n_process)
  for i in sublists:
    print("sublist len is {}".format(len(i)))
    p.apply_async(dealSampleList, args=(i, r'E:\Numberdata\data'))
  p.close()
  p.join()
  print("All done!")
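The way the main block cuts the file list into one sub-list per process can be tried on its own. The sketch below wraps the same rounding arithmetic in a function; the name `split_evenly` is hypothetical, and the input is a plain list of integers rather than real file paths.

```python
def split_evenly(items, n_chunks):
    """Split items into n_chunks contiguous slices of roughly equal size."""
    step = len(items) / n_chunks
    # Rounded boundaries: the first is 0 and the last is len(items),
    # so the slices cover the whole list without gaps or overlap.
    bounds = [int(round(i * step)) for i in range(n_chunks + 1)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(n_chunks)]

chunks = split_evenly(list(range(10)), 4)
print([len(c) for c in chunks])  # prints [2, 3, 3, 2] on Python 3
```

One point worth knowing when using `apply_async` this way: an exception raised inside a worker is stored in the returned `AsyncResult` and only surfaces when `.get()` is called on it, so keeping those result objects and calling `.get()` after `join()` is a simple way to notice failed sub-lists.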


I hope this article is helpful for your Python programming.
