Batch conversion of file encodings to UTF

2023-03-16 09:33 | Source: compiled from the web

Updated: 2020-03-19

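Recently, while developing with Qt on Windows 10, I noticed that UTF-8 comes in two flavors: UTF-8 and UTF-8 with BOM. Putting a BOM at the start of a UTF-8 file is mostly a Microsoft habit, and it can cause problems on other systems; UTF-8 without a BOM is the standard form. I often port code across platforms, and code written on Windows and saved as UTF-8 picks up a BOM automatically, so after moving it to Ubuntu the Chinese text may come out garbled. I therefore extended the encoding conversion tool.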
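To make the difference concrete: the UTF-8 BOM is the three bytes EF BB BF at the very start of the file. The following is a minimal sketch, not the tool's actual implementation, of detecting and dropping it at the byte level; the helper name `strip_utf8_bom` is an assumption made for illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A file as a Windows editor might save it: BOM followed by UTF-8 text
with_bom = codecs.BOM_UTF8 + "中文".encode("utf-8")
without_bom = strip_utf8_bom(with_bom)
```

Data that has no BOM passes through unchanged, so the helper is safe to run over files that are already plain UTF-8.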

New features:

Batch conversion of file encodings for files of any type; each file can be converted to any one of UTF-8 BOM, UTF-8, or GB2312.
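In Python terms, the three targets correspond to the standard codecs `utf-8-sig`, `utf-8` and `gb2312`. The sketch below shows the decode-then-encode step under that assumption; the `TARGET_CODECS` mapping and the `reencode` helper are illustrative names and are not taken from the CodeTransmit source.

```python
# Python codec names for the three targets (the dictionary keys are illustrative labels)
TARGET_CODECS = {
    "UTF-8 BOM": "utf-8-sig",  # this encoder prepends the bytes EF BB BF
    "UTF-8": "utf-8",
    "GB2312": "gb2312",
}


def reencode(data: bytes, source_codec: str, target: str) -> bytes:
    """Decode data with source_codec, then encode it for the named target."""
    text = data.decode(source_codec)
    return text.encode(TARGET_CODECS[target])


gb_bytes = "编码".encode("gb2312")
utf8 = reencode(gb_bytes, "gb2312", "UTF-8")
utf8_bom = reencode(gb_bytes, "gb2312", "UTF-8 BOM")
```

Decoding with `utf-8-sig` also accepts input that has no BOM, which makes it a convenient source codec when it is unclear whether a UTF-8 file carries one.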

GitHub download link:

https://github.com/clorymmk/CodeTransmit

Screenshot of the new version running:

/********************************** Everything below describes the old version of the program *********************************************/

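I usually develop microcontroller firmware with MDK + VS Code, and I sometimes run into the following problem. MDK saves files as GB2312 or ASCII by default, while VS Code defaults to UTF-8. Some of my colleagues work in MDK, so the Chinese comments they write end up GB2312-encoded, and when I open the code as UTF-8 those comments are garbled. Converting the encoding by hand every time is tedious, so I wrote a Python script that batch-converts the project files to UTF-8.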
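The garbling described above can be reproduced in a few lines. This is a small illustration only; the sample comment text is made up for the example.

```python
comment = "// 初始化串口"
gb_bytes = comment.encode("gb2312")  # what an editor saving as GB2312 writes to disk

# Strict UTF-8 decoding rejects these bytes outright
try:
    gb_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# A lenient reader substitutes U+FFFD replacement characters, which is the visible garbling
garbled = gb_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the file was actually written in recovers the text
recovered = gb_bytes.decode("gb2312")
```

The ASCII part of the line (`// `) survives either way, which is why only the Chinese comments look broken while the code itself reads fine.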

Python encoding conversion script:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Encoding conversion tool: converts every ".c .h .cpp .hpp .bat .java" file under a path to utf-8

import os
import chardet

gErrArray = []  # error messages collected during one conversion pass


def convert(fileName, filePath, out_enc="utf-8"):
    try:
        # Read the raw bytes and let chardet guess the source encoding
        with open(filePath, 'rb') as f:
            content = f.read()
        source_encoding = chardet.detect(content)['encoding']
        print("{0:50}{1}".format(fileName, source_encoding))
        if source_encoding is not None:
            # chardet's encoding names vary in case, so compare case-insensitively
            if source_encoding.lower() == out_enc.lower():
                return
            content = content.decode(source_encoding).encode(out_enc)
            with open(filePath, 'wb') as f:
                f.write(content)
        else:
            gErrArray.append("can not recognize file encoding %s" % filePath)
    except Exception as err:
        gErrArray.append("%s:%s" % (filePath, err))


def explore(dirPath):
    print("\r\n===============================================================")
    print("{0:50}{1}".format('fileName', 'fileEncoding'))
    print("===============================================================")
    for root, dirs, files in os.walk(dirPath):
        for file in files:
            suffix = os.path.splitext(file)[1]
            # Only files with these suffixes are converted
            if suffix in ('.h', '.c', '.cpp', '.hpp', '.bat', '.java'):
                path = os.path.join(root, file)
                convert(file, path)


def main():
    # explore(os.getcwd())
    gErrArray.clear()  # drop errors left over from a previous pass
    filePath = input("Enter the path of the folder to convert: \n")
    explore(filePath)
    print('\r\n---------Error summary------------')
    for item in gErrArray:
        print(item)
    print('\r\n %d error(s) in total!' % len(gErrArray))
    if len(gErrArray) > 0:
        print("For files that failed, open them in notepad++, click Encoding, switch to utf-8 and save")
    print('\r\n-----------------------------')


if __name__ == "__main__":
    while True:
        main()
        input("\r\n########### Press Enter to convert another folder!!! ###########\r\n")

The script converts files with six suffixes: .c .h .cpp .hpp .bat .java. To handle other file types, edit the suffix check in explore().
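One way to make that suffix check easier to extend is a set lookup. The sketch below is a suggestion rather than part of the original script: `SUFFIXES` and `should_convert` are hypothetical names, `.py` and `.txt` are illustrative additions, and the check is made case-insensitive so that names such as `MAIN.C` also match.

```python
import os

# Hypothetical suffix set: the six suffixes from the script plus two illustrative additions
SUFFIXES = {".c", ".h", ".cpp", ".hpp", ".bat", ".java", ".py", ".txt"}


def should_convert(file_name: str, suffixes=SUFFIXES) -> bool:
    """Return True if file_name ends with one of the configured suffixes."""
    return os.path.splitext(file_name)[1].lower() in suffixes


candidates = ["main.c", "MAIN.C", "notes.txt", "README", "archive.tar.gz"]
selected = [name for name in candidates if should_convert(name)]
```

Adding a file type then only means adding one entry to the set.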

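Download link for the Windows executable (the archive contains both the source code and the exe; to run on Linux, where Python is usually preinstalled, install the chardet library with pip3 and then run the source directly):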

https://download.csdn.net/download/clorymmk/11947085
