
First upload

kangtan 1 year ago
Parent
Commit
3b7aa5e608
100 changed files with 258,577 additions and 0 deletions
  1. 32 0
      Images_rename/images_rename.py
  2. 1 0
      TextRecognitionDataGenerator
  3. 56 0
      classification_dataset_generate/divide_train_eval.py
  4. Binary
      correct_imgs_rotation/correted_imgs/16602752667386.png
  5. 50 0
      correct_imgs_rotation/img_correct.py
  6. Binary
      correct_imgs_rotation/imgs/16602752667386.png
  7. Binary
      correct_imgs_rotation/result/16602752667386.png
  8. 41 0
      delete_anno_by_label/delete_label.py
  9. 15 0
      dictionary_generate/chn_text.txt
  10. 1 0
      dictionary_generate/dict_chn_2000.txt
  11. 1 0
      dictionary_generate/dict_chn_3500.txt
  12. 8763 0
      dictionary_generate/dict_eng_8763.txt
  13. 4320 0
      dictionary_generate/eng_text.txt
  14. 36 0
      dictionary_generate/generate.py
  15. 36 0
      dictionary_generate/test.py
  16. 3 0
      divide_and_convert_to_coco_single_fold/.idea/.gitignore
  17. 8 0
      divide_and_convert_to_coco_single_fold/.idea/3、数据集划分训练集验证集、转coco格式.iml
  18. 26 0
      divide_and_convert_to_coco_single_fold/.idea/inspectionProfiles/Project_Default.xml
  19. 6 0
      divide_and_convert_to_coco_single_fold/.idea/inspectionProfiles/profiles_settings.xml
  20. 4 0
      divide_and_convert_to_coco_single_fold/.idea/misc.xml
  21. 8 0
      divide_and_convert_to_coco_single_fold/.idea/modules.xml
  22. 160 0
      divide_and_convert_to_coco_single_fold/labelme_to_coco.py
  23. Binary
      fonts_images_generate/__pycache__/img_tools.cpython-37.pyc
  24. 56 0
      fonts_images_generate/divide_train_eval.py
  25. Binary
      fonts_images_generate/font/Apple Braille.ttf
  26. 63 0
      fonts_images_generate/generate_imgs_by_fonts.py
  27. 24 0
      fonts_images_generate/get_all_labels.py
  28. 31 0
      fonts_images_generate/gray.py
  29. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0000_ffec5264.jpg
  30. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0001_fff5a1cc.jpg
  31. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0002_fffd428a.jpg
  32. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0003_00053198.jpg
  33. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0004_000d6f00.jpg
  34. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0005_00151406.jpg
  35. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0006_001d0300.jpg
  36. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0007_0024a1ee.jpg
  37. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0008_002c1b36.jpg
  38. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0009_00339894.jpg
  39. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0010_003a7642.jpg
  40. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0011_0041544c.jpg
  41. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0012_0047dfec.jpg
  42. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0013_004e9742.jpg
  43. Binary
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0014_00555134.jpg
  44. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0000_00b5c174.jpg
  45. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0001_00ba9f66.jpg
  46. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0002_00bf80e8.jpg
  47. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0003_00c54d66.jpg
  48. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0004_00ca7dee.jpg
  49. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0005_00cf8668.jpg
  50. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0006_00d4ba3e.jpg
  51. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0007_00d9c15a.jpg
  52. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0008_00de7eb0.jpg
  53. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0009_00e36088.jpg
  54. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0010_00e7f466.jpg
  55. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0011_00ecd668.jpg
  56. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0012_00f20290.jpg
  57. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0013_00f70c2e.jpg
  58. Binary
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0014_00fbee28.jpg
  59. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0000_010144c8.jpg
  60. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0001_01064e66.jpg
  61. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0002_010b7e02.jpg
  62. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0003_0110ae14.jpg
  63. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0004_0116054c.jpg
  64. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0005_011b0e52.jpg
  65. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0006_012128dc.jpg
  66. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0007_01265900.jpg
  67. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0008_012b6290.jpg
  68. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0009_01306ba8.jpg
  69. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0010_013574ee.jpg
  70. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0011_013a7dca.jpg
  71. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0012_013fad64.jpg
  72. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0013_01448f62.jpg
  73. Binary
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0014_0149986c.jpg
  74. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0000_005c79d8.jpg
  75. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0001_00627094.jpg
  76. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0002_006815f4.jpg
  77. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0003_006e7aa8.jpg
  78. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0004_00750e7e.jpg
  79. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0005_007b2598.jpg
  80. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0006_0081910a.jpg
  81. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0007_00875db8.jpg
  82. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0008_008d26c6.jpg
  83. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0009_00931de2.jpg
  84. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0010_0098ea08.jpg
  85. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0011_009edd3a.jpg
  86. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0012_00a4cd40.jpg
  87. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0013_00aac0e6.jpg
  88. Binary
      fonts_images_generate/images/Songti/kdan_2022-10-25_0014_00b08d5a.jpg
  89. 5 0
      fonts_images_generate/images/label_list.txt
  90. 15 0
      fonts_images_generate/img_paddling.py
  91. 22 0
      fonts_images_generate/img_tools.py
  92. 7560 0
      fonts_images_generate/text/chn_text.txt
  93. 14 0
      fonts_images_generate/text/chn_text_test.txt
  94. 2160 0
      fonts_images_generate/text/eng_text.txt
  95. 15 0
      fonts_images_generate/text/eng_text_test.txt
  96. 114494 0
      get_name_by_spider/addr.txt
  97. 9 0
      get_name_by_spider/get_dict.py
  98. 22 0
      get_name_by_spider/get_name.py
  99. 120520 0
      get_name_by_spider/name.txt
  100. 0 0
      idcardgenerator/OpticalBBold.otf

+ 32 - 0
Images_rename/images_rename.py

@@ -0,0 +1,32 @@
+import argparse
+import datetime
+import os
+import uuid
+import cv2
+
+
+def rename(img_dir, save_dir):
+    filelist = os.listdir(img_dir)  # list of the file/folder names inside img_dir
+    total_num = len(filelist)  # number of entries in the folder
+
+    i = 1  # image index starts from 1
+    for item in filelist:  # iterate over the files (images) in the folder
+        time = datetime.datetime.now()
+        t = str(time.year).zfill(4)+'-'+str(time.month).zfill(2) + '-'+str(time.day).zfill(2)
+        if item.endswith('.jpg') or item.endswith('.png'):
+            img = cv2.imread(img_dir + '/' + item)  # read only actual image files
+            # dst = save_dir + '/kdan_' + str(t) + '_' + str(i).zfill(8) + '_' + str(uuid.uuid1())[0:4] + '.jpg'
+            dst = save_dir + '/id_' + item.split('.')[0] + '.jpg'
+            cv2.imwrite(dst, img)
+            i = i + 1
+    print('total %d to rename' % total_num)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--img_dir', type=str, default=r'C:\Users\KDAN\Desktop\workspace\rec_data(tw_idcard)\tw_idcard_rec_1223\id')
+    parser.add_argument('--save_dir', type=str, default=r'C:\Users\KDAN\Desktop\workspace\rec_data(tw_idcard)\tw_idcard_rec_1223\id')
+    args = parser.parse_args()
+    if not os.path.exists(args.save_dir):
+        os.makedirs(args.save_dir)
+    rename(args.img_dir, args.save_dir)
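
Note that rename() decodes and re-encodes every image through OpenCV just to change its name, which converts PNG inputs to JPEG and recompresses existing JPEGs. If only the file name needs to change, a plain rename is lossless; a minimal sketch (the helper rename_in_place is illustrative, not part of this commit):

import os

def rename_in_place(img_dir):
    # move each image to its new name without decoding it
    for item in os.listdir(img_dir):
        stem, ext = os.path.splitext(item)
        if ext.lower() in ('.jpg', '.png'):
            os.rename(os.path.join(img_dir, item),
                      os.path.join(img_dir, 'id_' + stem + ext))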

+ 1 - 0
TextRecognitionDataGenerator

@@ -0,0 +1 @@
+Subproject commit 173d4572199854943d19dbb5607992331d459c73

+ 56 - 0
classification_dataset_generate/divide_train_eval.py

@@ -0,0 +1,56 @@
+import argparse
+import os
+import random
+
+
+def get_all_img(img_fold):
+    # get the list of per-class (font) image folders
+    img_fold_list = os.listdir(img_fold)
+    cnt = 0
+    img_path_list = []
+    for img_fold_in in img_fold_list:
+        img_list1 = []
+        if img_fold_in.endswith('.txt'):
+            continue
+        img_list = os.listdir(os.path.join(img_fold, img_fold_in))
+        for img in img_list:
+            if img.endswith('.txt'):
+                continue
+            img_path = str(img_fold_in) + '/' + str(img)
+            img_list1.append(str(img_path) + ' ' + str(cnt))
+        cnt += 1
+        img_path_list.append(img_list1)
+    return img_path_list
+
+
+def divide(lines, img_folds, train_ratio):
+    fp_val = open(str(img_folds) + '/val_list.txt', 'a')
+    fp_train = open(str(img_folds) + '/train_list.txt', 'a')
+    train_size = 0
+    val_size = 0
+    for line in lines:
+        length = len(line)
+        trainList = random.sample(range(0, length), round(train_ratio * length))
+        train_size += len(trainList)
+        for i in trainList:
+            fp_train.write(line[i] + '\n')
+
+        testList = []
+        for i in range(0, length):
+            if i not in trainList:
+                fp_val.write(line[i] + '\n')
+                testList.append(i)
+        val_size += len(testList)
+    print('train images ', train_size)
+    print('val images ', val_size)
+    fp_val.close()
+    fp_train.close()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--img_dir', type=str, default='./font_img_dataset/windows_1/english')
+    parser.add_argument('--train_ratio', type=float, default=0.8)
+    args = parser.parse_args()
+    list1 = get_all_img(args.img_dir)
+    divide(list1, args.img_dir, args.train_ratio)
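
Each line the splitter writes has the form "<class_folder>/<image_name> <class_id>", the plain list-file convention common to image-classification toolkits. Since some folder names in this repo contain spaces (e.g. "STHeiti Light"), a reader should split from the right; a minimal loading sketch under that assumption:

def load_list(list_path):
    # returns [(relative_image_path, class_id), ...]
    samples = []
    with open(list_path, 'r', encoding='utf-8') as fp:
        for line in fp:
            path, label = line.rstrip('\n').rsplit(' ', 1)  # rsplit: paths may contain spaces
            samples.append((path, int(label)))
    return samples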

Binary
correct_imgs_rotation/correted_imgs/16602752667386.png


+ 50 - 0
correct_imgs_rotation/img_correct.py

@@ -0,0 +1,50 @@
+import argparse
+import os
+
+import cv2
+import numpy as np
+from PIL import Image
+import glob
+
+
+def save_img(img_Path, save_Path, cnt):
+    img_path = img_Path
+    img = cv2.imread(img_path)
+    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+    ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
+    coords = np.column_stack(np.where(thresh > 0))
+    angle = cv2.minAreaRect(coords)[-1]
+    print('{}:{}'.format(img_path, angle))
+    # adjust the angle: minAreaRect reports values in [-90, 0) (pre-OpenCV-4.5 convention)
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    im = Image.open(img_path)
+    im_rotate = im.rotate(angle, expand=0, fillcolor='#FFFFFF')
+    if cnt == 0:
+        save_path = save_Path + '/' + img_path.split('\\')[-1]
+    else:
+        save_path = save_Path + '/' + img_path.split('/')[-1]
+    im_rotate.save(save_path)
+
+def correct(img_fold, img_Path, save_fold):
+    if not os.path.exists(save_fold):
+        os.makedirs(save_fold)
+    if img_Path == '':
+        path_list = glob.glob(img_fold + '/*.png')
+        size = len(path_list)
+        print('total: {} images'.format(size))
+        for path in path_list:
+            save_img(path, save_fold, 0)
+    else:
+        save_img(img_Path, save_fold, 1)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--img_dir', type=str, default='')
+    parser.add_argument('--img_path', type=str, default='')
+    parser.add_argument('--save_dir', type=str, default='')
+    args = parser.parse_args()
+    correct(args.img_dir, args.img_path, args.save_dir)
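
save_img() reads each file twice, once with cv2 to estimate the skew angle and once with PIL to rotate. A cv2-only rotation sketch that works from the already-loaded array (assumptions: white background fill and no canvas expansion, matching expand=0 above):

import cv2

def rotate_cv(img, angle):
    # positive angle rotates counter-clockwise, as with PIL's Image.rotate
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h), flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_CONSTANT,
                          borderValue=(255, 255, 255))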

Binary
correct_imgs_rotation/imgs/16602752667386.png


Binary
correct_imgs_rotation/result/16602752667386.png


+ 41 - 0
delete_anno_by_label/delete_label.py

@@ -0,0 +1,41 @@
+import argparse
+import json
+import os
+
+from tqdm import tqdm
+
+
+def work(fold_path, label):
+    fold_list = os.listdir(fold_path)
+    for fold in fold_list:
+        json_list = os.listdir(fold_path + '/' + fold + '/images')
+        print(fold)
+        for json_path in tqdm(json_list):
+            if json_path.endswith('.json'):
+                file_name = fold_path + '/' + fold + '/images/' + json_path
+                with open(str(file_name), 'r', encoding='utf-8') as fp:
+                    json_data = json.load(fp)
+                    shape_list = []
+                    for shape in json_data['shapes']:
+                        if shape['label'] != label:
+                            shape_list.append(shape)
+                    fp.close()  # close before os.remove: Windows cannot delete an open file
+                    if len(shape_list) == 0:
+                        # print('remove', file_name)
+                        os.remove(file_name)
+                        os.remove(str(file_name).replace('.json', '.jpg'))
+                    else:
+                        os.remove(file_name)
+                        json_data['shapes'] = shape_list
+                        # print(file_name)
+                        with open(str(file_name), 'w', encoding='utf-8') as fp1:
+                            json.dump(json_data, fp1, sort_keys=False, indent=2)
+                            fp1.close()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--fold', type=str, default='./fold')
+    parser.add_argument('--label', type=str, default='Table')
+    args = parser.parse_args()
+    work(args.fold, args.label)
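
For context, work() keeps every shape whose label differs from --label, i.e. it strips the given label from each labelme file and deletes the image/annotation pair when no shapes remain. A simplified sketch of the labelme JSON it filters (coordinates illustrative):

labelme_example = {
    'shapes': [
        {'label': 'Table', 'points': [[10, 10], [200, 120]], 'shape_type': 'rectangle'},
        {'label': 'front', 'points': [[30, 40], [90, 40], [90, 80], [30, 80]], 'shape_type': 'polygon'},
    ],
    'imagePath': 'example.jpg',
}
# with --label Table, only the 'front' shape survives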

+ 15 - 0
dictionary_generate/chn_text.txt

@@ -0,0 +1,15 @@
+区方
+跃默进亮
+起沿菱陆博孩
+释痊巴炸抢樊丝贺
+涝茄犊丧个奎题江审睦
+阜巩栖惧宋简支轰按吴免倔
+锨怖陛岿滁霞取陈刀再检匆沿星
+稀大淋油辟启投差陆员所养试而岭良
+疏击徒矾泉压熙山槐锻寿产优贫骤布煤炎
+督滚挑赫激锋尧娥料屉扔酿价够蛊礼曹所席昌
+错椭援戊列木糯带作碳塞接胃林奉武条冲条放幻散
+永横虾藕洼岔升幅脚酿征瘩焦浩内往措川药脸脂洼历集
+慰衍萤胳奔蚁辐晾佃矗台解疼淤俯视饥盆判木然龙碱酣照束
+篙玉滑辐获蜡畸咐馈巧版措欲凹鹏误歧尽杰仓责蝴牧企陆努当测
+道斡窖耗踢面堑烂耐牺划脂先与骆婉希场耿私翟停豪号腐数陶粱咏辰

File diff suppressed because it is too large
+ 1 - 0
dictionary_generate/dict_chn_2000.txt


File diff suppressed because it is too large
+ 1 - 0
dictionary_generate/dict_chn_3500.txt


File diff suppressed because it is too large
+ 8763 - 0
dictionary_generate/dict_eng_8763.txt


File diff suppressed because it is too large
+ 4320 - 0
dictionary_generate/eng_text.txt


+ 36 - 0
dictionary_generate/generate.py

@@ -0,0 +1,36 @@
+import argparse
+import math
+import numpy as np
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--start', type=int, default=2)
+parser.add_argument('--end', type=int, default=30)
+parser.add_argument('--step', type=int, default=2)
+parser.add_argument('--word_num', type=int, default=10)
+parser.add_argument('--dict_path', type=str, default='chn_dict.txt')
+parser.add_argument('--save_path', type=str, default='chn_text.txt')
+args = parser.parse_args()
+
+words_2000 = ''
+with open('dict_chn_2000.txt', 'r', encoding='utf-8') as fp:
+    words_2000 = fp.readlines()
+    fp.close()
+words_3500 = ''
+with open('dict_chn_3500.txt', 'r', encoding='utf-8') as fp:
+    words_3500 = fp.readlines()
+    fp.close()
+wp = open(args.save_path, 'w', encoding='utf-8')
+for i in range(args.start, args.end+1, args.step):
+    for j in range(0, args.word_num):
+        x = int(i/3)
+        arr1 = np.random.rand(x)*3500    # ~1/3 of the characters from the 3500-char dict
+        arr2 = np.random.rand(i-x)*2000  # the rest from the 2000-char dict
+        text = ''
+        for num in arr1:
+            text += words_3500[0][math.floor(num)]
+        for num in arr2:
+            text += words_2000[0][math.floor(num)]
+        wp.write(text+'\n')
+wp.close()
+
+

+ 36 - 0
dictionary_generate/test.py

@@ -0,0 +1,36 @@
+
+import argparse
+import math
+import numpy as np
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--start', type=int, default=1)
+parser.add_argument('--end', type=int, default=12)
+parser.add_argument('--step', type=int, default=1)
+args = parser.parse_args()
+
+words_8763 = ''
+with open('dict_eng_8763.txt', 'r', encoding='utf-8') as fp:
+    words_8763 = fp.readlines()
+    fp.close()
+
+eng_list = []
+for word in words_8763:
+    eng_list.append(word.rstrip('\n'))
+
+wp = open('eng_text.txt', 'w')
+for i in range(args.start, args.end+1, args.step):
+    for j in range(0, 360):
+        arr1 = np.random.rand(i)*len(eng_list)
+        text = ''
+        k = 1
+        for num in arr1:
+            if k == len(arr1):
+                text += eng_list[math.floor(num)] + '\n'
+            else:
+                text += eng_list[math.floor(num)] + ' '
+            k += 1
+        wp.write(text)
+wp.close()
+
+

+ 3 - 0
divide_and_convert_to_coco_single_fold/.idea/.gitignore

@@ -0,0 +1,3 @@
+# Default ignored files
+/shelf/
+/workspace.xml

+ 8 - 0
divide_and_convert_to_coco_single_fold/.idea/3、数据集划分训练集验证集、转coco格式.iml

@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<module type="PYTHON_MODULE" version="4">
+  <component name="NewModuleRootManager">
+    <content url="file://$MODULE_DIR$" />
+    <orderEntry type="inheritedJdk" />
+    <orderEntry type="sourceFolder" forTests="false" />
+  </component>
+</module>

+ 26 - 0
divide_and_convert_to_coco_single_fold/.idea/inspectionProfiles/Project_Default.xml

@@ -0,0 +1,26 @@
+<component name="InspectionProjectProfileManager">
+  <profile version="1.0">
+    <option name="myName" value="Project Default" />
+    <inspection_tool class="PyPackageRequirementsInspection" enabled="true" level="WARNING" enabled_by_default="true">
+      <option name="ignoredPackages">
+        <value>
+          <list size="5">
+            <item index="0" class="java.lang.String" itemvalue="visualdlpython" />
+            <item index="1" class="java.lang.String" itemvalue="prettytable" />
+            <item index="2" class="java.lang.String" itemvalue="easydict" />
+            <item index="3" class="java.lang.String" itemvalue="opencv-python" />
+            <item index="4" class="java.lang.String" itemvalue="faiss-cpu" />
+          </list>
+        </value>
+      </option>
+    </inspection_tool>
+    <inspection_tool class="PyPep8NamingInspection" enabled="true" level="WEAK WARNING" enabled_by_default="true">
+      <option name="ignoredErrors">
+        <list>
+          <option value="N803" />
+          <option value="N806" />
+        </list>
+      </option>
+    </inspection_tool>
+  </profile>
+</component>

+ 6 - 0
divide_and_convert_to_coco_single_fold/.idea/inspectionProfiles/profiles_settings.xml

@@ -0,0 +1,6 @@
+<component name="InspectionProjectProfileManager">
+  <settings>
+    <option name="USE_PROJECT_PROFILE" value="false" />
+    <version value="1.0" />
+  </settings>
+</component>

+ 4 - 0
divide_and_convert_to_coco_single_fold/.idea/misc.xml

@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project version="4">
+  <component name="ProjectRootManager" version="2" project-jdk-name="Python 3.7" project-jdk-type="Python SDK" />
+</project>

+ 8 - 0
divide_and_convert_to_coco_single_fold/.idea/modules.xml

@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project version="4">
+  <component name="ProjectModuleManager">
+    <modules>
+      <module fileurl="file://$PROJECT_DIR$/.idea/3、数据集划分训练集验证集、转coco格式.iml" filepath="$PROJECT_DIR$/.idea/3、数据集划分训练集验证集、转coco格式.iml" />
+    </modules>
+  </component>
+</project>

+ 160 - 0
divide_and_convert_to_coco_single_fold/labelme_to_coco.py

@@ -0,0 +1,160 @@
+import argparse
+import os
+import json
+import numpy as np
+import glob
+import cv2
+from sklearn.model_selection import train_test_split
+from labelme import utils
+np.random.seed(41)
+
+# class id 0 is reserved for the background
+classname_to_id = {
+    "front": 1, # label ids start from 1
+    "back": 2,
+}
+
+
+class Lableme2CoCo:
+
+    def __init__(self):
+        self.images = []
+        self.annotations = []
+        self.categories = []
+        self.img_id = 0
+        self.ann_id = 0
+
+    def save_coco_json(self, instance, save_path):
+        json.dump(instance, open(save_path, 'w', encoding='utf-8'), ensure_ascii=False, indent=1)  # a small indent keeps the file readable
+
+    # build the COCO instance from a list of labelme json files
+    def to_coco(self, json_path_list):
+        self._init_categories()
+        for json_path in json_path_list:
+            print(json_path)
+            obj = self.read_jsonfile(json_path)
+            self.images.append(self._image(obj, json_path))
+            shapes = obj['shapes']
+            for shape in shapes:
+                annotation = self._annotation(shape)
+                self.annotations.append(annotation)
+                self.ann_id += 1
+            self.img_id += 1
+        instance = {}
+        instance['info'] = 'spytensor created'
+        instance['license'] = ['license']
+        instance['images'] = self.images
+        instance['annotations'] = self.annotations
+        instance['categories'] = self.categories
+        return instance
+
+    # build the categories field
+    def _init_categories(self):
+        for k, v in classname_to_id.items():
+            category = {}
+            category['id'] = v
+            category['name'] = k
+            self.categories.append(category)
+
+    # build the COCO image entry
+    def _image(self, obj, path):
+        image = {}
+        img_x = utils.img_b64_to_arr(obj['imageData'])
+        h, w = img_x.shape[:-1]
+        image['height'] = h
+        image['width'] = w
+        image['id'] = self.img_id
+        image['file_name'] = os.path.basename(path).replace(".files", ".jpg")
+        return image
+
+    # build the COCO annotation entry
+    def _annotation(self, shape):
+        # print('shape', shape)
+        label = shape['label']
+        points = shape['points']
+        annotation = {}
+        annotation['id'] = self.ann_id
+        annotation['image_id'] = self.img_id
+        annotation['category_id'] = int(classname_to_id[label])
+        annotation['segmentation'] = [np.asarray(points).flatten().tolist()]
+        annotation['bbox'] = self._get_box(points)
+        annotation['iscrowd'] = 0
+        annotation['area'] = 1.0
+        return annotation
+
+    # read a json file and return the parsed object
+    def read_jsonfile(self, path):
+        with open(path, "r", encoding='utf-8') as f:
+            return json.loads(f.read())
+
+    # compute the COCO bbox [x, y, w, h] from the polygon points
+    def _get_box(self, points):
+        min_x = min_y = np.inf
+        max_x = max_y = 0
+        for x, y in points:
+            min_x = min(min_x, x)
+            min_y = min(min_y, y)
+            max_x = max(max_x, x)
+            max_y = max(max_y, y)
+        return [min_x, min_y, max_x - min_x, max_y - min_y]
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--labelme_dir', type=str, default='./images')
+    parser.add_argument('--save_coco_dir', type=str, default='./coco_dataset')
+    args = parser.parse_args()
+    labelme_path = args.labelme_dir
+    saved_coco_path = args.save_coco_dir
+    print('reading...')
+    # create the output folders
+    if not os.path.exists("%scoco/train/" % saved_coco_path):
+        os.makedirs("%scoco/train/" % saved_coco_path)
+    if not os.path.exists("%scoco/train/images/" % saved_coco_path):
+        os.makedirs("%scoco/train/images/" % saved_coco_path)
+    if not os.path.exists("%scoco/eval/" % saved_coco_path):
+        os.makedirs("%scoco/eval/" % saved_coco_path)     
+    if not os.path.exists("%scoco/eval/images/" % saved_coco_path):
+        os.makedirs("%scoco/eval/images/" % saved_coco_path)
+    # collect all annotation files under the images directory (the labelme json files here use a .files extension)
+    print(labelme_path + "/*.files")
+    json_list_path = glob.glob(labelme_path + "/*.files")
+    print('json_list_path: ', len(json_list_path))
+    # split the data; there are no separate train2017/val2017 folders, all images stay under images
+    train_path, val_path = train_test_split(json_list_path, test_size=0.1, train_size=0.9)
+    print("train_n:", len(train_path), 'val_n:', len(val_path))
+
+    # convert the training split to COCO json
+    l2c_train = Lableme2CoCo()
+    train_instance = l2c_train.to_coco(train_path)
+    l2c_train.save_coco_json(train_instance, '%scoco/train/annotations.files' % saved_coco_path)
+    for file in train_path:
+        # shutil.copy(file.replace("files", "jpg"), "%scoco/images/train2017/" % saved_coco_path)
+        img_name = file.replace('files', 'jpg')
+        temp_img = cv2.imread(img_name)
+        try:
+            img_name = str(img_name).split('\\')[-1]
+            cv2.imwrite("{}coco/train/images/{}".format(saved_coco_path, img_name), temp_img)
+        except Exception as e:
+            print(e)
+            print('Wrong Image:', img_name)
+            continue
+        print('copied', img_name)
+
+    for file in val_path:
+        # shutil.copy(file.replace("files", "jpg"), "%scoco/images/val2017/" % saved_coco_path)
+        img_name = file.replace('files', 'jpg')
+        temp_img = cv2.imread(img_name)
+        try:
+            img_name = str(img_name).split('\\')[-1]
+            cv2.imwrite("{}coco/eval/images/{}".format(saved_coco_path, img_name), temp_img)
+        except Exception as e:
+            print(e)
+            print('Wrong Image:', img_name)
+            continue
+        print('copied', img_name)
+
+    # convert the validation split to COCO json
+    l2c_val = Lableme2CoCo()
+    val_instance = l2c_val.to_coco(val_path)
+    l2c_val.save_coco_json(val_instance, '%scoco/eval/annotations.files' % saved_coco_path)
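
For reference, the instance dict assembled by to_coco() has the shape sketched below (field values are illustrative). Note that 'area' is hard-coded to 1.0 rather than computed from the polygon, which strict COCO consumers may need to recompute:

coco_instance = {
    'info': 'spytensor created',
    'license': ['license'],
    'images': [{'height': 1080, 'width': 1920, 'id': 0,
                'file_name': 'example.jpg'}],
    'annotations': [{'id': 0, 'image_id': 0, 'category_id': 1,
                     'segmentation': [[10.0, 20.0, 300.0, 20.0, 300.0, 200.0, 10.0, 200.0]],
                     'bbox': [10.0, 20.0, 290.0, 180.0],  # [x, y, w, h]
                     'iscrowd': 0, 'area': 1.0}],
    'categories': [{'id': 1, 'name': 'front'}, {'id': 2, 'name': 'back'}],
}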

Binary
fonts_images_generate/__pycache__/img_tools.cpython-37.pyc


+ 56 - 0
fonts_images_generate/divide_train_eval.py

@@ -0,0 +1,56 @@
+import argparse
+import os
+import random
+
+
+def get_all_img(img_fold):
+    # get the list of per-class (font) image folders
+    img_fold_list = os.listdir(img_fold)
+    cnt = 0
+    img_path_list = []
+    for img_fold_in in img_fold_list:
+        img_list1 = []
+        if img_fold_in.endswith('.txt'):
+            continue
+        img_list = os.listdir(os.path.join(img_fold, img_fold_in))
+        for img in img_list:
+            if img.endswith('.txt'):
+                continue
+            img_path = str(img_fold_in) + '/' + str(img)
+            img_list1.append(str(img_path) + ' ' + str(cnt))
+        cnt += 1
+        img_path_list.append(img_list1)
+    return img_path_list
+
+
+def divide(lines, img_folds, train_ratio):
+    fp_val = open(str(img_folds) + '/val_list.txt', 'a')
+    fp_train = open(str(img_folds) + '/train_list.txt', 'a')
+    train_size = 0
+    val_size = 0
+    for line in lines:
+        length = len(line)
+        trainList = random.sample(range(0, length), round(train_ratio * length))
+        train_size += len(trainList)
+        for i in trainList:
+            fp_train.write(line[i] + '\n')
+
+        testList = []
+        for i in range(0, length):
+            if i not in trainList:
+                fp_val.write(line[i] + '\n')
+                testList.append(i)
+        val_size += len(testList)
+    print('train images ', train_size)
+    print('val images ', val_size)
+    fp_val.close()
+    fp_train.close()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--img_dir', type=str, default='./font_img_dataset/windows_1/chinese_gray')
+    parser.add_argument('--train_ratio', type=float, default=0.8)
+    args = parser.parse_args()
+    list1 = get_all_img(args.img_dir)
+    divide(list1, args.img_dir, args.train_ratio)

Binary
fonts_images_generate/font/Apple Braille.ttf


+ 63 - 0
fonts_images_generate/generate_imgs_by_fonts.py

@@ -0,0 +1,63 @@
+import argparse
+import datetime
+import os
+import uuid
+from PIL import Image, ImageFont, ImageDraw
+from tqdm import tqdm
+
+
+def create_img(font_path, text, save_path, cnt):
+    fontSize = 25
+    # load the truetype font at the given size
+    font = ImageFont.truetype(font_path, fontSize, encoding="gbk")
+    im = Image.new("RGB", (int(font.getbbox(text.rstrip('\n'))[2]), font.getbbox(text)[3]), (255, 255, 255))
+    dr = ImageDraw.Draw(im)
+    # draw the text in black
+    dr.text((0, 0), text, font=font, fill="#000000")
+    time = datetime.datetime.now()
+    t = str(time.year).zfill(4) + '-' + str(time.month).zfill(2) + '-' + str(time.day).zfill(2)
+    if not os.path.exists("%s/%s/" % (save_path, font_path.split('/')[-1][0:-4])):
+        os.makedirs("%s/%s/" % (save_path, font_path.split('/')[-1][0:-4]))
+    # image name: kdan_<date>_<index>_<uuid prefix>.jpg
+    img_name = 'kdan_' + t + '_' + str(cnt).zfill(4) + '_' + str(uuid.uuid1())[0:8] + '.jpg'
+    save_path = "%s/%s/" % (save_path, font_path.split('/')[-1][0:-4]) + img_name
+    # save the rendered image
+    im.save(save_path)
+
+
+def get_all_labels(img_fold):
+    # get the list of per-font image folders
+    img_fold_list = os.listdir(img_fold)
+    # open label_list.txt for writing
+    fp_label = open(str(img_fold) + '/label_list.txt', 'w')
+    cnt = 0
+    for img_fold_in in img_fold_list:
+        if img_fold_in.endswith('.txt'):
+            continue
+        # write each font name and its id to label_list.txt
+        fp_label.write(str(cnt) + ' ' + str(img_fold_in) + '\n')
+        cnt += 1
+    fp_label.close()
+
+
+def work(font_path, text_path, save_path):
+    fontPath_list = os.listdir(font_path)
+    print('generate images for fonts based on %s' % text_path)
+    for fontPath in tqdm(fontPath_list):
+        with open(text_path, 'r') as fpp:
+            t = fpp.readlines()
+        fpp.close()
+        cnt = 0
+        for text in t:
+            create_img(str(font_path) + '/' + str(fontPath), text, save_path, cnt)
+            cnt += 1
+    get_all_labels(save_path)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--font_dir', type=str, default='./font')
+    parser.add_argument('--text_path', type=str, default='./text/eng_text_test.txt')
+    parser.add_argument('--save_dir', type=str, default='./images')
+    args = parser.parse_args()
+    work(args.font_dir, args.text_path, args.save_dir)

+ 24 - 0
fonts_images_generate/get_all_labels.py

@@ -0,0 +1,24 @@
+import argparse
+import os
+
+
+def get_all_labels(img_fold):
+    # get the list of per-font image folders
+    img_fold_list = os.listdir(img_fold)
+    # open label_list.txt for writing
+    fp_label = open(str(img_fold) + '/label_list.txt', 'w')
+    cnt = 0
+    for img_fold_in in img_fold_list:
+        if img_fold_in.endswith('.txt'):
+            continue
+        # write each font name and its id to label_list.txt
+        fp_label.write(str(cnt) + ' ' + str(img_fold_in) + '\n')
+        cnt += 1
+    fp_label.close()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--save_dir', type=str, default=r'C:\Users\KDAN\Desktop\workspace\字体分类数据集\windows_1\font_test_dataset')
+    args = parser.parse_args()
+    get_all_labels(args.save_dir)

+ 31 - 0
fonts_images_generate/gray.py

@@ -0,0 +1,31 @@
+import cv2
+import os
+import glob
+# root directory of the per-font image folders
+root_path = r'C:\Users\KDAN\Desktop\work\exchange\font_style_classification\Fonts_img_Dataset\windows_1\chinese'
+save_path = r'C:\Users\KDAN\Desktop\work\exchange\font_style_classification\Fonts_img_Dataset\windows_1\chinese_gray'
+
+list_path = os.listdir(root_path)
+
+for k, names in enumerate(list_path):
+    print(k, names)
+    if names.endswith('.txt'):
+        continue
+    if not os.path.exists(save_path + '\\' + names):
+        os.makedirs(save_path + '\\' + names)  # cv2.imwrite fails silently if the target folder is missing
+    bmp_path = glob.glob(root_path + '\\{}\\*jpg'.format(names))
+    for i, path_bmp in enumerate(bmp_path):
+        if path_bmp.endswith('.txt'):
+            continue
+        # print(path_bmp)
+        a = os.path.split(path_bmp)
+        name = os.path.basename(a[0])
+        # print(name)
+        img = cv2.imread(path_bmp)
+        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+        # cv2.imshow('figure', gray)
+        # cv2.waitKey(0)
+        cv2.imwrite(save_path + '\\{}\\{}_{}.jpg'.format(names, name, i), gray)
+
+
+

Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0000_ffec5264.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0001_fff5a1cc.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0002_fffd428a.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0003_00053198.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0004_000d6f00.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0005_00151406.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0006_001d0300.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0007_0024a1ee.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0008_002c1b36.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0009_00339894.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0010_003a7642.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0011_0041544c.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0012_0047dfec.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0013_004e9742.jpg


Binary
fonts_images_generate/images/PingFang/kdan_2022-10-25_0014_00555134.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0000_00b5c174.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0001_00ba9f66.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0002_00bf80e8.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0003_00c54d66.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0004_00ca7dee.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0005_00cf8668.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0006_00d4ba3e.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0007_00d9c15a.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0008_00de7eb0.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0009_00e36088.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0010_00e7f466.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0011_00ecd668.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0012_00f20290.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0013_00f70c2e.jpg


Binary
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0014_00fbee28.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0000_010144c8.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0001_01064e66.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0002_010b7e02.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0003_0110ae14.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0004_0116054c.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0005_011b0e52.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0006_012128dc.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0007_01265900.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0008_012b6290.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0009_01306ba8.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0010_013574ee.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0011_013a7dca.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0012_013fad64.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0013_01448f62.jpg


Binary
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0014_0149986c.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0000_005c79d8.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0001_00627094.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0002_006815f4.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0003_006e7aa8.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0004_00750e7e.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0005_007b2598.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0006_0081910a.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0007_00875db8.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0008_008d26c6.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0009_00931de2.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0010_0098ea08.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0011_009edd3a.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0012_00a4cd40.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0013_00aac0e6.jpg


Binary
fonts_images_generate/images/Songti/kdan_2022-10-25_0014_00b08d5a.jpg


+ 5 - 0
fonts_images_generate/images/label_list.txt

@@ -0,0 +1,5 @@
+0 Apple Braille
+1 PingFang
+2 Songti
+3 STHeiti Light
+4 STHeiti Medium

+ 15 - 0
fonts_images_generate/img_paddling.py

@@ -0,0 +1,15 @@
+# import os
+import img_tools
+# from tqdm import tqdm
+#
+#
+# img_folds = os.listdir('./font_img_dataset/windows_1/english')
+# for img_fold in tqdm(img_folds):
+#     img_fold = str('font_img_dataset/windows_1/english') + '/' + str(img_fold)
+#     if not img_fold.endswith('.txt'):
+#         img_path_list = os.listdir(img_fold)
+#         for img_path in img_path_list:
+#             img_tools.img_resize(img_fold + '/' + str(img_path))
+
+path = r"C:\Users\KDAN\Desktop\images\字体\153409.jpg"
+img_tools.img_resize(path)

+ 22 - 0
fonts_images_generate/img_tools.py

@@ -0,0 +1,22 @@
+import cv2
+
+
+def img_resize(img_path):
+    img = cv2.imread(img_path)
+    width = img.shape[1]
+    height = img.shape[0]
+    if height != 48:
+        ratio = 35 / height  # scale so the text height becomes 35 px
+        height = 35
+        width = int(width * ratio)
+        dim = (width, height)
+        resized = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
+        width = resized.shape[1]
+        height = resized.shape[0]
+        if width < 960:
+            left = int((960 - width) / 2)  # centre horizontally on a 960-px-wide canvas
+            top = int((48 - height) / 2)  # centre vertically on a 48-px-tall canvas
+            ret = cv2.copyMakeBorder(resized, top, 48 - height - top, left, 960 - width - left, cv2.BORDER_CONSTANT,
+                                     value=(255, 255, 255))  # pad with white
+            cv2.imwrite(img_path, ret)
+
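
Worked through on a hypothetical 24x300 crop, the arithmetic above gives:

height, width = 24, 300
ratio = 35 / height            # 1.458...
width = int(width * ratio)     # 437; height becomes 35
left = int((960 - 437) / 2)    # 261, so the right border is 960 - 437 - 261 = 262
top = int((48 - 35) / 2)       # 6, so the bottom border is 48 - 35 - 6 = 7
# result: the 35x437 crop centred on a white 48x960 canvas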

File diff suppressed because it is too large
+ 7560 - 0
fonts_images_generate/text/chn_text.txt


+ 14 - 0
fonts_images_generate/text/chn_text_test.txt

@@ -0,0 +1,14 @@
+本节内容可能不会
+很长,但是
+还是希望尽可能把这个环节重要的骨架勾勒出来。
+有一
+个经典的问题是:“如果你是一个投资人,要投资
+一个项目,核心是看什么?项目还是团队?”。与之对应的一个问题
+是:“如果你是一位创业者,创业的
+基石是一个独特的项目还是一个优秀的团队?”
+当然这种二选一的问题往往都只强调了某一个方面,并
+没有标准答案,有的人会选项目,有的人会选人(团队)。本节要讨论的
+是在一个企业里面“如何构建一个有效的能做产品的团队所需要的不同角色的人”,这
+个问题。
+顺带的,以我当前所在的工作环境,这个问题天然地穿插了一个“远程
+工作”的上下文,它不是主要的问题,但是在里面是一个因素。

File diff suppressed because it is too large
+ 2160 - 0
fonts_images_generate/text/eng_text.txt


+ 15 - 0
fonts_images_generate/text/eng_text_test.txt

@@ -0,0 +1,15 @@
+Feature pyramids are a basic component in recognition
+systems for detecting objects at different scales.
+But recent
+deep learning object detectors have avoided pyramid representations,
+in part because they are compute and memory
+intensive. In this paper, we exploit the inherent multi-scale,
+pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost.
+A topdown architecture with lateral connections is developed
+for
+building high-level semantic feature maps at all scales. T
+his
+architecture, called a Feature Pyramid Network (FPN),
+shows significant
+improvement as a generic
+feature extractor in several applications.

File diff suppressed because it is too large
+ 114494 - 0
get_name_by_spider/addr.txt


+ 9 - 0
get_name_by_spider/get_dict.py

@@ -0,0 +1,9 @@
+fp_name = open('name.txt', 'r', encoding='utf-8')
+text = ''
+for line in fp_name.readlines():
+    text += line.rstrip('\n')
+fp_name.close()
+print(list(text))
+print(len(list(text)))
+print(sorted(set(list(text))))
+print(len(set(list(text))))

+ 22 - 0
get_name_by_spider/get_name.py

@@ -0,0 +1,22 @@
+import requests
+import re
+from tqdm import tqdm
+
+pattern1 = r'<tr><td>(.*?)</td><td>'
+pattern2 = r'[0-9]</td><td>(.*?)</td></tr><tr><td>'
+n = 5000
+fp_name = open('name.txt', 'a+', encoding='utf-8')
+fp_addr = open('addr.txt', 'a+', encoding='utf-8')
+for i in tqdm(range(0, n)):
+    url = 'https://www.myfakeinfo.com/nationalidno/get-chinataiwan-ic-numberandname.php'
+    header = {
+        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/"
+                      "537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"}
+    response = requests.get(url, headers=header)  # send the GET request
+    response.encoding = 'utf-8'  # set the response encoding
+    name_list = re.findall(pattern1, response.text)
+    for name in name_list:
+        fp_name.write(name + '\n')
+    addr_list = re.findall(pattern2, response.text)
+    for addr in addr_list:
+        fp_addr.write(addr[11:] + '\n')

File diff suppressed because it is too large
+ 120520 - 0
get_name_by_spider/name.txt


+ 0 - 0
idcardgenerator/OpticalBBold.otf


Some files were not shown because too many files changed in this diff