
First upload

kangtan, 1 year ago
Parent
Commit
3b7aa5e608
100 changed files with 258577 additions and 0 deletions
  1. 32 0
      Images_rename/images_rename.py
  2. 1 0
      TextRecognitionDataGenerator
  3. 56 0
      classification_dataset_generate/divide_train_eval.py
  4. BIN
      correct_imgs_rotation/correted_imgs/16602752667386.png
  5. 50 0
      correct_imgs_rotation/img_correct.py
  6. BIN
      correct_imgs_rotation/imgs/16602752667386.png
  7. BIN
      correct_imgs_rotation/result/16602752667386.png
  8. 41 0
      delete_anno_by_label/delete_label.py
  9. 15 0
      dictionary_generate/chn_text.txt
  10. 1 0
      dictionary_generate/dict_chn_2000.txt
  11. 1 0
      dictionary_generate/dict_chn_3500.txt
  12. 8763 0
      dictionary_generate/dict_eng_8763.txt
  13. 4320 0
      dictionary_generate/eng_text.txt
  14. 36 0
      dictionary_generate/generate.py
  15. 36 0
      dictionary_generate/test.py
  16. 3 0
      divide_and_convert_to_coco_single_fold/.idea/.gitignore
  17. 8 0
      divide_and_convert_to_coco_single_fold/.idea/3、数据集划分训练集验证集、转coco格式.iml
  18. 26 0
      divide_and_convert_to_coco_single_fold/.idea/inspectionProfiles/Project_Default.xml
  19. 6 0
      divide_and_convert_to_coco_single_fold/.idea/inspectionProfiles/profiles_settings.xml
  20. 4 0
      divide_and_convert_to_coco_single_fold/.idea/misc.xml
  21. 8 0
      divide_and_convert_to_coco_single_fold/.idea/modules.xml
  22. 160 0
      divide_and_convert_to_coco_single_fold/labelme_to_coco.py
  23. BIN
      fonts_images_generate/__pycache__/img_tools.cpython-37.pyc
  24. 56 0
      fonts_images_generate/divide_train_eval.py
  25. BIN
      fonts_images_generate/font/Apple Braille.ttf
  26. 63 0
      fonts_images_generate/generate_imgs_by_fonts.py
  27. 24 0
      fonts_images_generate/get_all_labels.py
  28. 31 0
      fonts_images_generate/gray.py
  29. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0000_ffec5264.jpg
  30. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0001_fff5a1cc.jpg
  31. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0002_fffd428a.jpg
  32. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0003_00053198.jpg
  33. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0004_000d6f00.jpg
  34. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0005_00151406.jpg
  35. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0006_001d0300.jpg
  36. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0007_0024a1ee.jpg
  37. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0008_002c1b36.jpg
  38. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0009_00339894.jpg
  39. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0010_003a7642.jpg
  40. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0011_0041544c.jpg
  41. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0012_0047dfec.jpg
  42. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0013_004e9742.jpg
  43. BIN
      fonts_images_generate/images/PingFang/kdan_2022-10-25_0014_00555134.jpg
  44. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0000_00b5c174.jpg
  45. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0001_00ba9f66.jpg
  46. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0002_00bf80e8.jpg
  47. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0003_00c54d66.jpg
  48. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0004_00ca7dee.jpg
  49. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0005_00cf8668.jpg
  50. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0006_00d4ba3e.jpg
  51. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0007_00d9c15a.jpg
  52. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0008_00de7eb0.jpg
  53. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0009_00e36088.jpg
  54. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0010_00e7f466.jpg
  55. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0011_00ecd668.jpg
  56. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0012_00f20290.jpg
  57. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0013_00f70c2e.jpg
  58. BIN
      fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0014_00fbee28.jpg
  59. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0000_010144c8.jpg
  60. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0001_01064e66.jpg
  61. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0002_010b7e02.jpg
  62. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0003_0110ae14.jpg
  63. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0004_0116054c.jpg
  64. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0005_011b0e52.jpg
  65. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0006_012128dc.jpg
  66. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0007_01265900.jpg
  67. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0008_012b6290.jpg
  68. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0009_01306ba8.jpg
  69. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0010_013574ee.jpg
  70. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0011_013a7dca.jpg
  71. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0012_013fad64.jpg
  72. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0013_01448f62.jpg
  73. BIN
      fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0014_0149986c.jpg
  74. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0000_005c79d8.jpg
  75. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0001_00627094.jpg
  76. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0002_006815f4.jpg
  77. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0003_006e7aa8.jpg
  78. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0004_00750e7e.jpg
  79. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0005_007b2598.jpg
  80. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0006_0081910a.jpg
  81. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0007_00875db8.jpg
  82. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0008_008d26c6.jpg
  83. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0009_00931de2.jpg
  84. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0010_0098ea08.jpg
  85. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0011_009edd3a.jpg
  86. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0012_00a4cd40.jpg
  87. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0013_00aac0e6.jpg
  88. BIN
      fonts_images_generate/images/Songti/kdan_2022-10-25_0014_00b08d5a.jpg
  89. 5 0
      fonts_images_generate/images/label_list.txt
  90. 15 0
      fonts_images_generate/img_paddling.py
  91. 22 0
      fonts_images_generate/img_tools.py
  92. 7560 0
      fonts_images_generate/text/chn_text.txt
  93. 14 0
      fonts_images_generate/text/chn_text_test.txt
  94. 2160 0
      fonts_images_generate/text/eng_text.txt
  95. 15 0
      fonts_images_generate/text/eng_text_test.txt
  96. 114494 0
      get_name_by_spider/addr.txt
  97. 9 0
      get_name_by_spider/get_dict.py
  98. 22 0
      get_name_by_spider/get_name.py
  99. 120520 0
      get_name_by_spider/name.txt
  100. 0 0
      idcardgenerator/OpticalBBold.otf

+ 32 - 0
Images_rename/images_rename.py

@@ -0,0 +1,32 @@
+import argparse
+import datetime
+import os
+import uuid
+import cv2
+
+
+def rename(img_dir, save_dir):
+    filelist = os.listdir(img_dir)  # names of the files/folders in the source directory
+    total_num = len(filelist)  # number of entries in the directory
+
+    i = 1  # counter for renamed images
+    for item in filelist:  # iterate over the files (images) in this directory
+        time = datetime.datetime.now()
+        t = str(time.year).zfill(4)+'-'+str(time.month).zfill(2) + '-'+str(time.day).zfill(2)
+        if item.endswith('.jpg') or item.endswith('.png'):
+            # read only actual image files; cv2.imread returns None for anything else
+            img = cv2.imread(img_dir + '/' + item)
+            # dst = save_dir + '/kdan_' + str(t) + '_' + str(i).zfill(8) + '_' + str(uuid.uuid1())[0:4] + '.jpg'
+            dst = save_dir + '/id_' + item.split('.')[0] + '.jpg'
+            cv2.imwrite(dst, img)
+            i = i + 1
+    print('total %d to rename' % total_num)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--img_dir', type=str, default=r'C:\Users\KDAN\Desktop\workspace\rec_data(tw_idcard)\tw_idcard_rec_1223\id')
+    parser.add_argument('--save_dir', type=str, default=r'C:\Users\KDAN\Desktop\workspace\rec_data(tw_idcard)\tw_idcard_rec_1223\id')
+    args = parser.parse_args()
+    if not os.path.exists(args.save_dir):
+        os.makedirs(args.save_dir)
+    rename(args.img_dir, args.save_dir)
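As a side note, the commented-out naming scheme above matches the kdan-style file names visible under fonts_images_generate/images (date stamp, zero-padded counter, uuid fragment). A minimal sketch of that scheme; `build_name` is a hypothetical helper, not part of the script, and it uses the 4-digit counter / 8-char uuid fragment seen in the generated image names:

```python
import datetime
import uuid


def build_name(prefix="kdan", index=0):
    # Date stamp formatted as YYYY-MM-DD with the same zfill logic as rename()
    now = datetime.datetime.now()
    stamp = str(now.year).zfill(4) + '-' + str(now.month).zfill(2) + '-' + str(now.day).zfill(2)
    # 4-digit zero-padded counter plus the first 8 chars of a uuid1
    return prefix + '_' + stamp + '_' + str(index).zfill(4) + '_' + str(uuid.uuid1())[0:8] + '.jpg'


name = build_name(index=7)
```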

+ 1 - 0
TextRecognitionDataGenerator

@@ -0,0 +1 @@
+Subproject commit 173d4572199854943d19dbb5607992331d459c73

+ 56 - 0
classification_dataset_generate/divide_train_eval.py

@@ -0,0 +1,56 @@
+import argparse
+import os
+import random
+
+
+def get_all_img(img_fold):
+    # list the per-font subdirectories
+    img_fold_list = os.listdir(img_fold)
+    cnt = 0
+    img_path_list = []
+    for img_fold_in in img_fold_list:
+        img_list1 = []
+        if img_fold_in.endswith('.txt'):
+            continue
+        img_list = os.listdir(os.path.join(img_fold, img_fold_in))
+        for img in img_list:
+            if img.endswith('.txt'):
+                continue
+            img_path = str(img_fold_in) + '/' + str(img)
+            img_list1.append(str(img_path) + ' ' + str(cnt))
+        cnt += 1
+        img_path_list.append(img_list1)
+    return img_path_list
+
+
+def divide(lines, img_folds, train_ratio):
+    fp_val = open(str(img_folds) + '/val_list.txt', 'a')
+    fp_train = open(str(img_folds) + '/train_list.txt', 'a')
+    train_size = 0
+    val_size = 0
+    for line in lines:
+        length = len(line)
+        trainList = random.sample(range(0, length), round(train_ratio * length))
+        train_size += len(trainList)
+        for i in trainList:
+            fp_train.write(line[i] + '\n')
+
+        testList = []
+        for i in range(0, length):
+            if i not in trainList:
+                fp_val.write(line[i] + '\n')
+                testList.append(i)
+        val_size += len(testList)
+    print('train images ', train_size)
+    print('val images ', val_size)
+    fp_val.close()
+    fp_train.close()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--img_dir', type=str, default='./font_img_dataset/windows_1/english')
+    parser.add_argument('--train_ratio', type=float, default=0.8)
+    args = parser.parse_args()
+    list1 = get_all_img(args.img_dir)
+    divide(list1, args.img_dir, args.train_ratio)
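The per-class split in divide() boils down to sampling a fixed fraction of indices without replacement; a minimal sketch of that step (`split_indices` is an illustrative name, not from the script):

```python
import random


def split_indices(n, train_ratio):
    # round(train_ratio * n) distinct indices go to train, the rest to val,
    # mirroring the random.sample / "not in trainList" logic in divide()
    train = random.sample(range(n), round(train_ratio * n))
    val = [i for i in range(n) if i not in train]
    return train, val


train, val = split_indices(10, 0.8)
```

Note the `i not in trainList` membership test in the script is O(n) per image; fine for folders of this size, but a set would scale better.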

BIN
correct_imgs_rotation/correted_imgs/16602752667386.png


+ 50 - 0
correct_imgs_rotation/img_correct.py

@@ -0,0 +1,50 @@
+import argparse
+import os
+
+import cv2
+import numpy as np
+from PIL import Image
+import glob
+
+
+def save_img(img_Path, save_Path, cnt):
+    img_path = img_Path
+    img = cv2.imread(img_path)
+    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+    ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
+    coords = np.column_stack(np.where(thresh > 0))
+    angle = cv2.minAreaRect(coords)[-1]
+    print('{}:{}'.format(img_path, angle))
+    # adjust the angle returned by minAreaRect
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    im = Image.open(img_path)
+    im_rotate = im.rotate(angle, expand=0, fillcolor='#FFFFFF')
+    if cnt == 0:
+        save_path = save_Path + '/' + img_path.split('\\')[-1]
+    else:
+        save_path = save_Path + '/' + img_path.split('/')[-1]
+    im_rotate.save(save_path)
+
+def correct(img_fold, img_Path, save_fold):
+    if not os.path.exists(save_fold):
+        os.makedirs(save_fold)
+    if img_Path == '':
+        path_list = glob.glob(img_fold + '/*.png')
+        size = len(path_list)
+        print('total: {} images'.format(size))
+        for path in path_list:
+            save_img(path, save_fold, 0)
+    else:
+        save_img(img_Path, save_fold, 1)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--img_dir', type=str, default='')
+    parser.add_argument('--img_path', type=str, default='')
+    parser.add_argument('--save_dir', type=str, default='')
+    args = parser.parse_args()
+    correct(args.img_dir, args.img_path, args.save_dir)
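The angle adjustment in save_img() compensates for cv2.minAreaRect reporting angles in (-90, 0]; isolated as a pure function (`normalize_angle` is an illustrative name, not from the script) it behaves like this:

```python
def normalize_angle(angle):
    # minAreaRect angles below -45 mean the box is closer to vertical,
    # so rotate by -(90 + angle); otherwise just negate the angle
    if angle < -45:
        return -(90 + angle)
    return -angle
```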

BIN
correct_imgs_rotation/imgs/16602752667386.png


BIN
correct_imgs_rotation/result/16602752667386.png


+ 41 - 0
delete_anno_by_label/delete_label.py

@@ -0,0 +1,41 @@
+import argparse
+import json
+import os
+
+from tqdm import tqdm
+
+
+def work(fold_path, label):
+    fold_list = os.listdir(fold_path)
+    for fold in fold_list:
+        json_list = os.listdir(fold_path + '/' + fold + '/images')
+        print(fold)
+        for json_path in tqdm(json_list):
+            if json_path.endswith('.json'):
+                file_name = fold_path + '/' + fold + '/images/' + json_path
+                with open(str(file_name), 'r', encoding='utf-8') as fp:
+                    json_data = json.load(fp)
+                    shape_list = []
+                    for shape in json_data['shapes']:
+                        if shape['label'] != label:
+                            shape_list.append(shape)
+                    fp.close()
+                    if len(shape_list) == 0:
+                        # print('remove', file_name)
+                        os.remove(file_name)
+                        os.remove(str(file_name).replace('json', 'jpg'))
+                    else:
+                        os.remove(file_name)
+                        json_data['shapes'] = shape_list
+                        # print(file_name)
+                        with open(str(file_name), 'w', encoding='utf-8') as fp1:
+                            json.dump(json_data, fp1, sort_keys=False, indent=2)
+                            fp1.close()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--fold', type=str, default='./fold')
+    parser.add_argument('--label', type=str, default='Table')
+    args = parser.parse_args()
+    work(args.fold, args.label)
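The core of work() is filtering a labelme shapes list and deciding whether anything remains; a minimal sketch of that step (`drop_label` is a hypothetical helper, not from the script):

```python
def drop_label(json_data, label):
    # Keep only shapes whose label differs from the one being deleted;
    # return None when nothing is left, so the caller can remove the files
    kept = [s for s in json_data["shapes"] if s["label"] != label]
    if not kept:
        return None
    json_data["shapes"] = kept
    return json_data


doc = {"shapes": [{"label": "Table"}, {"label": "Text"}]}
result = drop_label(doc, "Table")
```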

+ 15 - 0
dictionary_generate/chn_text.txt

@@ -0,0 +1,15 @@
+区方
+跃默进亮
+起沿菱陆博孩
+释痊巴炸抢樊丝贺
+涝茄犊丧个奎题江审睦
+阜巩栖惧宋简支轰按吴免倔
+锨怖陛岿滁霞取陈刀再检匆沿星
+稀大淋油辟启投差陆员所养试而岭良
+疏击徒矾泉压熙山槐锻寿产优贫骤布煤炎
+督滚挑赫激锋尧娥料屉扔酿价够蛊礼曹所席昌
+错椭援戊列木糯带作碳塞接胃林奉武条冲条放幻散
+永横虾藕洼岔升幅脚酿征瘩焦浩内往措川药脸脂洼历集
+慰衍萤胳奔蚁辐晾佃矗台解疼淤俯视饥盆判木然龙碱酣照束
+篙玉滑辐获蜡畸咐馈巧版措欲凹鹏误歧尽杰仓责蝴牧企陆努当测
+道斡窖耗踢面堑烂耐牺划脂先与骆婉希场耿私翟停豪号腐数陶粱咏辰

File diff suppressed because it is too large
+ 1 - 0
dictionary_generate/dict_chn_2000.txt


File diff suppressed because it is too large
+ 1 - 0
dictionary_generate/dict_chn_3500.txt


File diff suppressed because it is too large
+ 8763 - 0
dictionary_generate/dict_eng_8763.txt


File diff suppressed because it is too large
+ 4320 - 0
dictionary_generate/eng_text.txt


+ 36 - 0
dictionary_generate/generate.py

@@ -0,0 +1,36 @@
+import argparse
+import math
+import numpy as np
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--start', type=int, default=2)
+parser.add_argument('--end', type=int, default=30)
+parser.add_argument('--step', type=int, default=2)
+parser.add_argument('--word_num', type=int, default=10)
+parser.add_argument('--dict_path', type=str, default='chn_dict.txt')
+parser.add_argument('--save_path', type=str, default='chn_text.txt')
+args = parser.parse_args()
+
+words_2000 = ''
+with open('dict_chn_2000.txt', 'r', encoding='utf-8') as fp:
+    words_2000 = fp.readlines()
+    fp.close()
+words_3500 = ''
+with open('dict_chn_3500.txt', 'r', encoding='utf-8') as fp:
+    words_3500 = fp.readlines()
+    fp.close()
+wp = open(args.save_path, 'w', encoding='utf-8')
+for i in range(args.start, args.end+1, args.step):
+    for j in range(0, args.word_num):
+        x = int(i/3)
+        arr1 = np.random.rand(x)*3500
+        arr2 = np.random.rand(i-x)*2000
+        text = ''
+        for num in arr1:
+            text += words_3500[0][math.floor(num)]
+        for num in arr2:
+            text += words_2000[0][math.floor(num)]
+        wp.write(text+'\n')
+wp.close()
+
+
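Both generate.py and test.py pick dictionary entries by scaling uniform floats and flooring them into indices; a self-contained sketch of that sampling (`sample_chars` is an illustrative name, and it uses a seeded numpy Generator instead of np.random.rand for reproducibility):

```python
import math

import numpy as np


def sample_chars(pool, k, seed=0):
    # k uniform floats in [0, 1) scaled by len(pool) and floored give
    # k valid indices into the dictionary, as in generate.py / test.py
    rng = np.random.default_rng(seed)
    nums = rng.random(k) * len(pool)
    return ''.join(pool[math.floor(n)] for n in nums)


text = sample_chars("abcde", 8)
```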

+ 36 - 0
dictionary_generate/test.py

@@ -0,0 +1,36 @@
+
+import argparse
+import math
+import numpy as np
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--start', type=int, default=1)
+parser.add_argument('--end', type=int, default=12)
+parser.add_argument('--step', type=int, default=1)
+args = parser.parse_args()
+
+words_8763 = ''
+with open('dict_eng_8763.txt', 'r', encoding='utf-8') as fp:
+    words_8763 = fp.readlines()
+    fp.close()
+
+eng_list = []
+for word in words_8763:
+    eng_list.append(word.rstrip('\n'))
+
+wp = open('eng_text.txt', 'w')
+for i in range(args.start, args.end+1, args.step):
+    for j in range(0, 360):
+        arr1 = np.random.rand(i)*len(eng_list)
+        text = ''
+        k = 1
+        for num in arr1:
+            if k == len(arr1):
+                text += eng_list[math.floor(num)] + '\n'
+            else:
+                text += eng_list[math.floor(num)] + ' '
+            k += 1
+        wp.write(text)
+wp.close()
+
+

+ 3 - 0
divide_and_convert_to_coco_single_fold/.idea/.gitignore

@@ -0,0 +1,3 @@
+# Default ignored files
+/shelf/
+/workspace.xml

+ 8 - 0
divide_and_convert_to_coco_single_fold/.idea/3、数据集划分训练集验证集、转coco格式.iml

@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<module type="PYTHON_MODULE" version="4">
+  <component name="NewModuleRootManager">
+    <content url="file://$MODULE_DIR$" />
+    <orderEntry type="inheritedJdk" />
+    <orderEntry type="sourceFolder" forTests="false" />
+  </component>
+</module>

+ 26 - 0
divide_and_convert_to_coco_single_fold/.idea/inspectionProfiles/Project_Default.xml

@@ -0,0 +1,26 @@
+<component name="InspectionProjectProfileManager">
+  <profile version="1.0">
+    <option name="myName" value="Project Default" />
+    <inspection_tool class="PyPackageRequirementsInspection" enabled="true" level="WARNING" enabled_by_default="true">
+      <option name="ignoredPackages">
+        <value>
+          <list size="5">
+            <item index="0" class="java.lang.String" itemvalue="visualdlpython" />
+            <item index="1" class="java.lang.String" itemvalue="prettytable" />
+            <item index="2" class="java.lang.String" itemvalue="easydict" />
+            <item index="3" class="java.lang.String" itemvalue="opencv-python" />
+            <item index="4" class="java.lang.String" itemvalue="faiss-cpu" />
+          </list>
+        </value>
+      </option>
+    </inspection_tool>
+    <inspection_tool class="PyPep8NamingInspection" enabled="true" level="WEAK WARNING" enabled_by_default="true">
+      <option name="ignoredErrors">
+        <list>
+          <option value="N803" />
+          <option value="N806" />
+        </list>
+      </option>
+    </inspection_tool>
+  </profile>
+</component>

+ 6 - 0
divide_and_convert_to_coco_single_fold/.idea/inspectionProfiles/profiles_settings.xml

@@ -0,0 +1,6 @@
+<component name="InspectionProjectProfileManager">
+  <settings>
+    <option name="USE_PROJECT_PROFILE" value="false" />
+    <version value="1.0" />
+  </settings>
+</component>

+ 4 - 0
divide_and_convert_to_coco_single_fold/.idea/misc.xml

@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project version="4">
+  <component name="ProjectRootManager" version="2" project-jdk-name="Python 3.7" project-jdk-type="Python SDK" />
+</project>

+ 8 - 0
divide_and_convert_to_coco_single_fold/.idea/modules.xml

@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project version="4">
+  <component name="ProjectModuleManager">
+    <modules>
+      <module fileurl="file://$PROJECT_DIR$/.idea/3、数据集划分训练集验证集、转coco格式.iml" filepath="$PROJECT_DIR$/.idea/3、数据集划分训练集验证集、转coco格式.iml" />
+    </modules>
+  </component>
+</project>

+ 160 - 0
divide_and_convert_to_coco_single_fold/labelme_to_coco.py

@@ -0,0 +1,160 @@
+import argparse
+import os
+import json
+import numpy as np
+import glob
+import cv2
+from sklearn.model_selection import train_test_split
+from labelme import utils
+np.random.seed(41)
+
+# 0 is the background class
+classname_to_id = {
+    "front": 1,  # class ids start at 1
+    "back": 2,
+}
+
+
+class Lableme2CoCo:
+
+    def __init__(self):
+        self.images = []
+        self.annotations = []
+        self.categories = []
+        self.img_id = 0
+        self.ann_id = 0
+
+    def save_coco_json(self, instance, save_path):
+        json.dump(instance, open(save_path, 'w', encoding='utf-8'), ensure_ascii=False, indent=1)  # indent=2 renders more readably
+
+    # build the COCO structure from labelme json files
+    def to_coco(self, json_path_list):
+        self._init_categories()
+        for json_path in json_path_list:
+            print(json_path)
+            obj = self.read_jsonfile(json_path)
+            self.images.append(self._image(obj, json_path))
+            shapes = obj['shapes']
+            for shape in shapes:
+                annotation = self._annotation(shape)
+                self.annotations.append(annotation)
+                self.ann_id += 1
+            self.img_id += 1
+        instance = {}
+        instance['info'] = 'spytensor created'
+        instance['license'] = ['license']
+        instance['images'] = self.images
+        instance['annotations'] = self.annotations
+        instance['categories'] = self.categories
+        return instance
+
+    # build the category list
+    def _init_categories(self):
+        for k, v in classname_to_id.items():
+            category = {}
+            category['id'] = v
+            category['name'] = k
+            self.categories.append(category)
+
+    # build the COCO image record
+    def _image(self, obj, path):
+        image = {}
+        img_x = utils.img_b64_to_arr(obj['imageData'])
+        h, w = img_x.shape[:-1]
+        image['height'] = h
+        image['width'] = w
+        image['id'] = self.img_id
+        image['file_name'] = os.path.basename(path).replace(".files", ".jpg")
+        return image
+
+    # build the COCO annotation record
+    def _annotation(self, shape):
+        # print('shape', shape)
+        label = shape['label']
+        points = shape['points']
+        annotation = {}
+        annotation['id'] = self.ann_id
+        annotation['image_id'] = self.img_id
+        annotation['category_id'] = int(classname_to_id[label])
+        annotation['segmentation'] = [np.asarray(points).flatten().tolist()]
+        annotation['bbox'] = self._get_box(points)
+        annotation['iscrowd'] = 0
+        annotation['area'] = 1.0
+        return annotation
+
+    # read a json file and return the parsed object
+    def read_jsonfile(self, path):
+        with open(path, "r", encoding='utf-8') as f:
+            return json.loads(f.read())
+
+    # COCO bbox format: [x1, y1, w, h]
+    def _get_box(self, points):
+        min_x = min_y = np.inf
+        max_x = max_y = 0
+        for x, y in points:
+            min_x = min(min_x, x)
+            min_y = min(min_y, y)
+            max_x = max(max_x, x)
+            max_y = max(max_y, y)
+        return [min_x, min_y, max_x - min_x, max_y - min_y]
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--labelme_dir', type=str, default='./images')
+    parser.add_argument('--save_coco_dir', type=str, default='./coco_dataset')
+    args = parser.parse_args()
+    labelme_path = args.labelme_dir
+    saved_coco_path = args.save_coco_dir
+    print('reading...')
+    # create the output directories
+    if not os.path.exists("%scoco/train/" % saved_coco_path):
+        os.makedirs("%scoco/train/" % saved_coco_path)
+    if not os.path.exists("%scoco/train/images/" % saved_coco_path):
+        os.makedirs("%scoco/train/images/" % saved_coco_path)
+    if not os.path.exists("%scoco/eval/" % saved_coco_path):
+        os.makedirs("%scoco/eval/" % saved_coco_path)     
+    if not os.path.exists("%scoco/eval/images/" % saved_coco_path):
+        os.makedirs("%scoco/eval/images/" % saved_coco_path)
+    # collect all annotation files under the images directory
+    print(labelme_path + "/*.files")
+    json_list_path = glob.glob(labelme_path + "/*.files")
+    print('json_list_path: ', len(json_list_path))
+    # split the data; no separate train2017/val2017 folders, all images live under the images directory
+    train_path, val_path = train_test_split(json_list_path, test_size=0.1, train_size=0.9)
+    print("train_n:", len(train_path), 'val_n:', len(val_path))
+
+    # convert the training set to COCO json format
+    l2c_train = Lableme2CoCo()
+    train_instance = l2c_train.to_coco(train_path)
+    l2c_train.save_coco_json(train_instance, '%scoco/train/annotations.files' % saved_coco_path)
+    for file in train_path:
+        # shutil.copy(file.replace("files", "jpg"), "%scoco/images/train2017/" % saved_coco_path)
+        img_name = file.replace('files', 'jpg')
+        temp_img = cv2.imread(img_name)
+        try:
+            img_name = str(img_name).split('\\')[-1]
+            cv2.imwrite("{}coco/train/images/{}".format(saved_coco_path, img_name), temp_img)
+        except Exception as e:
+            print(e)
+            print('Wrong Image:', img_name)
+            continue
+        print(img_name + '-->', img_name)
+
+    for file in val_path:
+        # shutil.copy(file.replace("files", "jpg"), "%scoco/images/val2017/" % saved_coco_path)
+        img_name = file.replace('files', 'jpg')
+        temp_img = cv2.imread(img_name)
+        try:
+            img_name = str(img_name).split('\\')[-1]
+            cv2.imwrite("{}coco/eval/images/{}".format(saved_coco_path, img_name), temp_img)
+        except Exception as e:
+            print(e)
+            print('Wrong Image:', img_name)
+            continue
+        print(img_name + '-->', img_name)
+
+    # convert the validation set to COCO json format
+    l2c_val = Lableme2CoCo()
+    val_instance = l2c_val.to_coco(val_path)
+    l2c_val.save_coco_json(val_instance, '%scoco/eval/annotations.files' % saved_coco_path)
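_get_box() reduces a labelme polygon to the COCO [x, y, w, h] convention; reproduced standalone (`get_box` here is simply the method lifted out of the class):

```python
import numpy as np


def get_box(points):
    # Track the polygon's extremes, then return top-left corner plus width/height
    min_x = min_y = np.inf
    max_x = max_y = 0
    for x, y in points:
        min_x, min_y = min(min_x, x), min(min_y, y)
        max_x, max_y = max(max_x, x), max(max_y, y)
    return [min_x, min_y, max_x - min_x, max_y - min_y]


box = get_box([(10, 20), (50, 20), (50, 80), (10, 80)])
```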

BIN
fonts_images_generate/__pycache__/img_tools.cpython-37.pyc


+ 56 - 0
fonts_images_generate/divide_train_eval.py

@@ -0,0 +1,56 @@
+import argparse
+import os
+import random
+
+
+def get_all_img(img_fold):
+    # list the per-font subdirectories
+    img_fold_list = os.listdir(img_fold)
+    cnt = 0
+    img_path_list = []
+    for img_fold_in in img_fold_list:
+        img_list1 = []
+        if img_fold_in.endswith('.txt'):
+            continue
+        img_list = os.listdir(os.path.join(img_fold, img_fold_in))
+        for img in img_list:
+            if img.endswith('.txt'):
+                continue
+            img_path = str(img_fold_in) + '/' + str(img)
+            img_list1.append(str(img_path) + ' ' + str(cnt))
+        cnt += 1
+        img_path_list.append(img_list1)
+    return img_path_list
+
+
+def divide(lines, img_folds, train_ratio):
+    fp_val = open(str(img_folds) + '/val_list.txt', 'a')
+    fp_train = open(str(img_folds) + '/train_list.txt', 'a')
+    train_size = 0
+    val_size = 0
+    for line in lines:
+        length = len(line)
+        trainList = random.sample(range(0, length), round(train_ratio * length))
+        train_size += len(trainList)
+        for i in trainList:
+            fp_train.write(line[i] + '\n')
+
+        testList = []
+        for i in range(0, length):
+            if i not in trainList:
+                fp_val.write(line[i] + '\n')
+                testList.append(i)
+        val_size += len(testList)
+    print('train images ', train_size)
+    print('val images ', val_size)
+    fp_val.close()
+    fp_train.close()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--img_dir', type=str, default='./font_img_dataset/windows_1/chinese_gray')
+    parser.add_argument('--train_ratio', type=float, default=0.8)
+    args = parser.parse_args()
+    list1 = get_all_img(args.img_dir)
+    divide(list1, args.img_dir, args.train_ratio)

BIN
fonts_images_generate/font/Apple Braille.ttf


+ 63 - 0
fonts_images_generate/generate_imgs_by_fonts.py

@@ -0,0 +1,63 @@
+import argparse
+import datetime
+import os
+import uuid
+from PIL import Image, ImageFont, ImageDraw
+from tqdm import tqdm
+
+
+def create_img(font_path, text, save_path, cnt):
+    fontSize = 25
+    # font face and size
+    font = ImageFont.truetype(font_path, fontSize, encoding="gbk")
+    im = Image.new("RGB", (int(font.getbbox(text.rstrip('\n'))[2]), font.getbbox(text)[3]), (255, 255, 255))
+    dr = ImageDraw.Draw(im)
+    # text color
+    dr.text((0, 0), text, font=font, fill="#000000")
+    time = datetime.datetime.now()
+    t = str(time.year).zfill(4) + '-' + str(time.month).zfill(2) + '-' + str(time.day).zfill(2)
+    if not os.path.exists("%s/%s/" % (save_path, font_path.split('/')[-1][0:-4])):
+        os.makedirs("%s/%s/" % (save_path, font_path.split('/')[-1][0:-4]))
+    # image file name
+    img_name = 'kdan_' + t + '_' + str(cnt).zfill(4) + '_' + str(uuid.uuid1())[0:8] + '.jpg'
+    save_path = "%s/%s/" % (save_path, font_path.split('/')[-1][0:-4]) + img_name
+    # save the image
+    im.save(save_path)
+
+
+def get_all_labels(img_fold):
+    # list the per-font subdirectories
+    img_fold_list = os.listdir(img_fold)
+    # open label_list.txt for writing
+    fp_label = open(str(img_fold) + '/label_list.txt', 'w')
+    cnt = 0
+    for img_fold_in in img_fold_list:
+        if img_fold_in.endswith('.txt'):
+            continue
+        # write each font name and its id to label_list.txt
+        fp_label.write(str(cnt) + ' ' + str(img_fold_in) + '\n')
+        cnt += 1
+    fp_label.close()
+
+
+def work(font_path, text_path, save_path):
+    fontPath_list = os.listdir(font_path)
+    print('generate images for fonts based on %s' % text_path)
+    for fontPath in tqdm(fontPath_list):
+        with open(text_path, 'r') as fpp:
+            t = fpp.readlines()
+        fpp.close()
+        cnt = 0
+        for text in t:
+            create_img(str(font_path) + '/' + str(fontPath), text, save_path, cnt)
+            cnt += 1
+    get_all_labels(save_path)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--font_dir', type=str, default='./font')
+    parser.add_argument('--text_path', type=str, default='./text/eng_text_test.txt')
+    parser.add_argument('--save_dir', type=str, default='./images')
+    args = parser.parse_args()
+    work(args.font_dir, args.text_path, args.save_dir)

+ 24 - 0
fonts_images_generate/get_all_labels.py

@@ -0,0 +1,24 @@
+import argparse
+import os
+
+
+def get_all_labels(img_fold):
+    # list the per-font subdirectories
+    img_fold_list = os.listdir(img_fold)
+    # open label_list.txt for writing
+    fp_label = open(str(img_fold) + '/label_list.txt', 'w')
+    cnt = 0
+    for img_fold_in in img_fold_list:
+        if img_fold_in.endswith('.txt'):
+            continue
+        # write each font name and its id to label_list.txt
+        fp_label.write(str(cnt) + ' ' + str(img_fold_in) + '\n')
+        cnt += 1
+    fp_label.close()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--save_dir', type=str, default=r'C:\Users\KDAN\Desktop\workspace\字体分类数据集\windows_1\font_test_dataset')
+    args = parser.parse_args()
+    get_all_labels(args.save_dir)
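get_all_labels() writes one "&lt;id&gt; &lt;font&gt;" line per subfolder; a sketch against a temporary directory (note: this version sorts the entries for a deterministic order, whereas os.listdir in the script returns them in filesystem order):

```python
import os
import tempfile


def write_label_list(img_fold):
    # Write "<id> <font_folder>" lines, skipping .txt files, as get_all_labels() does
    entries = [d for d in sorted(os.listdir(img_fold)) if not d.endswith('.txt')]
    with open(os.path.join(img_fold, 'label_list.txt'), 'w') as fp:
        for cnt, name in enumerate(entries):
            fp.write(str(cnt) + ' ' + name + '\n')
    return entries


tmp = tempfile.mkdtemp()
for d in ('Songti', 'PingFang'):
    os.makedirs(os.path.join(tmp, d))
names = write_label_list(tmp)
with open(os.path.join(tmp, 'label_list.txt')) as fp:
    lines = fp.read().splitlines()
```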

+ 31 - 0
fonts_images_generate/gray.py

@@ -0,0 +1,31 @@
+import cv2
+import os
+import glob
+# root and output directory paths
+root_path = r'C:\Users\KDAN\Desktop\work\exchange\font_style_classification\Fonts_img_Dataset\windows_1\chinese'
+save_path = r'C:\Users\KDAN\Desktop\work\exchange\font_style_classification\Fonts_img_Dataset\windows_1\chinese_gray'
+
+list_path = os.listdir(root_path)
+
+for k, names in enumerate(list_path):
+    print(k, names)
+    # if names.endswith('.txt'):
+    #     continue
+    # if not os.path.exists(save_path + '\\' + names):
+    #     os.makedirs(save_path + '\\' + names)
+    bmp_path = glob.glob(r'C:\Users\KDAN\Desktop\work\exchange\font_style_classification\Fonts_img_Dataset\windows_1\chinese\{}\*jpg'.format(names))
+    for i, path_bmp in enumerate(bmp_path):
+        if path_bmp.endswith('.txt'):
+            continue
+        # print(path_bmp)
+        a = os.path.split(path_bmp)
+        name = os.path.basename(a[0])
+        # print(name)
+        img = cv2.imread(path_bmp)
+        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+        # cv2.imshow('figure', gray)
+        # cv2.waitKey(0)
+        cv2.imwrite('C:/Users/KDAN/Desktop/work/exchange/font_style_classification/Fonts_img_Dataset/windows_1/chinese_gray/{}/{}_{}.jpg'.format(names, name, i), gray)
+
+
+

BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0000_ffec5264.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0001_fff5a1cc.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0002_fffd428a.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0003_00053198.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0004_000d6f00.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0005_00151406.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0006_001d0300.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0007_0024a1ee.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0008_002c1b36.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0009_00339894.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0010_003a7642.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0011_0041544c.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0012_0047dfec.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0013_004e9742.jpg


BIN
fonts_images_generate/images/PingFang/kdan_2022-10-25_0014_00555134.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0000_00b5c174.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0001_00ba9f66.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0002_00bf80e8.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0003_00c54d66.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0004_00ca7dee.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0005_00cf8668.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0006_00d4ba3e.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0007_00d9c15a.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0008_00de7eb0.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0009_00e36088.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0010_00e7f466.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0011_00ecd668.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0012_00f20290.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0013_00f70c2e.jpg


BIN
fonts_images_generate/images/STHeiti Light/kdan_2022-10-25_0014_00fbee28.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0000_010144c8.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0001_01064e66.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0002_010b7e02.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0003_0110ae14.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0004_0116054c.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0005_011b0e52.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0006_012128dc.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0007_01265900.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0008_012b6290.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0009_01306ba8.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0010_013574ee.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0011_013a7dca.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0012_013fad64.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0013_01448f62.jpg


BIN
fonts_images_generate/images/STHeiti Medium/kdan_2022-10-25_0014_0149986c.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0000_005c79d8.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0001_00627094.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0002_006815f4.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0003_006e7aa8.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0004_00750e7e.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0005_007b2598.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0006_0081910a.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0007_00875db8.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0008_008d26c6.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0009_00931de2.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0010_0098ea08.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0011_009edd3a.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0012_00a4cd40.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0013_00aac0e6.jpg


BIN
fonts_images_generate/images/Songti/kdan_2022-10-25_0014_00b08d5a.jpg


+ 5 - 0
fonts_images_generate/images/label_list.txt

@@ -0,0 +1,5 @@
+0 Apple Braille
+1 PingFang
+2 Songti
+3 STHeiti Light
+4 STHeiti Medium

+ 15 - 0
fonts_images_generate/img_paddling.py

@@ -0,0 +1,15 @@
+# import os
+import img_tools
+# from tqdm import tqdm
+#
+#
+# img_folds = os.listdir('./font_img_dataset/windows_1/english')
+# for img_fold in tqdm(img_folds):
+#     img_fold = str('font_img_dataset/windows_1/english') + '/' + str(img_fold)
+#     if not img_fold.endswith('.txt'):
+#         img_path_list = os.listdir(img_fold)
+#         for img_path in img_path_list:
+#             img_tools.img_resize(img_fold + '/' + str(img_path))
+
+path = r"C:\Users\KDAN\Desktop\images\字体\153409.jpg"
+img_tools.img_resize(path)

+ 22 - 0
fonts_images_generate/img_tools.py

@@ -0,0 +1,22 @@
+import cv2
+
+
+def img_resize(img_path):
+    img = cv2.imread(img_path)
+    width = img.shape[1]
+    height = img.shape[0]
+    if height != 48:
+        # scale the crop to a fixed height of 35 px, keeping the aspect ratio
+        ratio = 35 / height
+        height = 35
+        width = int(width * ratio)
+        dim = (width, height)
+        resized = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
+        if width < 960:
+            # centre the crop on a 960x48 white canvas and overwrite the file
+            left = int((960 - width) / 2)
+            top = int((48 - height) / 2)
+            ret = cv2.copyMakeBorder(resized, top, 48 - height - top, left, 960 - width - left, cv2.BORDER_CONSTANT,
+                                     value=(255, 255, 255))
+            cv2.imwrite(img_path, ret)
+
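For reference, the resize-and-pad arithmetic in `img_resize` above (scale to height 35, centre on a 960×48 white canvas) can be sketched without OpenCV; `pad_borders` is a hypothetical helper name, not part of the repo:

```python
def pad_borders(height, width, target_h=48, target_w=960, resize_h=35):
    """Borders (top, bottom, left, right) needed after scaling to resize_h."""
    ratio = resize_h / height          # keep the aspect ratio
    new_w = int(width * ratio)         # width after resizing
    top = (target_h - resize_h) // 2   # centre vertically
    left = (target_w - new_w) // 2     # centre horizontally
    return top, target_h - resize_h - top, left, target_w - new_w - left
```

For example, a 70×700 crop scales to 35×350 and gets 6/7 px of padding above/below and 305 px on each side.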

File diff suppressed because it is too large
+ 7560 - 0
fonts_images_generate/text/chn_text.txt


+ 14 - 0
fonts_images_generate/text/chn_text_test.txt

@@ -0,0 +1,14 @@
+本节内容可能不会
+很长,但是
+还是希望尽可能把这个环节重要的骨架勾勒出来。
+有一
+个经典的问题是:“如果你是一个投资人,要投资
+一个项目,核心是看什么?项目还是团队?”。与之对应的一个问题
+是:“如果你是一位创业者,创业的
+基石是一个独特的项目还是一个优秀的团队?”
+当然这种二选一的问题往往都只强调了某一个方面,并
+没有标准答案,有的人会选项目,有的人会选人(团队)。本节要讨论的
+是在一个企业里面“如何构建一个有效的能做产品的团队所需要的不同角色的人”,这
+个问题。
+顺带的,以我当前所在的工作环境,这个问题天然地穿插了一个“远程
+工作”的上下文,它不是主要的问题,但是在里面是一个因素。

File diff suppressed because it is too large
+ 2160 - 0
fonts_images_generate/text/eng_text.txt


+ 15 - 0
fonts_images_generate/text/eng_text_test.txt

@@ -0,0 +1,15 @@
+Feature pyramids are a basic component in recognition
+systems for detecting objects at different scales.
+But recent
+deep learning object detectors have avoided pyramid representations,
+in part because they are compute and memory
+intensive. In this paper, we exploit the inherent multi-scale,
+pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost.
+A topdown architecture with lateral connections is developed
+for
+building high-level semantic feature maps at all scales. T
+his
+architecture, called a Feature Pyramid Network (FPN),
+shows significant
+improvement as a generic
+feature extractor in several applications.

File diff suppressed because it is too large
+ 114494 - 0
get_name_by_spider/addr.txt


+ 9 - 0
get_name_by_spider/get_dict.py

@@ -0,0 +1,9 @@
+# collect every distinct character used in the scraped names
+with open('name.txt', 'r', encoding='utf-8') as fp_name:
+    text = ''
+    for line in fp_name.readlines():
+        text += line.rstrip('\n')
+print(list(text))
+print(len(text))
+print(sorted(set(text)))
+print(len(set(text)))
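The script above only prints the character inventory; a minimal sketch of producing the de-duplicated, sorted character list itself (the usual input for an OCR dictionary file) — `build_charset` is a hypothetical name, assuming one name per line as in name.txt:

```python
def build_charset(lines):
    """Collect every distinct character across the input lines, sorted."""
    chars = set()
    for line in lines:
        chars.update(line.rstrip('\n'))
    return sorted(chars)
```

Sorting is by Unicode code point, which keeps the dictionary order stable across runs.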

+ 22 - 0
get_name_by_spider/get_name.py

@@ -0,0 +1,22 @@
+import requests
+import re
+from tqdm import tqdm
+
+pattern1 = r'<tr><td>(.*?)</td><td>'
+pattern2 = r'[0-9]</td><td>(.*?)</td></tr><tr><td>'
+n = 5000
+fp_name = open('name.txt', 'a+', encoding='utf-8')
+fp_addr = open('addr.txt', 'a+', encoding='utf-8')
+for i in tqdm(range(0, n)):
+    url = 'https://www.myfakeinfo.com/nationalidno/get-chinataiwan-ic-numberandname.php'
+    header = {
+        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/"
+                      "537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"}
+    response = requests.get(url, headers=header)  # issue the GET request
+    response.encoding = 'utf-8'  # decode the response as UTF-8
+    name_list = re.findall(pattern1, response.text)
+    for name in name_list:
+        fp_name.write(name + '\n')
+    addr_list = re.findall(pattern2, response.text)
+    for addr in addr_list:
+        fp_addr.write(addr[11:] + '\n')

File diff suppressed because it is too large
+ 120520 - 0
get_name_by_spider/name.txt


+ 0 - 0
idcardgenerator/OpticalBBold.otf


Some files were not shown because too many files changed in this diff