KELE++

关于R36S游戏掌机的一切

Tue, 23 Jun 2026 22:22:35 +0800

最近入手了一部寨版R36S掌机，通过几天的折腾，还是建议大家买正版。本文结合handhelds.wiki中的内容，进行了一些翻译和总结，并对一些资源做了统一的整理。

EmuELEC 克隆版

如何辨别山寨机

已知的软硬件差异及识别EmuELEC克隆变种的方法：

多数克隆版无需SD卡即可启动 —— 会显示错误信息“We can’t find any systems!”
仅有一颗内存芯片（若为透明外壳的R36S，无需拆机即可观察）
注意：新版克隆机（G80D主板）已改为与原版相同的双内存芯片
部分克隆版内存仅512MB
无法运行标准ArkOS/AmberELEC —— 黑屏并伴随红灯闪烁
SD卡内的dtb文件不同，常见克隆版文件包括：
rk3326-evb-lp3-v12-linux.dtb
rf3536k4ka.dtb
rf3536k3ka.dtb
部分克隆版搭载EmuELEC ES V4.7系统（与Gaminja/Kinhank K36同款固件）
分区命名差异：克隆版可能显示为EMUELEC或EEROMS而非原版的EASYROMS
启动时缺少ArkOS 2.0版本日期提示（部分克隆版）
关机状态下显示充电动画（部分克隆版）
异常启动流程（部分克隆版）
无WiFi功能 —— 部分克隆版不支持WiFi外接设备
FN键被映射到Y键（可能是软件问题？）
音量孔差异：原版孔洞更大更厚，克隆版通常较细密
按键字体异常：如X键字母歪斜/细瘦（部分克隆版）
系统内存显示异常：在RetroArch系统信息中显示为497MB（路径：retroarch > 返回主菜单 > 信息 > 系统）

带内置存储的EmuELEC克隆变种

系统安装在板载存储（eMMC）中
不兼容K36或克隆版ArkOS固件镜像
采用性能较弱的RK3128芯片（非原版RK3326）
游戏添加方法
R36S内置文件（核心文件、游戏列表）

为什么克隆机不好

多数克隆固件仍存在双SD卡兼容问题
大量用户反馈音频输出异常和按键映射颠倒
部分克隆机内存缩水（512MB而非1GB）
部分克隆机采用性能更弱的RK3128芯片（非原版RK3326）
存在内部硬件完全不同的克隆机（性能比RK3128还差）且完全不支持自制固件
许多用户反映G80主板克隆机难以刷入第三方固件
若丢失SD卡，需尝试多达10种不同的屏幕驱动文件（dtb）才能恢复

EmuELEC克隆版刷机固件

ArkOS

项目	详情
ArkOS for EmuELEC克隆设备
描述	由AeolusUX维护的ArkOS社区版，适用于K36掌机及同类克隆设备(R36S克隆版/K36/R36 Pro/R36 Max/U8掌机/RX6H)
最新版本
下载链接	• GitHub主镜像 • DTB文件库
使用说明	• 查看Reddit讨论帖 • 阅读GitHub发布说明 ⚠️ 不建议使用双SD卡配置

黑屏/无声问题修复与dtb文件

项目	说明
问题描述	刷入自定义固件后若出现异常，需更换SD卡BOOT分区的这些文件。若无声，请进入Ports运行"audio fix permanent"
解决方案	1. ⬇️ K36及同类克隆设备专用dtb文件库 2. 按顺序尝试不同dtb文件
替代音频修复方案	⬇️ 按dtb文件名下载对应修复包

操作须知：

每个dtb文件需单独测试效果

音频修复文件需与当前dtb文件匹配使用

建议在操作前备份原始文件

ROCKNIX

Official ROCKNIX now supports clone devices. Use the firmware marked with a "b" on the download page.

项目	说明
固件说明	官方ROCKNIX现已支持克隆设备，请使用下载页面中带"b"标记的版本
下载链接	• dtb文件修补工具 • 夜间构建版下载
文件命名示例	`ROCKNIX-RK3326.aarch64-20250430-b.img.gz`
使用指南	1. 下载带"b"后缀的固件 2. 使用工具修改dtb文件 3. 参考ROCKNIX Wiki

注意事项：

仅适用于RK3326等特定克隆设备

The "b" builds fix the following issues:

内存识别异常

按键映射错误

双SD卡槽兼容性

建议使用Etcher工具刷写固件

UnofficialOS

项目	说明
固件说明	专为R36S克隆设备定制的非官方固件
下载链接	⬇️ GitHub发布页
使用教程	克隆设备与R3xS系列安装指南

版本特性：

优化克隆设备性能表现

修复常见兼容性问题

提供定制化功能选项

RetrOS

项目	说明
固件说明	专为R36S克隆设备设计的混合型定制固件
下载链接	⬇️ GitHub仓库 - ⬇️ MEGA网盘
版本信息	RetrOS-preview1.img (早期开发测试版)

固件特点：

专为克隆设备优化的混合架构

提供更好的硬件兼容性

早期预览版包含基础功能

总结

上面的四个系统全部测试过，都可以在我的寨版上运行。且支持Ports移植游戏。

我的版本

我的是G80C主板版本，具体型号是G80CA-MB V1.2-20250422，首次发布在2025年5月，无eMMC存储的EmuELEC克隆版本。该设备应兼容Panel 8或Panel 9的dtb文件。（备选下载链接）
Reddit讨论帖

游戏等其他资源

各外贸系统原装ROM：百度网盘
arkos通用魔改补丁链接：百度网盘提取码arko
游戏包合集：百度网盘
世嘉DC游戏1257款：百度网盘
右手游戏资源：百度网盘
windstarry大佬移植的ports游戏：百度网盘
寨中寨EE4.7通用游戏包128G-2025：百度网盘
游戏资源下载地址：www.oldmantvg.net/
主题资源下载地址：https://handhelds.wiki/R36S_Clones
各种寨机型号整合包合集：百度网盘

从零构建：途虎养车数据爬虫实战指南

Fri, 08 Aug 2025 16:30:49 +0800

本文将以途虎养车平台为例，搭建一个高效、稳定的爬虫系统。内容涵盖从需求分析、技术选型到反爬策略应对的全流程，并提供可复用的代码示例与架构设计思路。

前言

最近，由于需要了解汽车滤芯更换套餐的市场行情，我决定通过爬虫技术从途虎养车平台采集相关数据。选择途虎养车作为目标平台主要有两个原因：

行业地位：途虎养车是国内领先的汽车后市场服务平台，拥有丰富的商品和服务数据，具有较高的参考价值。
技术可行性：与其他完全依赖APP的汽车服务平台不同，途虎仍然维护了网页端服务，使得数据抓取成为可能。

在调研过程中，我发现途虎的网页端仍然可用：途虎养车网页版，这为后续的爬虫开发提供了便利。

链接抓取

为了分析途虎养车的数据请求，我们可以使用浏览器的开发者工具进行抓包。首先在Chrome或Firefox中打开途虎养车网页端（途虎养车手机版），按下F12键调出开发者工具，切换到Network面板（网络）并确保选中All选项。

在正式抓取数据前，我们需要先完成几个关键操作步骤。首先登录途虎养车账号，在个人中心选择对应的车辆信息，然后进入"保养"服务页面。待页面完全加载后，系统会展示各类保养套餐信息。此时开发者工具已经记录了所有网络请求，我们需要特别关注那些返回JSON格式数据的API接口，这些往往包含我们需要的核心信息。

对于初学者来说，可以通过逐个查看接口的响应内容来定位目标数据。以当前页面为例，在Network面板中筛选请求后，可以清楚地看到返回的JSON数据中包含了我们所需的保养套餐详情，包括项目名称、价格、维护项目以及使用产品等关键字段。这个发现过程为后续编写爬虫代码提供了明确的数据接口定位。

我们可以通过以下方式获取目标链接：在开发者工具的Network面板中，找到包含所需数据的请求后，直接双击该请求行，浏览器会自动在新标签页中打开该请求的响应内容。此时地址栏显示的URL就是我们需要的目标API链接，这个链接将作为后续爬虫程序直接请求的数据接口。

链接分析

我们获得了以下链接：

https://maint-api.tuhu.cn/apinew/GetBaoYangAppPackages?channel=kH5&activityId=&city=上海市&province=上海市&lngBegin=119.80062593300364&latBegin=33.14440913557059&vehicle={"CarId":"ba12c20d-9256-4117-bade-47d47b68e822","PaiLiang":"电动","OnRoadTime":"","VehicleId":"VE-TSLY","tireSize":"","Properties":[],"Nian":"2025","Distance":0,"Tid":"160821"}&baoYangTypes=&isDefaultExpand=true&userId=0bdf60a3-ff01-459d-8621-de160851eed8&productIds=

我们需要深入解析这个API链接的结构，以确定哪些参数是必须的、哪些是可选的，从而构建出可批量爬取的请求链接。通过拆解这个URL，我们可以清晰地看到途虎养车API的参数设计逻辑：

核心参数分类说明

基础定位参数
- city/province：省市信息（需URL编码）
- lngBegin/latBegin：经纬度坐标（支持6位小数）
- 示例：city=上海 → city=%E4%B8%8A%E6%B5%B7

车辆身份参数

{
  "CarId": "ba12c20d-...",  // 车辆唯一标识
  "VehicleId": "VE-TSLY",    // 车型编码
  "PaiLiang": "电动",        // 动力类型
  "Nian": "2025"            // 年款
}

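User session parameters
- userId: unique user identifier (a UUID: 32 hexadecimal digits, 36 characters including hyphens)
- channel: channel identifier (kH5 denotes the H5 page)

Business filter parameters
- baoYangTypes: filter by maintenance type
- productIds: query specific product IDs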

首先观察基础URL部分：https://maint-api.tuhu.cn/apinew/GetBaoYangAppPackages，这是所有请求的入口端点。紧随其后的问号表示开始查询参数，这些参数使用标准的key=value格式，以&符号分隔。在这些参数中，有些是必填的核心参数，有些则是可选的辅助参数。

必填参数包括：

地理位置参数：city和province需要填写具体的省市名称，建议使用URL编码格式
车辆标识参数：vehicle是一个JSON字符串，必须包含有效的CarId等车辆信息
用户标识：userId虽然是UUID格式，但在未登录状态下可以使用默认值

可选参数包括：

坐标参数：lngBegin和latBegin可以留空或使用默认值
筛选参数：baoYangTypes和productIds可以留空获取全部结果
活动参数：activityId通常可以留空

特别需要注意的是channel=kH5这个参数，它标识请求来源，保持这个值可以避免一些反爬检测。在实际批量爬取时，我们主要需要动态替换的参数是city、province和vehicle中的车辆信息，其他参数可以保持固定值。通过这种参数分析，我们就可以设计出可批量请求的URL模板，只需替换关键参数即可获取不同地区、不同车型的保养套餐数据。
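The URL template described above can be sketched with the standard library alone. This is a minimal sketch, not Tuhu's documented API: `build_packages_url` is a hypothetical helper name, the parameter names are copied from the captured link, and the `<car-id>` and `<user-id>` values are placeholders. That the optional parameters can be left empty is the article's own observation.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://maint-api.tuhu.cn/apinew/GetBaoYangAppPackages"


def build_packages_url(city, province, vehicle, user_id):
    """Build the query URL; urlencode() percent-encodes every value exactly once."""
    params = {
        "channel": "kH5",            # kept fixed, as in the captured request
        "activityId": "",            # optional, left empty
        "city": city,
        "province": province,
        # The vehicle parameter is a JSON string embedded in the query string.
        "vehicle": json.dumps(vehicle, ensure_ascii=False, separators=(",", ":")),
        "baoYangTypes": "",          # optional filter, empty means all types
        "isDefaultExpand": "true",
        "userId": user_id,
        "productIds": "",            # optional filter, empty means all products
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Placeholder identifiers; real values come from the account and the vehicle data.
url = build_packages_url(
    "上海市", "上海市",
    {"CarId": "<car-id>", "VehicleId": "VE-TSLY"},
    "<user-id>",
)
print(url)
```

Only `city`, `province` and the `vehicle` dictionary need to change between requests, so the rest of the dictionary acts as the fixed part of the template.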

必要参数获取

这一步将使用Python去获得目标参数，并解决途虎平台"最多只能添加5辆车"的限制问题。我们需要做以下几步：

获取途虎养车的品牌、车型、排量等基础数据
绕过5辆车限制实现大批量数据爬取
构建有效的API请求
处理和存储爬取结果
爬取流程如下：

flowchart TD
    A([开始]) --> B[加载配置参数]
    B --> C{输入文件存在?}
    C -->|否| D[输出错误]
    C -->|是| E[读取数据]
    E --> F[加载车辆ID]
    F --> G[遍历车型]
    G --> H{已爬取?}
    H -->|是| G
    H -->|否| I[提取信息]
    I --> J[生成链接]
    J --> K{成功?}
    K -->|否| L[记录错误]
    K -->|是| M[请求数据]
    M --> N{成功?}
    N -->|否| L
    N -->|是| O[解析数据]
    O --> P[提取信息]
    P --> Q[保存结果]
    Q --> R{还有车型?}
    R -->|是| G
    R -->|否| S[输出完成]
    S --> T([结束])

环境准备

安装必要库

pip install requests

核心代码解析

配置参数

# 基本配置
USER_ID = "0bdf60a3-ff01-459d-8621-de160851eed8"  # 用户ID
CITY = "上海市"  # 默认城市
PROVINCE = "上海市"  # 默认省份

# 请求控制
RETRY_COUNT = 3  # 请求重试次数
DELAY_BETWEEN_REQUESTS = 3  # 请求间隔时间(秒)

请求头设置

headers = {
    "User-Agent": "Mozilla/5.0...",
    "Content-Type": "application/json",
    "Authorization": "Bearer 204576661ec744519fcd3c2714a850ad",
    # 其他必要headers...
}

请求头在爬虫中的作用至关重要，它就像是网络请求的身份证和通行证，决定了服务器是否会接受并响应你的请求。在途虎养车这样的平台爬取数据时，请求头不仅需要包含基本的身份标识信息，还需要模拟真实用户的行为特征，以避免被识别为自动化程序而遭到拦截。一个典型的请求头会包含User-Agent来伪装成普通浏览器，携带Authorization和Cookie来维持登录状态，设置Content-Type来声明数据格式，还可能包含一些平台特定的验证字段如blackbox或TongDun-TokenId等反爬机制相关的信息。这些头部信息共同构成了一个完整的请求身份，缺少任何一个关键字段都可能导致请求失败。

车辆管理功能

获取当前车辆列表

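from typing import Dict, List

import requests

def get_car_list() -> List[Dict]:
    """Return the vehicles currently bound to the account."""
    url = "https://cl-gateway.tuhu.cn/cl-user-info-site/myCar/getCarListByUserId"
    payload = {"userId": USER_ID}

    response = requests.post(url, headers=headers, json=payload, timeout=15)
    body = response.json()  # parse the response once instead of on every access
    # 10000 is the success code this API returns
    if body.get("code") == 10000:
        return body.get("data") or []
    return []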

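The URL here is also found through the F12 Network panel: capture the requests on the vehicle list page to see which API returns the list. Posting our userId to it returns every vehicle currently bound to the account.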

删除指定车辆

因为途虎只允许未认证的情况下添加5辆汽车，限制了我们爬取全部汽车的信息，这个时候我们就需要在获得一辆汽车的信息后删除这辆车，反复执行添加删除车辆，直到获得所有车的信息。

::: tip 提示
The URL here is Tuhu's vehicle deletion endpoint, captured through F12. Posting the carId and userId deletes that vehicle from your account.
:::

def delete_car(car_id: str) -> bool:
    """删除指定车辆"""
    url = "https://cl-gateway.tuhu.cn/cl-user-info-site/myCar/removeCar"
    payload = {"carId": car_id, "userId": USER_ID}
    
    response = requests.post(url, headers=headers, json=payload)
    return response.json().get("code") == 10000

添加新车

通过添加新车，我们可以获得汽车的carId，我们需要给url发送payload中的信息就可以获取：

def add_vehicle(model_code: str, year: int, tid: int) -> Optional[str]:
    """添加新车并返回carId"""
    url = "https://cl-gateway.tuhu.cn/cl-user-info-site/myCar/addCar"
    payload = {
        "modelCode": model_code,
        "productionYear": year,
        "tid": tid,
        "source": "tuhu_wap"
    }
    
    response = requests.post(url, headers=headers, json=payload)
    if response.json().get("code") == 10000:
        return response.json().get("data", {}).get("carId")
    return None

数据爬取功能

构建养护查询链接

import urllib.parse

def build_maintenance_link(vehicle_json: str) -> str:
    """Build the full maintenance query link."""
    base_url = "https://maint-api.tuhu.cn/apinew/GetAppFirstPageExternalData"
    params = {
        "channel": "kH5",
        "city": CITY,
        "province": PROVINCE,
        # Pass the raw JSON string: urlencode() percent-encodes each value,
        # so quoting it here as well would double-encode it.
        "vehicle": vehicle_json,
    }
    return f"{base_url}?{urllib.parse.urlencode(params)}"

执行数据爬取

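def crawl_maintenance(link: str) -> Optional[List]:
    """Fetch maintenance data; returns the Categories list, or None on failure."""
    try:
        response = requests.get(link, headers=headers, timeout=15)
        response.raise_for_status()  # treat HTTP error statuses as failures
        return response.json().get("Categories", [])
    except Exception as e:
        print(f"Crawl failed: {e}")
        return None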

5辆车限制解决方案

解决方案设计

graph TD
    A[开始] --> B[获取当前车辆列表]
    B --> C{车辆数≥5?}
    C -->|是| D[删除最早添加的车辆]
    C -->|否| E[添加新车]
    D --> E
    E --> F[爬取数据]
    F --> G[保存结果]
    G --> H[删除刚添加的车辆]
    H --> I{所有车型完成?}
    I -->|否| B
    I -->|是| J[结束]

核心实现代码

def process_vehicle(model_info: Dict) -> Optional[Dict]:
    """处理单个车型的爬取流程"""
    # 检查并管理车辆
    while True:
        current_vehicles = get_car_list()
        if len(current_vehicles) < 5:
            break
        # 删除最早添加的车辆
        oldest = sorted(current_vehicles, key=lambda x: x["addTime"])[0]
        if not delete_car(oldest["carId"]):
            return None
        time.sleep(2)
    
    # 添加新车
    car_id = add_vehicle(
        model_info["model_code"],
        model_info["year"],
        model_info["tid"]
    )
    if not car_id:
        return None
    
    # 构建并爬取链接
    vehicle_json = build_vehicle_json(car_id, model_info)
    link = build_maintenance_link(vehicle_json)
    data = crawl_maintenance(link)
    
    # 立即删除车辆
    delete_car(car_id)
    
    return {
        "model_info": model_info,
        "data": data,
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    }
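The code above calls `build_vehicle_json` without showing it. A minimal sketch of what such a helper might look like follows. It is an assumption on my part, not the author's implementation: the field names and their string/number types are taken from the `vehicle` parameter in the captured link, and the `model_info` keys are the ones passed to `process_vehicle`.

```python
import json


def build_vehicle_json(car_id, model_info):
    """Serialize one vehicle into the JSON string expected by the vehicle parameter.

    Hypothetical helper: field names mirror the captured request, and the
    empty fields are left empty exactly as they appear there.
    """
    vehicle = {
        "CarId": car_id,
        "PaiLiang": model_info["displacement"],
        "OnRoadTime": "",
        "VehicleId": model_info["model_code"],
        "tireSize": "",
        "Properties": [],
        "Nian": str(model_info["year"]),   # the captured link sends the year as a string
        "Distance": 0,
        "Tid": str(model_info["tid"]),     # likewise a string in the captured link
    }
    # ensure_ascii=False keeps values such as "电动" readable before URL encoding
    return json.dumps(vehicle, ensure_ascii=False)


example = build_vehicle_json(
    "<car-id>",
    {"model_code": "VE-TSLY", "displacement": "电动", "year": 2025, "tid": 160821},
)
print(example)
```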

完整工作流程

主函数实现

def main():
    # 1. 获取所有品牌
    brands = get_brands()
    
    # 2. 遍历品牌获取车型
    for brand in brands[:2]:  # 测试时限制品牌数量
        models = get_models(brand["name"])
        
        # 3. 处理每个车型
        for model in models[:3]:  # 测试时限制车型数量
            displacements = get_displacements(model["code"])
            
            for disp in displacements[:2]:  # 测试时限制排量数量
                years = get_product_years(model["code"], disp)
                
                for year in years[:1]:  # 测试时限制年份数量
                    # 4. 获取tid并处理车辆
                    vehicle_info = get_vehicle_tid(model["code"], disp, year)
                    if vehicle_info:
                        result = process_vehicle({
                            "brand": brand["name"],
                            "model_name": model["displayName"],
                            "model_code": model["code"],
                            "displacement": disp,
                            "year": year,
                            "tid": vehicle_info["tid"]
                        })
                        
                        # 5. 保存结果
                        if result:
                            save_result(result)
                            time.sleep(DELAY_BETWEEN_REQUESTS)
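`save_result` and the "already crawled?" check from the flowchart are not shown in the article. One simple way to get both is an append-only JSON Lines file, sketched below. The file name `results.jsonl`, the helper `load_done_keys`, and the choice of (model code, displacement, year) as the identity of a record are all assumptions made for illustration.

```python
import json
from pathlib import Path

RESULT_FILE = Path("results.jsonl")  # hypothetical output file, one JSON object per line


def result_key(model_info):
    """Identity of one crawl target, used to skip work that is already done."""
    return (model_info["model_code"], model_info["displacement"], str(model_info["year"]))


def save_result(result, path=RESULT_FILE):
    """Append one result; appending means a crash never corrupts earlier records."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(result, ensure_ascii=False) + "\n")


def load_done_keys(path=RESULT_FILE):
    """Return the keys of every record already saved, or an empty set on first run."""
    done = set()
    if not Path(path).exists():
        return done
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                done.add(result_key(json.loads(line)["model_info"]))
    return done
```

Before calling `process_vehicle`, the main loop could then test `result_key(info) in load_done_keys()` and skip the vehicle when it is already present, so an interrupted run resumes where it stopped.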

运行示例

python main.py

输出示例：

开始处理品牌: 大众
获取到车型: 高尔夫 (code: VW-GOLF)
处理排量: 1.4T
处理年份: 2022
成功添加车辆: VW123
爬取数据完成
删除车辆: VW123
保存结果成功
...

最终信息爬取

在数据功能爬取阶段，我们成功爬取了想要的参数，参数结果示例如下所示：

"奥迪": [  
  {  
    "model_name": "奥迪 A3",  
    "model_code": "VE-AADA3AD",  
    "factory": "一汽大众奥迪",  
    "displacement": "1.4T(35TFSI)",  
    "production_year": 2025,  
    "tid": "157761",  
    "car_id": "3B1BD673-BEE9-45CC-8B57-036275D53EFA",  
    "vehicle_json": "{\"CarId\": \"3B1BD673-BEE9-45CC-8B57-036275D53EFA\", \"PaiLiang\": \"1.4T(35TFSI)\", \"OnRoadTime\": \"\", \"VehicleId\": \"VE-AADA3AD\", \"tireSize\": \"\", \"Properties\": [], \"Nian\": \"2025\", \"Distance\": 0, \"Tid\": \"157761\"}",   
    "crawl_time": "2025-08-05 22:41:51",  
    "status": "完整",  
    "car_details": null  
  }
  ]

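At this point the work is essentially done. We only need to fill the required fields (CarId, model_code, PaiLiang) into the link from earlier to get the packages: read the JSON file, fill in each record, and fetch the content of each link in bulk. I won't go into detail here, but do throttle the requests. A random delay of 1-3 seconds between requests is usually enough; in practice, crawling too fast gets the IP banned by Tuhu.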
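The 1-3 second random delay, combined with the retry count of 3 from the configuration section, can be sketched as follows. The helper names `polite_sleep` and `with_retries` are hypothetical, and the `sleep` argument exists only so the pacing can be tested without actually waiting.

```python
import random
import time


def polite_sleep(min_s=1.0, max_s=3.0, sleep=time.sleep):
    """Wait a random 1-3 seconds so requests do not arrive at a fixed rhythm."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def with_retries(func, retries=3, sleep=time.sleep):
    """Call func(); on failure wait politely and retry, re-raising after the last attempt."""
    for attempt in range(1, retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise
            polite_sleep(sleep=sleep)
```

A bulk loop would call something like `with_retries(lambda: crawl(link))` for each link and `polite_sleep()` between links, where `crawl` stands for whatever request function is in use.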

基于摄影测量的跨房间点云配准与重建精度研究

Sat, 26 Jul 2025 00:07:40 +0800

摄影测量技术在单体房间三维重建中表现良好，然而实际建筑环境多为多房间连通结构。本文旨在探究摄影测量方法在跨房间三维重建中的性能表现，重点分析其重建房间连接处的效果。

基于固定照片

设备及工具

设备：魅族21NOTE(谷歌相机)
软件：MetaShape

实验数据采集包含厨房和客厅两个连通空间，共获取284张多视角高清图像。通过摄影测量算法处理，成功重建了场景的三维点云模型，以下是采集数据的部分截图。

重建效果

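The results show that the current photogrammetry approach handles the transition between the kitchen and the living room poorly. Because feature matching failed, the connection between the two spaces was not reconstructed correctly. Compared against the floor plan, the kitchen entrance (at the Crayon Shin-chan door curtain in the living room) was not reconstructed accurately, which shows that the method still needs improvement for cross-room 3D reconstruction.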

定位辅助

在尝试通过为每组拍摄照片标注位置信息来实现跨房间三维重建的粗配准时，我们发现基于GPS的定位方案存在显著局限性。理论上，利用相同定位标签的空间对应关系可以为点云重建提供初始对齐依据，且普通GPS在开放空间的定位精度（约2-3米误差）对于房间级重建尚可接受。然而实际测试表明，室内环境的GPS信号质量存在严重衰减问题：在有窗房间中，距离窗口仅2米处的信号强度已出现明显下降；而在无窗的低楼层室内环境，GPS信号接收成功率不足10%，基本无法实现有效定位。如图所示，封闭空间与开放区域的信号强度对比差异显著，这直接影响了基于位置标签的配准方案在室内重建中的实际应用价值。该现象提示我们，在室内三维重建任务中需要探索不依赖GPS信号的替代方案，如视觉标记辅助或惯性导航融合等方法。

在缺乏室内蓝牙/WiFi定位系统支持的情况下，本研究暂时无法深入探索基于无线信号的室内定位方案。然而，在GPS信号良好的半开放环境中（如临近窗户区域或低层建筑阳台），通过为采集图像添加位置标签来实现粗配准的思路具有理论可行性。

基于视频截取

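To address the insufficient image count and poor continuity of the photo-based reconstruction, we propose an improved approach based on extracting key frames from video. Through analysis and experiments, we found that conventional still-image capture has two main technical bottlenecks: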

样本数量限制：静态照片采集效率低下，难以获取足够数量的高质量样本；
时序连续性缺失：离散拍摄导致帧间关联信息丢失，影响重建精度。

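To solve these problems, this study records video and extracts key frames from it. Compared with still shooting, video capture has the following advantages: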

可实现每秒30帧以上的连续图像捕获
保证帧间运动参数的连续性
显著提升数据采集效率
在硬件选择方面，经过多维度评估（包括分辨率、动态范围、色彩还原度等指标），最终选用Pocket3作为视频采集设备。该设备具备4K/60fps的视频录制能力，其1英寸大底CMOS传感器可提供优异的低光照性能，这些特性为后续三维重建提供了高质量的原始数据基础。

使用工具

设备：DJI Pocket3
软件：自己写的帧截取工具，MetaShape。

实验流程

视频拍摄

本研究采用手持Pocket3设备进行室内场景视频采集，按照以下路径实现场景全覆盖。采集过程中保持设备高度1.7米（模拟人眼视角，部分死角采用上下扫描），以0.5m/s的匀速沿顺时针方向移动，确保每个墙面获得至少3秒的连续拍摄。设备参数设置为4K UHD分辨率（3840×2160）和60fps帧率，配合全向防抖功能保证画面稳定性，同时采用自动白平衡和曝光模式以适应室内光照变化。

图片截取

本文设计并实现了一个基于Python的视频帧提取工具，其核心功能模块采用OpenCV计算机视觉库进行视频解码与图像处理。该工具的主要算法流程如下：

视频文件解析：通过OpenCV的VideoCapture接口读取视频文件，获取视频总帧数、分辨率等元数据信息：

cap = cv2.VideoCapture(video_path)
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

帧采样算法：采用等间隔采样策略提取关键帧，支持用户自定义采样间隔（1-100帧可调）：

if frame_count % frame_interval == 0:
    output_path = os.path.join(output_folder, f"{saved_count + 1:04d}.png")
    cv2.imwrite(output_path, frame)

Background processing: to keep the interface responsive, the time-consuming frame extraction runs in a background thread:

Thread(target=self.extract_frames,
	args=(video_path, output_path, frame_interval),
	daemon=True).start()

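Output file management: the target directory is created automatically and the extracted frames are saved with four-digit names such as 0001.png and 0002.png, keeping the file sequence orderly and traceable.

For convenience, I also built a simple graphical interface for it, shown below: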

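For the source video (13,046 frames in total, 544 seconds long, recorded with the 60fps setting described above), we extract 1 frame out of every 20 based on inter-frame continuity analysis, yielding 653 key frames. That is a sampling rate of about 5% (653/13,046), and consecutive sampled frames are about 0.83 seconds apart on average (544 s / 653). Note that 13,046 frames over 544 seconds corresponds to roughly 24 frames per second, so the stated frame count, duration and 60fps setting do not fully agree. The sampled image set keeps the full 3840×2160 resolution.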
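The sampling figures can be checked with a few lines of arithmetic. The sketch below assumes the rule used by the extraction snippet earlier, where a frame is kept whenever its zero-based index is a multiple of the interval; `sampling_stats` is a hypothetical helper name.

```python
import math


def sampling_stats(total_frames, frame_interval, duration_s):
    """Return (frames kept, fraction kept, average seconds between kept frames)."""
    # Indices 0, 20, 40, ... are kept, hence the ceiling division.
    kept = math.ceil(total_frames / frame_interval)
    rate = kept / total_frames
    spacing_s = duration_s / kept
    return kept, rate, spacing_s


kept, rate, spacing = sampling_stats(total_frames=13046, frame_interval=20, duration_s=544)
print(kept, f"{rate:.1%}", f"{spacing:.2f} s")
```

Keeping one frame in twenty is a sampling rate of about 5%, and 653 frames spread over 544 seconds sit roughly 0.83 seconds apart.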

重建结果

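The results show that 3D reconstruction from extracted video frames recovers spatial positions well, which supports the effectiveness of the video frame sampling strategy.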

进阶探索

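COLMAP is an open-source Structure-from-Motion (SfM) and Multi-View Stereo (MVS) 3D reconstruction toolkit, developed by Johannes L. Schönberger and collaborators (UNC Chapel Hill and ETH Zurich). It automatically reconstructs a scene's 3D geometry and camera poses from an unordered set of 2D photos and is widely used in photogrammetry, computer vision, virtual reality and cultural heritage digitization. Below I use COLMAP for the reconstruction.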

COLMAP重建

前期准备

首先使用上面的帧截取工具对视频每一帧进行截取，得到全视频的数据集。

筛选最优帧（关键帧选择）

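Using every frame directly causes data redundancy and excessive computation. Running the COLMAP pipeline once over all frames shows which frames can actually be registered:

colmap feature_extractor --database_path database.db --image_path frames/
colmap exhaustive_matcher --database_path database.db
colmap mapper --database_path database.db --image_path frames/ --output_path sparse/

feature_extractor: extracts image features (SIFT)
exhaustive_matcher: matches feature points across all image pairs
mapper: runs incremental reconstruction; frames that cannot be registered, such as blurry or poorly matched ones, are left out of the model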

运行 COLMAP 三维重建

稀疏重建（SfM）

colmap feature_extractor --database_path database.db --image_path selected_frames/
colmap exhaustive_matcher --database_path database.db
colmap mapper --database_path database.db --image_path selected_frames/ --output_path sparse/

selected_frames/ ：存放筛选后的关键帧
sparse/ ：输出稀疏点云（ .bin 或 .txt 格式）

稠密重建（MVS）

colmap image_undistorter --image_path selected_frames/ --input_path sparse/0 --output_path dense/
colmap patch_match_stereo --workspace_path dense/
colmap stereo_fusion --workspace_path dense/ --output_path dense/fused.ply

patch_match_stereo ：生成深度图（需GPU加速）
stereo_fusion ：融合深度图，输出稠密点云（.ply）

重建结果

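The results show that 3D reconstruction from extracted video frames worked well. The generated point cloud is spatially accurate and geometrically complete, and it also shows a clear advantage in fine detail. By optimizing the frame selection strategy, we obtained more high-quality images (1,500) than the original video sampling.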

3DGS（3D Gaussian Splatting)

为了更清晰还原细节，实验尝试了3DGS重建。

数据准备

在上面COLMAP过程中，可以输出以下关键文件：

sparse/0/cameras.bin - 相机参数
sparse/0/images.bin - 相机位姿
sparse/0/points3D.bin - 稀疏点云
dense/fused.ply - （可选）稠密点云
我们需要对格式进行转换，转化的命令如下：

python convert.py \
--colmap_path ./sparse/0 \
--images_path ./images \
--output_path ./gs_data

生成结构：

gs_data/
├── cameras.json   # camera parameters
├── points3D.ply   # initial Gaussian centers
└── images/        # undistorted images

3DGS训练阶段

环境配置

conda create -n gs python=3.10
conda activate gs
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting
pip install -r requirements.txt

启动训练

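# -s: input data path, -m: model output path
# --iterations 30000: recommended iteration count
# --densification_interval 100: how often Gaussians are densified
# Comments are kept off the command lines: text after a trailing backslash breaks the continuation.
# --opacity_threshold is kept from the original command; confirm that your train.py accepts it.
python train.py \
-s ./gs_data \
-m ./output \
--iterations 30000 \
--densification_interval 100 \
--opacity_threshold 0.005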

关键参数：

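--densification_interval: controls how often the number of Gaussians grows
--position_lr_init 0.00016: position learning rate
--lambda_dssim 0.2: weight of the structural similarity (D-SSIM) term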

实时可视化

python viewer.py -m ./output

重建结果

通过3D Gaussian Splatting（3DGS）技术，我们实现了显著优于传统方法的细节重建效果。

局限性

3D Gaussian Splatting (3DGS) 在室内重建中虽然表现出色，但是会生成大量冗余高斯椭球，难以完全清除由此产生的椭圆状高斯分布杂质。但是在室外建筑中却表现良好。下面是使用无人机绕飞一圈重建的苏州虎丘塔。

总结

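The results show that 3D reconstruction from extracted video frames accurately recovers the indoor spatial structure and keeps good geometric continuity where rooms connect. Compared with reconstructing the point cloud from directly shot photos, the video frame extraction approach, with an optimized sampling strategy, maintains reconstruction accuracy while greatly improving processing efficiency.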

Astra Pro深度相机折腾记

Fri, 11 Jul 2025 00:40:02 +0800

最近在闲鱼花了45元淘了台深度相机，折腾出一套3D目标检测系统：用YOLOv8识别物体，结合深度信息生成3D边界框，Open3D可视化点云。踩了不少坑，在此记录一下。

缘起：一台从闲鱼来的相机

前段时间在闲鱼上刷到了一台 ASTRA Pro 深度相机，价格美丽，卖家说是公司项目结束后的闲置设备。想着最近在尝试点云重建的研究，于是果断入手了。收到货后发现品相不错，就开始了我的深度相机折腾之旅。

什么是深度相机？

简单来说，深度相机不仅能拍摄普通的彩色图像，还能获取场景中每个像素点的距离信息。这就像给普通相机加上了"透视眼"，能够感知到三维空间的深度信息。

ASTRA Pro 是奥比中光（Orbbec）出品的一款深度相机，主要特点：

支持 RGB 彩色图像和深度图像同时输出
基于结构光技术，在室内环境下表现不错
支持 OpenNI2 接口，开发起来相对友好

搭建开发环境

首先需要安装一堆依赖库：

# 主要依赖
pip install opencv-python
pip install numpy
pip install open3d
pip install ultralytics
pip install PyYAML

然后是 OpenNI2 的安装，这个稍微麻烦一些，需要根据系统版本下载对应的驱动。

1. 驱动安装

下载地址：Orbbec Camera Driver for Windows
官网入口：www.orbbec3d.com（需点击"更多"按钮）
安装后必须重启电脑
验证：在设备管理器中查看 Orbbec/ORBBEC Depth Sensor

2. OpenNI SDK配置

下载SDK：Orbbec OpenNI SDK
安装步骤：
1. 解压压缩包
2. 复制Win64-Release文件夹内容到 C:\Program Files\Orbbec\OpenNI
测试工具：运行 OpenNI\tools\NiViewer\NiViewer.exe

3. 环境变量设置

临时设置（当前会话有效）

$Env:OPENNI2_REDIST64="C:/Program Files/Orbbec/OpenNI/sdk/libs"

永久设置（需要重启终端）

[Environment]::SetEnvironmentVariable("OPENNI2_REDIST64", "C:/Program Files/Orbbec/OpenNI/sdk/libs", "Machine")

代码架构设计

整个程序的核心思路是：

同时获取 RGB 图像和深度图像
用 YOLOv8 检测 RGB 图像中的物体
结合深度信息生成 3D 边界框
用 Open3D 实现点云和 3D 边界框的可视化

配置文件管理

为了方便调试，我把相机参数都写在了 YAML 配置文件里：

Camera:
  width: 640
  height: 480
  fps: 30
  fx: 570.3
  fy: 570.3
  cx: 320.0
  cy: 240.0
  DepthMapFactor: 1000.0

Viewer:
  point_size: 4.0
  background_color: [0, 0, 0]

This way, tuning a parameter only means editing the config file and restarting the program, with no changes to the code.

核心功能实现

1. 双摄像头同步

最开始遇到的问题是 RGB 摄像头和深度摄像头的同步问题。深度相机内置的 RGB 模块质量一般，所以我另外接了一个 USB 摄像头。

# 初始化深度流
depth_stream = dev.create_depth_stream()
depth_stream.set_video_mode(...)
depth_stream.start()

# 初始化 RGB 摄像头
cap = cv2.VideoCapture(1)  # 注意索引号

2. YOLO 目标检测

用的是 YOLOv8n 模型，轻量级，在我的笔记本上跑起来还算流畅：

model = YOLO('yolov8n.pt')
results = model(rgb_frame, conf=0.4, iou=0.7)

3. 3D 边界框生成

这部分是最有趣的，通过深度信息把 2D 检测框转换成 3D 边界框：

def get_3d_bbox_from_2d(x1, y1, x2, y2, depth, fx, fy, cx, cy):
    # 获取边界框区域的深度值
    mask = depth[y1:y2, x1:x2]
    valid_depths = mask[mask > 0]
    
    # 计算深度范围
    zmin, zmax = np.percentile(valid_depths, [10, 90])
    
    # 投影到3D空间
    # ... 详细计算过程
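The projection step is elided above, so here is a minimal sketch of the standard pinhole back-projection it most likely relies on. This is my reconstruction under assumptions, not the author's code: `pixel_to_point` and `bbox_corners_3d` are hypothetical names, and the raw depth is assumed to be in units that `DepthMapFactor` (1000.0 in the config) converts to metres.

```python
def pixel_to_point(u, v, depth_raw, fx, fy, cx, cy, depth_factor=1000.0):
    """Back-project pixel (u, v) with a raw depth value into camera coordinates."""
    z = depth_raw / depth_factor      # raw depth units -> metres
    x = (u - cx) * z / fx             # pinhole model: u = fx * x / z + cx
    y = (v - cy) * z / fy             # pinhole model: v = fy * y / z + cy
    return x, y, z


def bbox_corners_3d(x1, y1, x2, y2, zmin_raw, zmax_raw, fx, fy, cx, cy, depth_factor=1000.0):
    """Return the 8 corners obtained by projecting the 2D box at the near and far depth."""
    corners = []
    for z_raw in (zmin_raw, zmax_raw):
        for u, v in ((x1, y1), (x2, y1), (x2, y2), (x1, y2)):
            corners.append(pixel_to_point(u, v, z_raw, fx, fy, cx, cy, depth_factor))
    return corners


# Intrinsics from the YAML config; the box and depth values are made-up examples.
corners = bbox_corners_3d(200, 150, 400, 350, 900, 1200, 570.3, 570.3, 320.0, 240.0)
print(corners[0], corners[-1])
```

Projecting the same pixel rectangle at two depths gives a frustum-shaped volume rather than a true cuboid; taking the min and max of the corner coordinates would turn it into an axis-aligned box if that is what the visualizer needs.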

4. 目标跟踪

为了让检测结果更稳定，加了个简单的基于 IoU 的目标跟踪：

def track_boxes(prev_boxes, new_boxes, iou_threshold=0.5):
    # 计算新旧边界框的重叠度
    # 关联最匹配的边界框
    # 对于消失的目标，逐渐降低置信度

这样就避免了检测框在连续帧间剧烈跳动的问题。
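Only the outline of `track_boxes` is given above, so the following is a minimal sketch of the association step, assuming boxes are `(x1, y1, x2, y2)` tuples. The names `iou` and `match_boxes` and the greedy best-first matching are my choices for illustration; the confidence decay for vanished targets mentioned in the outline is not implemented here.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(prev_boxes, new_boxes, iou_threshold=0.5):
    """Greedily pair previous and new boxes, best overlap first.

    Returns a list of (prev_index, new_index) pairs; each box is used at most once.
    """
    pairs = sorted(
        ((iou(p, n), i, j)
         for i, p in enumerate(prev_boxes)
         for j, n in enumerate(new_boxes)),
        reverse=True,
    )
    used_prev, used_new, matches = set(), set(), []
    for score, i, j in pairs:
        if score < iou_threshold:
            break                      # remaining pairs overlap even less
        if i in used_prev or j in used_new:
            continue
        used_prev.add(i)
        used_new.add(j)
        matches.append((i, j))
    return matches
```

Matched pairs can then be smoothed, for example by averaging the old and new coordinates, which is what damps the frame-to-frame jitter.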

实际效果

点云与3D边界框可视化

生成的点云效果还是很不错的，同时检测到的物体会用不同颜色的 3D 边界框标出来，并且显示距离信息：

深度图可视化

深度图用热力图的形式展示，近的地方是红色，远的地方是浅黄色：

遇到的坑

1. 相机内参标定

最开始直接用了网上找的参数，结果投影出来的 3D 点云完全变形了。后来老老实实用棋盘格标定了一遍，效果好了很多。

2. 坐标系转换

OpenCV、Open3D 和深度相机的坐标系都不太一样，需要做坐标转换。我用了一个变换矩阵：

COORD_TRANSFORM = np.array([
    [1, 0, 0, 0],
    [0, -1, 0, 0],
    [0, 0, -1, 0],
    [0, 0, 0, 1]
])
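To make the effect of this matrix concrete, the sketch below applies it to a single point in plain Python. `apply_transform` is a hypothetical helper written for illustration; with NumPy the same result comes from multiplying the matrix by the homogeneous vector.

```python
COORD_TRANSFORM = [
    [1, 0, 0, 0],
    [0, -1, 0, 0],
    [0, 0, -1, 0],
    [0, 0, 0, 1],
]


def apply_transform(matrix, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = point
    vec = (x, y, z, 1.0)                      # homogeneous coordinates
    out = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    return tuple(out[:3])


# A point 3 m in front of the camera, 2 m "down" in image-style coordinates.
print(apply_transform(COORD_TRANSFORM, (1.0, 2.0, 3.0)))
```

The matrix leaves x alone and negates y and z, which is a 180-degree rotation about the x axis: it converts between a y-down, z-forward camera convention and a y-up, z-backward viewer convention.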

3. 性能优化

最开始程序跑起来很卡，后来发现是点云密度太高了。加了体素下采样后流畅了很多：

pcl = pcl.voxel_down_sample(voxel_size=0.005)
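The idea behind voxel downsampling is easy to see in a few lines of plain Python: points are bucketed into cubes of side `voxel_size`, and each occupied cube is replaced by the average of its points. The function below is an illustration of that idea only, not Open3D's implementation, and it is far slower than the library call.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points falling in the same voxel with their centroid."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return [
        tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for pts in buckets.values()
    ]


points = [(0.001, 0.001, 0.001), (0.003, 0.003, 0.003), (0.021, 0.0, 0.0)]
reduced = voxel_downsample(points, voxel_size=0.005)
print(len(points), "->", len(reduced))
```

With a 5 mm voxel the first two points share a cube and collapse into one, so the point count, and with it the rendering cost, drops while the overall shape is preserved.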

4. 内存管理

Open3D 的几何体对象需要手动管理，不然会内存泄漏。特别是在循环中创建 LineSet 的时候，需要复用对象而不是每次都创建新的。

交互控制

加了一些简单的键盘控制：

+ 和 - 键控制缩放
q 键退出程序

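key = cv2.waitKey(1) & 0xFF
if key == ord('+'):
    zoom_level = max(0.1, zoom_level * 0.8)
    vis.get_view_control().set_zoom(zoom_level)
elif key == ord('-'):
    zoom_level = min(2.0, zoom_level * 1.25)  # upper bound is an assumed limit
    vis.get_view_control().set_zoom(zoom_level)
elif key == ord('q'):
    break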

后续计划

这个小项目还有很多可以改进的地方：

SLAM 功能：加入视觉里程计，实现实时建图
手势识别：利用深度信息识别手势
物体抓取：结合机械臂做物体抓取
AR 应用：在现实场景中叠加虚拟物体

P.S. 点云的世界远比我想象的要精彩，这只是个开始… 🚀

Gridea Pro上手指南

Mon, 09 Jun 2025 21:42:45 +0800

这篇指南会带你一步一步熟悉 Gridea Pro 的界面和操作，从写第一篇文章到把博客发布到互联网上。

认识界面

打开 Gridea Pro 后，你会看到左侧是导航栏，右侧是内容区域。

左侧导航从上到下依次是：

导航	说明
文章	管理你的所有博客文章
闪念	速记短想法、灵感碎片
评论	查看和管理读者评论
菜单	自定义博客的导航栏
分类	管理文章分类
标签	管理文章标签
友链	管理友情链接
主题	切换和配置博客主题
配置	设置部署平台、SEO、CDN 等

导航栏底部有两个重要按钮：「预览」 和 「同步」，后面会详细讲到。

第一步：写一篇文章

点击左侧导航的 「文章」，进入文章列表页
点击右上角的 「 + 」 按钮，会打开文章编辑器
最上方是标题输入框，下方是正文编辑区（支持 Markdown 语法）
写完内容后，点击编辑器顶部工具栏的 「纸飞机」图标，会弹出右侧的文章设置面板

在文章设置面板中，你可以配置：

URL — 文章的访问路径（默认用文件名）
分类 — 选择一个分类，或留空
标签 — 点击已有标签添加，或输入新标签后按回车创建
创建时间 — 默认是当前时间，可以手动修改
封面图 — 粘贴在线图片链接，或点击上传区域从本地选择图片
列表中隐藏 — 开启后，文章不会出现在博客的文章列表中（适合「关于」页面）
置顶文章 — 开启后，文章会固定在博客文章列表最顶部

设置完成后，点击面板底部的 「发布」 按钮保存文章。如果还没写完，可以点击编辑器顶部的 「存草稿」 先保存为草稿。

小技巧：在文章正文中插入标记，标记之前的内容会作为文章摘要显示在博客文章列表中。你可以通过编辑器右侧的 「…」 按钮快速插入这个标记。

第二步：选择主题

点击左侧导航的 「主题」
默认进入 「选择主题」 标签页，上方展示当前使用的主题，下方展示其他可选主题
找到你喜欢的主题，点击 「使用该主题」 按钮即可切换
切换到 「基础配置」 标签页，填写你的博客名称、作者名、站点描述等信息
切换到 「个性化」 标签页，可以调整配色风格

每个主题还可能有自己的 「自定义配置」（第四个标签页），比如社交链接、布局选项等，按需设置即可。

第三步：预览效果

点击左侧导航栏底部的 「预览」 按钮。

Gridea Pro 会自动渲染整个博客站点，然后在浏览器中打开预览。你可以看到文章、主题、导航菜单等所有内容的实际效果。

每次修改文章或切换主题后，重新点击「预览」即可看到最新效果。

注意：博客不是实时渲染的，每次修改后都需要再次点击「预览」才能看到最新效果。

第四步：配置部署

现在你的博客在本地已经准备好了，接下来配置一个部署平台，把它发布到互联网上。

点击左侧导航的 「配置」
在 「平台」 下拉框中选择你要使用的部署方式

Gridea Pro 支持以下平台：

GitHub Pages（推荐新手）

免费、全球可访问，配置步骤：

在 GitHub 上创建一个新仓库（仓库名建议用 你的用户名.github.io）
前往 GitHub → Settings → Developer settings → Personal access tokens，生成一个 Token（勾选 repo 权限）
回到 Gridea Pro 的「配置」页面，填写以下信息：
- 域名：https://你的用户名.github.io
- 仓库名称：你的用户名.github.io
- 分支：main
- 仓库用户名：你的 GitHub 用户名
- 邮箱：你的 GitHub 注册邮箱
- 令牌：刚才生成的 Token
- CNAME：你购买的个人域名，如：are.ink（可选，没有可不填）
点击 「检测远程连接」 验证配置是否正确
点击 「保存」

Vercel

支持自定义域名，智能增量部署。需要填写 Vercel 项目名称和 Access Token。

自定义域名（CNAME）

如果你有自己的域名（比如 blog.example.com），可以在上面的配置中填写 CNAME 字段：

在你的域名注册商的 DNS 管理后台，添加一条 CNAME 记录，将你的域名指向 你的用户名.github.io（GitHub）或对应的平台域名
回到 Gridea Pro 的「配置」页面，将 域名 改为你的自定义域名（如 https://blog.example.com）
在 CNAME 字段中填写你的自定义域名（如 blog.example.com）
保存后重新同步，GitHub Pages 会自动识别并生效

如果暂时没有自己的域名，跳过这步即可，后面随时可以配置。

第五步：发布上线

一切配置就绪后，点击左侧导航栏底部的 「同步」 按钮（火箭图标）。

按钮会显示加载动画，等待片刻，看到「同步成功啦！」的提示，说明你的博客已经发布到互联网上了。

打开浏览器，访问你在第四步填写的域名，就能看到你的博客了。

以后每次写完新文章或修改配置后，点一下「同步」就会自动更新。

提示：Gridea Pro 内置了完整的 Git 引擎，你的电脑上不需要安装 Git，填好配置信息后，点击 「同步」 按钮就可以直接发布。

公式测试

测试 $E=mc^2$
测试 $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

彩蛋：MCP — 用 AI 管理你的博客

这是 Gridea Pro 最独特的能力之一。

Gridea Pro 内置了完整的 MCP（Model Context Protocol）服务，这意味着你可以通过 AI 助手（如 Claude Desktop、Cursor、Claude Code 等任何支持 MCP 协议的客户端）直接与你的博客对话。

你可以用自然语言让 AI 帮你：

写文章 — 「帮我写一篇关于 Vue 3 组合式 API 的入门教程，标签设为 Vue 和前端」
管理内容 — 「把所有标签为"草稿"的文章列出来」「删除上个月的测试文章」
记录闪念 — 「记一条速记：今天学会了 Go 的 channel 用法」
配置站点 — 「把博客名称改为"我的技术笔记"」「切换到 flavor-theme 主题」
发布部署 — 「渲染站点并部署到 GitHub Pages」

总共提供了 30+ 个 AI 可调用的工具，覆盖文章、标签、分类、菜单、友链、闪念、主题、设置、渲染和部署的完整管理。你不需要打开 Gridea Pro 的界面，在 AI 对话框里就能完成几乎所有操作。

配置方式很简单。以 Claude Desktop 为例，打开 Settings → Developer → Edit Config，在 mcpServers 中添加：

macOS：

{
  "mcpServers": {
    "gridea-pro": {
      "command": "/Applications/Gridea Pro.app/Contents/MacOS/gridea-pro-mcp",
      "env": {
        "SOURCE_DIR": "/Users/你的用户名/Documents/Gridea Pro"
      }
    }
  }
}

Windows：

{
  "mcpServers": {
    "gridea-pro": {
      "command": "C:\\Program Files\\Gridea Pro\\gridea-pro-mcp.exe",
      "env": {
        "SOURCE_DIR": "C:\\Users\\你的用户名\\Documents\\Gridea Pro"
      }
    }
  }
}

command — gridea-pro-mcp 二进制文件的路径
SOURCE_DIR — 你的博客数据目录路径
DEPLOY_ENABLED — 是否允许 AI 执行发布/同步操作（可选，默认关闭）

默认情况下，AI 只能帮你写文章、管理内容，不能直接发布到线上。如果你信任 AI 并希望它能帮你一键发布，在 env 中加上 "DEPLOY_ENABLED": "true" 即可开启。建议熟悉流程后再开启。

保存后重启 Claude Desktop，就可以在对话中直接管理你的博客了。它通过本地管道通信，不需要网络、不需要端口、不需要认证，数据始终在你的电脑上。

看到这里，相信你对 Gridea Pro 的使用已经有了全面的了解。

这篇指南可以随时删除。祝你博客之旅愉快！

Intelligent Construction Site Safety Report Generator

Thu, 14 Nov 2024 05:19:50 +0800

Intelligent construction site safety report generator using YOLO and GPT-4 for automated image analysis and report generation.

In the modern construction industry, safety is always the top priority. To enhance the efficiency and accuracy of construction site safety management, we have developed an intelligent construction site safety report generator. This tool combines computer vision and artificial intelligence technologies to automatically analyze construction site images, generate detailed safety reports, and provide improvement suggestions. This article will detail the development process, features, and usage of this tool.

You can see the effect in the following video:

@bilibili

You can view the generated safety report at the following link:
Report

Project Background

Construction site safety management involves a large amount of image data, and traditional analysis methods are time-consuming and prone to errors. To address this issue, we decided to develop an automated tool that can quickly and accurately analyze construction site images and generate detailed safety reports. The core technologies of this tool include the YOLO (You Only Look Once) object detection model and OpenAI’s GPT-4 model.

Technical Architecture

YOLO Object Detection Model: We use the YOLO model to identify and classify various objects in construction site images, such as excavators, safety helmets, gloves, etc. The YOLO model is efficient and accurate, capable of processing large amounts of image data in real-time.
OpenAI GPT-4 Model: When generating safety reports, we use OpenAI’s GPT-4 model to summarize analysis results and provide improvement suggestions. The GPT-4 model can understand natural language and generate high-quality text content.
Tkinter Graphical User Interface: To facilitate user operation, we developed a graphical user interface (GUI) using the Tkinter library. Users can select image folders, set output paths, and start the processing process through this interface.

Features

Automatic Image Processing: Users only need to select the image folder, and the tool will automatically process all images, identify objects, and generate reports.
Detailed Report Generation: The tool generates detailed reports, including analysis results for each image, object classification statistics, and overall safety assessments.
AI Summary and Suggestions: Using the GPT-4 model to generate summaries and improvement suggestions, helping users better understand analysis results and take corresponding measures.
Time Recording: During the processing, the tool records the time of each operation, making it convenient for users to understand the processing progress.
Clear Function: Users can clear the AI report generation box at any time to restart the analysis.

Usage

Select Image Folder: Click the “Browse” button to select the folder containing construction site images.
Set Output Paths: Select the save paths for the identified images and the Markdown file.
Start Processing: Click the “Start Processing” button, and the tool will automatically process the images and generate reports.
View Reports: After processing, users can view detailed analysis results and improvement suggestions in the AI report generation box.
Clear Reports: If you need to restart the analysis, you can click the “Clear” button to clear the AI report generation box.

Project Summary

This intelligent construction site safety report generator not only improves the efficiency of construction site safety management but also significantly reduces the error rate of manual analysis. By combining advanced computer vision and artificial intelligence technologies, we have successfully developed a practical and efficient tool that provides strong support for safety management in the construction industry. In the future, we will continue to optimize and expand the functionality of this tool to meet the needs of more users.

Future Prospects

Multilingual Support: Add support for multiple languages to make the tool accessible to global users.
Real-Time Monitoring: Develop real-time monitoring functionality that can analyze images on-site and provide instant feedback.
Data Visualization: Enhance data visualization features to allow users to understand analysis results more intuitively.

Through continuous technological innovation and functional expansion, we believe that this intelligent construction site safety report generator will play an increasingly important role in the future construction industry.

Simple Stack - A Foolproof Stacking Software for Mac

Fri, 20 Sep 2024 00:27:12 +0800

This is image stacking software developed by Ke Lejun in his spare time, built specifically for the Mac.

Introduction

In the realm of photography, especially in astrophotography and long-exposure scenarios, image stacking is a technique that combines multiple images to create a single, high-quality image. This process helps in reducing noise and enhancing details. While there are several professional tools available for image stacking, many photographers, especially beginners, seek a simpler, more user-friendly solution.

Enter Simple Stack, a foolproof stacking tool designed specifically for Mac users. Simple Stack aims to simplify the image stacking process, making it accessible to novice and experienced photographers alike. With an intuitive interface and straightforward functionality, it removes the complexity often associated with image processing software, letting users focus on capturing stunning images rather than wrestling with technical details.

Interface Introduction

Support for Dark Mode.
Bilingual Support.
Simplicity and Ease of Use.

How It Works

In photography and computer vision, image stacking is a powerful technique that combines multiple images into a single high-quality image. It is commonly used in astrophotography, macro photography and long-exposure scenarios to reduce noise and enhance detail. This section walks through a Python and OpenCV-based image stacking and enhancement program and explains how it works.

Program Overview

The main function of this program is to align, stack, and enhance multiple images from a specified folder, ultimately generating a high-quality image. The program flow is as follows:

Load Images: Load all images from the specified folder.
Image Alignment: Align images using ORB feature detection and the RANSAC algorithm.
Image Stacking: Stack the aligned images using weighted average stacking.
Image Enhancement: Enhance the stacked image using wavelet transform denoising, unsharp masking, and CLAHE for contrast enhancement.
Save and Display Results: Save and display the final stacked image.

Code Explanation

1. Load Images

def load_images_from_folder(folder):
    images = []
    for filename in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, filename))
        if img is not None:
            images.append(img)
    return images

The load_images_from_folder function is responsible for loading all images from the specified folder. It iterates through each file in the folder, reads the image using cv2.imread, and appends successfully read images to the images list.

2. Image Alignment

def align_images(base_image, image_to_align):
    orb = cv2.ORB_create()
    keypoints1, descriptors1 = orb.detectAndCompute(base_image, None)
    keypoints2, descriptors2 = orb.detectAndCompute(image_to_align, None)

    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = bf.match(descriptors1, descriptors2)
    matches = sorted(matches, key=lambda x: x.distance)
    good_matches = matches[:int(len(matches) * 0.15)]

    if len(good_matches) < 4:
        return None, 0

    src_pts = np.float32([keypoints1[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
    dst_pts = np.float32([keypoints2[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)

    M, mask = cv2.findHomography(dst_pts, src_pts, cv2.RANSAC, 5.0)
    if M is None:  # RANSAC could not estimate a homography
        return None, 0
    alignment_quality = np.sum(mask) / len(mask)

    aligned_image = cv2.warpPerspective(image_to_align, M, (base_image.shape[1], base_image.shape[0]))

    return aligned_image, alignment_quality

The align_images function aligns two images. It first uses the ORB feature detector to detect keypoints and descriptors of both images, then uses BFMatcher for feature matching. To improve alignment accuracy, the program retains only the top 15% of matches. Subsequently, the RANSAC algorithm is used to estimate the homography matrix, and the image is transformed using cv2.warpPerspective to achieve alignment.

3. Image Stacking

def stack_images_weighted_average(images):
    stacked_image = images[0].astype(np.float32)
    weights = np.ones_like(stacked_image)

    for image in images[1:]:
        stacked_image += image.astype(np.float32)
        weights += np.ones_like(image)

    stacked_image /= weights
    stacked_image = np.clip(stacked_image, 0, 255).astype(np.uint8)

    return stacked_image

The stack_images_weighted_average function stacks the aligned images by averaging them pixel by pixel. It initializes a float32 accumulator with the first image, adds each remaining image, and tracks a per-pixel weight alongside. Because every image contributes a weight of 1, the result is the plain arithmetic mean of the frames. Finally, the result is clipped to [0, 255] and converted back to uint8.
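Since the function above gives every frame the same weight, a variant with real per-frame weights might look like the sketch below. It assumes one scalar weight per frame (the alignment quality would be a natural candidate); the function name `stack_images_with_weights` is hypothetical and not part of Simple Stack.

```python
import numpy as np


def stack_images_with_weights(images, weights):
    """Per-pixel weighted mean of same-shaped uint8 frames.

    `weights` holds one non-negative scalar per frame (for example the
    alignment quality). Illustrative helper, not part of Simple Stack.
    """
    if len(images) != len(weights):
        raise ValueError("need exactly one weight per image")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    accumulator = np.zeros(images[0].shape, dtype=np.float64)
    for image, weight in zip(images, weights):
        accumulator += weight * image.astype(np.float64)

    stacked = accumulator / total
    return np.clip(np.rint(stacked), 0, 255).astype(np.uint8)


# Two flat synthetic frames: the result leans toward the higher-weighted one.
frame_a = np.full((2, 2, 3), 100, dtype=np.uint8)
frame_b = np.full((2, 2, 3), 200, dtype=np.uint8)
result = stack_images_with_weights([frame_a, frame_b], [0.9, 0.6])
print(result[0, 0])
```

With equal weights this reduces to the plain mean computed by stack_images_weighted_average.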

4. Image Enhancement

def enhance_image(image):
    if image.dtype != np.uint8:
        image = np.clip(image, 0, 255).astype(np.uint8)

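    # Transform over the spatial axes only; the default axes=(-2, -1) would
    # mix the width axis with the color-channel axis of an H x W x 3 image.
    coeffs = pywt.dwt2(image.astype(np.float32), 'db1', axes=(0, 1))
    cA, (cH, cV, cD) = coeffs
    # Soft-threshold the detail bands only; shrinking the approximation band
    # (cA) would darken the whole image instead of removing noise.
    cH = pywt.threshold(cH, np.std(cH), mode='soft')
    cV = pywt.threshold(cV, np.std(cV), mode='soft')
    cD = pywt.threshold(cD, np.std(cD), mode='soft')
    denoised_image = pywt.idwt2((cA, (cH, cV, cD)), 'db1', axes=(0, 1))
    # For odd dimensions the reconstruction is one pixel larger, so crop back.
    denoised_image = denoised_image[:image.shape[0], :image.shape[1]]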

    if denoised_image.dtype != np.uint8:
        denoised_image = np.clip(denoised_image, 0, 255).astype(np.uint8)

    blurred = cv2.GaussianBlur(denoised_image, (0, 0), 3)
    unsharp_mask = cv2.addWeighted(denoised_image, 1.5, blurred, -0.5, 0)

    if unsharp_mask.dtype != np.uint8:
        unsharp_mask = np.clip(unsharp_mask, 0, 255).astype(np.uint8)

    lab = cv2.cvtColor(unsharp_mask, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l = clahe.apply(l)
    lab = cv2.merge((l, a, b))
    enhanced_image = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

    return enhanced_image

The enhance_image function enhances the stacked image. It first denoises the image using wavelet transform, then enhances edges using unsharp masking, and finally enhances contrast using CLAHE (Contrast Limited Adaptive Histogram Equalization).
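Two of these steps reduce to short formulas, sketched here in NumPy on one-dimensional data. Soft thresholding shrinks every wavelet coefficient toward zero by the threshold, and unsharp masking with the weights 1.5 and -0.5 passed to cv2.addWeighted equals the original plus half of the difference between the original and its blurred copy. The helper names and the 3-tap box blur (standing in for the Gaussian blur) are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(values, threshold):
    """Shrink coefficients toward zero; anything within +/- threshold becomes 0."""
    return np.sign(values) * np.maximum(np.abs(values) - threshold, 0.0)


def unsharp_mask(signal, blurred, amount=0.5):
    """signal + amount * (signal - blurred), i.e. (1 + amount) * signal - amount * blurred."""
    return (1.0 + amount) * signal - amount * blurred


coeffs = np.array([-5.0, -1.0, 0.5, 3.0])
print(soft_threshold(coeffs, 2.0))

# A 1-D step edge; a 3-tap box blur stands in for cv2.GaussianBlur.
edge = np.array([10.0, 10.0, 10.0, 50.0, 50.0, 50.0])
padded = np.pad(edge, 1, mode="edge")
blurred = np.convolve(padded, np.ones(3) / 3.0, mode="valid")
sharpened = unsharp_mask(edge, blurred)
print(sharpened)
```

The sharpened signal dips below the dark side of the edge and rises above the bright side, which is what makes edges look crisper; in the image pipeline the subsequent clip to [0, 255] limits this overshoot.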

5. Main Function

def main():
    folder = "/Users/img"
    images = load_images_from_folder(folder)

    if len(images) == 0:
        print("No images found in the folder.")
        return

    base_image = images[0]
    aligned_images = [base_image]

    for image in tqdm(images[1:], desc="Aligning images"):
        aligned_image, alignment_quality = align_images(base_image, image)
        if alignment_quality > 0.5:
            aligned_images.append(aligned_image)

    stacked_image = stack_images_weighted_average(aligned_images)
    enhanced_image = enhance_image(stacked_image)

    cv2.imwrite("stacked_image.jpg", enhanced_image)
    cv2.imshow("Stacked Image", enhanced_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()

The main function is the entry point of the program. It loads the images, uses the first one as the base image, and aligns each remaining image to it, keeping only those whose alignment quality exceeds 0.5. It then stacks and enhances the aligned images, and finally saves and displays the result.

Effect Demonstration

We tested many photos and achieved good results.

Processing of Starry Skies

Moon Processing

Conclusion

Simple Stack represents a significant step forward in making image stacking accessible to a broader audience. By focusing on simplicity, user-friendliness, and powerful functionality, Simple Stack empowers photographers to create high-quality images with ease. Whether you are a beginner or an experienced photographer, Simple Stack offers a seamless and efficient solution for your image stacking needs.

We invite you to try Simple Stack and experience the difference it can make in your photography workflow. Download it today and start creating stunning images with minimal effort!

Download

You can download it from the following link, and it can be used directly after extraction.
https://huggingface.co/datasets/ColamanAI/3DPointCloud/resolve/main/SimpleStack.zip

Usage of the Hong Kong University Architectural Dataset

Tue, 17 Sep 2024 00:18:19 +0800

The Hong Kong University Architectural Dataset is one of my advisor's projects. It is used to train models for architectural component segmentation.

Basic Information

In image-driven 3D building reconstruction, instance segmentation is fundamental to pixel-wise building component detection, which can be fused with 3D data like point clouds and meshes via camera projection for semantic reconstruction. While deep learning-based segmentation has obtained promising results, it relies heavily on large-scale datasets for training. Unfortunately, existing large-scale image datasets often include irrelevant objects that obstruct building components, making them unsuitable for 3D building reconstruction. This paper addresses this gap by introducing a large-scale building image dataset to facilitate building component segmentation for 3D reconstruction. The dataset comprises 3378 images captured from both interiors and exteriors of 36 university buildings, annotated with 49,380 object instances across 11 classes. Rigorous quality control measures were employed during data collection and annotation. Evaluation of five typical deep learning-based instance segmentation models demonstrates the dataset’s suitability for training and its value as a benchmark dataset for building component segmentation.

Below is the link to the paper:
Mun On Wong, Huaquan Ying, Mengtian Yin, Xiaoyue Yi, Lizhao Xiao, Weilun Duan, Chenchen He, Llewellyn Tang. Semantic 3D reconstruction-oriented image dataset for building component segmentation. Automation in Construction, Volume 165, 2024, 105558, ISSN 0926-5805.
https://doi.org/10.1016/j.autcon.2024.105558
https://www.sciencedirect.com/science/article/pii/S0926580524002942

My Work

I need to convert this dataset into YOLOv8 format and then train a model on it for architectural component segmentation. The segmentation can then be used in point cloud projection to project only the components I need, such as walls, windows, or ceilings.
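For reference, a YOLOv8 segmentation label file has one line per instance: the class index followed by the polygon's x and y coordinates, normalized to the range 0 to 1. The sketch below shows that formatting step only. The original annotation format of the dataset is not described here, so the input (a list of pixel-coordinate points) and the helper name `polygon_to_yolo_seg` are assumptions.

```python
def polygon_to_yolo_seg(class_id, polygon, image_width, image_height):
    """Format one instance as a YOLO segmentation label line.

    `polygon` is a list of (x, y) pixel coordinates. The output is
    "<class-index> x1 y1 x2 y2 ..." with coordinates normalized to [0, 1].
    """
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least 3 points")
    parts = [str(class_id)]
    for x, y in polygon:
        # Normalize and clamp so points on the border stay inside [0, 1].
        nx = min(max(x / image_width, 0.0), 1.0)
        ny = min(max(y / image_height, 0.0), 1.0)
        parts.append(f"{nx:.6f}")
        parts.append(f"{ny:.6f}")
    return " ".join(parts)


# Hypothetical rectangular outline in a 640x480 image, labeled with class index 9.
line = polygon_to_yolo_seg(9, [(0, 0), (320, 0), (320, 240), (0, 240)], 640, 480)
print(line)
```

Each image gets a text file of such lines with the same base name as the image, placed in a labels folder next to the images folder.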

Download Hong Kong University Architectural Dataset in YOLOv8 Format

Now, I will share how to train a YOLOv8 model on this dataset.

Training Process

I used the free computational power provided by Kaggle for training, and I trained for a total of 330 epochs.

Install YOLOv8

INPUT:

%pip install ultralytics
import ultralytics
ultralytics.checks()

OUTPUT:

Ultralytics YOLOv8.2.82 🚀 Python-3.10.13 torch-2.1.2 CUDA:0 (Tesla T4, 15095MiB)
Setup complete ✅ (4 CPUs, 31.4 GB RAM, 5845.9/8062.4 GB disk)

Test YOLOv8

Now we need to test whether YOLOv8 is installed successfully. We will test it with an official image.
INPUT:

# Run inference on an image with YOLOv8n
!yolo predict model=yolov8n.pt source='https://ultralytics.com/images/zidane.jpg'

If the installation is successful, you will find the following image in the runs/detect/predict folder. At this point, you can start using YOLOv8.

Begin Training

You can start training the dataset using the following program:

!yolo segment train data=/kaggle/input/buliding/data.yaml model=yolov8n-seg.pt epochs=330 imgsz=640 device=[0,1] save_period=50

The content of data.yaml is as follows:

train: /kaggle/input/buliding/train/images
val: /kaggle/input/buliding/valid/images
test: /kaggle/input/buliding/test/images
nc: 12
names: ['Beam', 'Ceiling', 'Column', 'CurtainWall', 'Door', 'Floor', 'Lift', 'Opening', 'Roof', 'Wall', 'Window', 'object']
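The position of each name in the names list is the class index that appears in label files and in predictions, so nc has to equal the length of that list. A small sanity check, with the values copied from the data.yaml above into a plain dictionary (the helper `build_class_lookup` is a hypothetical name):

```python
# Values copied from the data.yaml shown above.
data_config = {
    "nc": 12,
    "names": ['Beam', 'Ceiling', 'Column', 'CurtainWall', 'Door', 'Floor',
              'Lift', 'Opening', 'Roof', 'Wall', 'Window', 'object'],
}


def build_class_lookup(config):
    """Check that nc matches the name list and return an index -> name mapping."""
    if config["nc"] != len(config["names"]):
        raise ValueError("nc does not match the number of class names")
    return dict(enumerate(config["names"]))


id_to_name = build_class_lookup(data_config)
print(id_to_name[9], id_to_name[10])
```

A lookup like this is handy later for turning the numeric class IDs printed by the test script into readable labels.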

If the training is successful, you will be able to see the training curves in the results.
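The same training run can also be started from Python instead of the command line. The sketch below mirrors the settings of the CLI command used above; the function `run_training` is a hypothetical wrapper that is defined but not called here, because it needs the ultralytics package, the dataset, and the GPUs to be available.

```python
# Same settings as the CLI command; the data path is the Kaggle path used in this post.
train_kwargs = {
    "data": "/kaggle/input/buliding/data.yaml",
    "epochs": 330,
    "imgsz": 640,
    "device": [0, 1],
    "save_period": 50,
}


def run_training(kwargs, weights="yolov8n-seg.pt"):
    """Start training with the given settings (requires the ultralytics package)."""
    from ultralytics import YOLO  # imported lazily so this snippet loads without it

    model = YOLO(weights)
    return model.train(**kwargs)


print(train_kwargs)
```

Calling run_training(train_kwargs) in the Kaggle notebook would start the run; with save_period=50, a checkpoint is written every 50 epochs in addition to the best and last weights.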

Below are some details about the scope and labels of architectural components:

Test

Let’s write a piece of code to test the segmentation effect:

from ultralytics import YOLO
import cv2
import numpy as np

# Define a set of conspicuous colors (in BGR order, as OpenCV expects)
colors = [
    (0, 255, 0),    # Green
    (0, 0, 255),    # Red
    (255, 0, 0),    # Blue
    (0, 255, 255),  # Yellow
    (255, 0, 255),  # Magenta
    (255, 255, 0),  # Cyan
    (128, 0, 128),  # Purple
    (0, 165, 255),  # Orange
    (128, 128, 0),  # Teal
    (0, 128, 128)   # Olive
]

model = YOLO('/runs/best.pt')  # Use instance segmentation model

# Read the image
image_path = '/Users/0081.png'
image = cv2.imread(image_path)

# Perform prediction
results = model(image)

# Process results
for i, result in enumerate(results):
    if result.masks is None:  # no instances were detected in this image
        continue
    masks = result.masks.data  # Masks
    boxes = result.boxes.xyxy  # Bounding boxes
    classes = result.boxes.cls  # Classes
    scores = result.boxes.conf  # Confidence scores

    # Output results
    for j, (box, mask, cls, score) in enumerate(zip(boxes, masks, classes, scores)):
        x1, y1, x2, y2 = map(int, box)
        class_id = int(cls)
        confidence = float(score)

        # Output bounding box and class information
        print(f"Bounding Box: ({x1}, {y1}), ({x2}, {y2})")
        print(f"Class: {class_id}, Confidence: {confidence:.2f}")

        # Convert mask to 8-bit image
        mask = mask.cpu().numpy()
        mask = (mask * 255).astype(np.uint8)

        # Resize mask to match the image shape
        mask = cv2.resize(mask, (image.shape[1], image.shape[0]))

        # Select a conspicuous color
        color = colors[j % len(colors)]

        # Convert mask to colored mask
        colored_mask = np.zeros_like(image)
        colored_mask[mask > 0] = color

        # Overlay colored mask onto the original image
        image = cv2.addWeighted(image, 1, colored_mask, 0.5, 0)

        # Draw bounding box and class label on the image
        cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
        cv2.putText(image, f'Class {class_id}', (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, 2)

# Display the result image
cv2.imshow('Segmented Image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

The output of this code:

Great,👌 we have obtained a very good result.
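For the point cloud projection mentioned earlier, only some classes are needed, so the per-instance masks can be merged into one binary mask per group of classes. The sketch below works on NumPy arrays shaped like the masks and class IDs used in the test script (converted with .cpu().numpy()); the synthetic data and the helper `combine_masks_for_classes` are assumptions for illustration.

```python
import numpy as np


def combine_masks_for_classes(masks, class_ids, wanted_ids):
    """Union of all instance masks whose class index is in `wanted_ids`.

    `masks` has shape (N, H, W) with values in [0, 1]; `class_ids` has length N.
    Returns a boolean (H, W) array.
    """
    masks = np.asarray(masks)
    combined = np.zeros(masks.shape[1:], dtype=bool)
    for mask, class_id in zip(masks, class_ids):
        if int(class_id) in wanted_ids:
            combined |= mask > 0.5
    return combined


# Synthetic stand-ins for result.masks.data and result.boxes.cls.
masks = np.zeros((3, 4, 4), dtype=np.float32)
masks[0, :2, :] = 1.0   # instance 0: class 9 ('Wall')
masks[1, 2:, :2] = 1.0  # instance 1: class 10 ('Window')
masks[2, 2:, 2:] = 1.0  # instance 2: class 4 ('Door')
class_ids = [9, 10, 4]

wall_and_window = combine_masks_for_classes(masks, class_ids, wanted_ids={9, 10})
print(wall_and_window.astype(int))
```

The class indices 9, 10, and 4 follow the order of the names list in data.yaml.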

Summary

By training the dataset, we obtained a fairly good result for architectural component segmentation, which plays a significant role in subsequent 3D reconstruction and point cloud modeling. Accurate segmentation of architectural components also makes subsequent work much smoother.

Enjoy using it!