Recursive division method

Mazes can be created with recursive division, an algorithm which works as follows: Begin with the maze’s space with no walls. Call this a chamber. Divide the chamber with a randomly positioned wall (or multiple walls) where each wall contains a randomly positioned passage opening within it. Then recursively repeat the process on the subchambers until all chambers are minimum sized. This method results in mazes with long straight walls crossing their space, making it easier to see which areas to avoid.

For example, in a rectangular maze, build at random points two walls that are perpendicular to each other. These two walls divide the large chamber into four smaller chambers separated by four walls. Choose three of the four walls at random, and open a one cell-wide hole at a random point in each of the three. Continue in this manner recursively, until every chamber has a width of one cell in either of the two directions.
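The procedure above can be sketched in C#. This is the common single-wall-per-step variant (each division adds one wall with one gap, rather than the two perpendicular walls of the example); the class and member names are my own, not from any particular library:

```csharp
using System;

// Recursive division on a width x height cell grid.
// Horizontal[y, x] is the wall above cell (x, y); Vertical[y, x] is the wall
// to the left of cell (x, y).
public class RecursiveDivisionMaze
{
    public bool[,] Horizontal { get; }   // (height + 1) x width
    public bool[,] Vertical { get; }     // height x (width + 1)
    private readonly Random rng = new Random();

    public RecursiveDivisionMaze(int width, int height)
    {
        Horizontal = new bool[height + 1, width];
        Vertical = new bool[height, width + 1];
        for (int x = 0; x < width; x++)                  // outer border walls
            Horizontal[0, x] = Horizontal[height, x] = true;
        for (int y = 0; y < height; y++)
            Vertical[y, 0] = Vertical[y, width] = true;
        Divide(0, 0, width, height);
    }

    // Split the chamber [x, x+w) x [y, y+h) with one randomly positioned wall
    // containing one randomly positioned gap, then recurse on both halves.
    private void Divide(int x, int y, int w, int h)
    {
        if (w < 2 || h < 2) return;                      // minimum-sized chamber
        bool cutHorizontally = h > w || (h == w && rng.Next(2) == 0);
        if (cutHorizontally)
        {
            int wallY = y + 1 + rng.Next(h - 1);         // random wall position
            int gapX = x + rng.Next(w);                  // random passage opening
            for (int i = x; i < x + w; i++)
                if (i != gapX) Horizontal[wallY, i] = true;
            Divide(x, y, w, wallY - y);
            Divide(x, wallY, w, y + h - wallY);
        }
        else
        {
            int wallX = x + 1 + rng.Next(w - 1);
            int gapY = y + rng.Next(h);
            for (int j = y; j < y + h; j++)
                if (j != gapY) Vertical[j, wallX] = true;
            Divide(x, y, wallX - x, h);
            Divide(wallX, y, x + w - wallX, h);
        }
    }
}
```

Because every wall carries exactly one opening, each division keeps the two subchambers connected, so the finished maze is perfect: every cell is reachable from every other.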


Url Seen performs URL deduplication. A large crawler system may already hold tens or even hundreds of billions of URLs, so when a new URL arrives, deciding quickly whether it has been seen before is critical. A large crawler may download several thousand pages per second, a page typically yields dozens of URLs, and every one of them must go through deduplication, so an enormous number of membership checks run every second. Url Seen is therefore one of the most technically demanding parts of the whole crawler.
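At that scale an exact hash set of every URL will not fit in memory, so Url Seen is commonly built on a Bloom filter, which answers "possibly seen" or "definitely new" with a handful of bit probes. A minimal sketch follows; the names (`BloomUrlSeen`, `TestAndAdd`) and the FNV-1a second hash are my own illustration, not code from this post:

```csharp
using System;
using System.Collections;

// Minimal Bloom-filter sketch for URL deduplication (illustrative, not the
// post's actual implementation). False positives are possible; false
// negatives are not.
public class BloomUrlSeen
{
    private readonly BitArray bits;
    private readonly int hashCount;

    public BloomUrlSeen(int capacityBits, int hashCount)
    {
        bits = new BitArray(capacityBits);
        this.hashCount = hashCount;
    }

    // Returns true if the url may have been seen before; returns false only
    // when the url is definitely new. Marks the url as seen either way.
    public bool TestAndAdd(string url)
    {
        int h1 = url.GetHashCode() & 0x7fffffff;
        int h2 = Fnv1a(url) & 0x7fffffff;
        bool seen = true;
        for (int i = 0; i < hashCount; i++)
        {
            // double hashing: derive k probe positions from two base hashes
            int idx = (int)(((long)h1 + (long)i * h2) % bits.Length);
            if (!bits[idx]) { seen = false; bits[idx] = true; }
        }
        return seen;
    }

    // FNV-1a string hash, used here as an independent second hash function.
    private static int Fnv1a(string s)
    {
        unchecked
        {
            int hash = (int)2166136261;
            foreach (char c in s) { hash ^= c; hash *= 16777619; }
            return hash;
        }
    }
}
```

In a production crawler the bit array would be sharded across machines or backed by an external store; accepting a small false-positive rate (a few fresh URLs wrongly skipped) is the price for fitting billions of URLs in memory.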


Url Filter applies a further screening step to the extracted URLs. The criteria differ by application: a general search engine such as Baidu or Google usually does little or no filtering, while a vertical-search or focused crawler may only want URLs that satisfy some condition, for example skipping image URLs, or keeping only URLs from one particular site. Url Filter is therefore a module closely tied to the application.

using System;
using System.Collections.Generic;
using Crawler.Common;

namespace Crawler.Processing
{
    public class UrlFilter
    {
        // Drop every url that matches any of the given patterns.
        public static List<Uri> RemoveByRegex(List<Uri> uris, params string[] regexs)
        {
            var uriList = new List<Uri>(uris);
            for (var i = uriList.Count - 1; i >= 0; i--)
            {
                foreach (var r in regexs)
                {
                    if (!RegexHelper.IsMatch(uriList[i].ToString(), r)) continue;
                    uriList.RemoveAt(i); // remove from the copy, not the input list
                    break;               // this url is gone; stop testing patterns
                }
            }
            return uriList;
        }

        // Keep only the urls that match at least one of the given patterns.
        public static List<Uri> SelectByRegex(List<Uri> uris, params string[] regexs)
        {
            var uriList = new List<Uri>();
            foreach (var t in uris)
                foreach (var r in regexs)
                    if (RegexHelper.IsMatch(t.ToString(), r) && !uriList.Contains(t))
                        uriList.Add(t);
            return uriList;
        }
    }
}

The Extractor's job is to pull every URL out of a downloaded page. This is detail-oriented work: you have to account for all the forms a URL can take; for example, pages often contain relative-path URLs, which must be converted to absolute paths during extraction. Here we choose regular expressions to do the link extraction.

Link addresses in HTML tags usually appear in the href attribute or the src attribute, so we use two regular expressions to match all the link addresses in a page.
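The two-regex approach can be sketched like this. The patterns and the `LinkExtractor` name are illustrative, not the post's actual code; relative paths are resolved against the page's own URI, and URL fragments (everything after `#`) are dropped:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative sketch of extracting links from href and src attributes
// with two regular expressions.
public static class LinkExtractor
{
    // one pattern for href="..." attributes, one for src="..." attributes;
    // group 1 captures the quoted value up to a quote or fragment marker
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*[\"']([^\"'#]+)", RegexOptions.IgnoreCase);
    private static readonly Regex SrcPattern =
        new Regex("src\\s*=\\s*[\"']([^\"'#]+)", RegexOptions.IgnoreCase);

    public static List<Uri> Extract(string html, Uri baseUri)
    {
        var result = new List<Uri>();
        foreach (var pattern in new[] { HrefPattern, SrcPattern })
            foreach (Match m in pattern.Matches(html))
                // Uri(baseUri, relative) resolves relative paths to absolute urls
                if (Uri.TryCreate(baseUri, m.Groups[1].Value, out var abs)
                    && !result.Contains(abs))
                    result.Add(abs);
        return result;
    }
}
```

Regexes are a pragmatic choice here, but they will miss unquoted attributes and pick up links inside comments; a real HTML parser handles those edge cases at the cost of speed.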


Recently .NET Core was updated to 1.0.1 and Azure tools to 2.9.5. When I tried to update, the .NET Core update failed with "0x80072f8a, unspecified error". Azure Tools bundles the .NET Core update as well, so the same 0x80072f8a problem kept both packages from updating successfully.

Digging into the installation error log revealed the cause: an expired certificate was blocking the download of Microsoft's online resources, so the installation could not succeed. Once the certificate problem was fixed, the installation completed without any further trouble!
