Flume's TaildirSource: Introduction and Upgrade Modifications - 白红宇

Published: 2019-03-14

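Flume 1.7.0 introduced the Taildir source (TaildirSource), a component that monitors the files in specified directories, selects files by regular expression, and can resume from where it left off after a restart. It turns out, however, that the CDH build of Flume 1.6.0 also ships this component. Compared with Apache 1.7.0, the CDH build lacks TaildirMatcher.java and differs in some other code, which suggests that the Apache implementation is the more complete of the two.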
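To make the resume-after-restart idea concrete, here is a minimal Python sketch. It is not Flume's code. Flume's documentation describes the position file as JSON recording the inode, absolute path, and last read position of each tailed file; the key names used below (inode, pos, file) and the helper names load_entries and read_new_lines are assumptions made for illustration.

```python
import json
import os


def load_entries(position_file):
    """Load {inode: entry} from the position file; empty if it does not exist."""
    if not os.path.exists(position_file):
        return {}
    with open(position_file, encoding="utf-8") as fh:
        return {entry["inode"]: entry for entry in json.load(fh)}


def read_new_lines(path, position_file):
    """Return the lines appended to `path` since the last call, then checkpoint."""
    entries = load_entries(position_file)
    inode = os.stat(path).st_ino
    start = entries.get(inode, {}).get("pos", 0)
    with open(path, "rb") as fh:
        fh.seek(start)  # resume where the previous read stopped
        data = fh.read()
        end = fh.tell()
    # Persist the new offset so that a restart does not re-read old data.
    entries[inode] = {"inode": inode, "pos": end, "file": os.path.abspath(path)}
    with open(position_file, "w", encoding="utf-8") as fh:
        json.dump(list(entries.values()), fh)
    return data.decode("utf-8").splitlines()
```

Calling read_new_lines twice in a row on an unchanged file returns an empty list the second time, because the recorded offset already points at the end of the file.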

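The author nevertheless used the CDH build, flume-1.6.0-cdh5.5.2-bin, because it was already installed in the existing environment and reinstalling Apache 1.7.0 was not worth the effort. A Flume Taildir source was then configured and tested to see whether it met the requirements.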

The configuration file is as follows:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = taildir
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /home/hadoop/hui/taildir_position.json
a1.sources.r1.filegroups = f1 f2
# "." matches any single character except the newline \n;
# "*" matches the preceding subexpression zero or more times
a1.sources.r1.filegroups.f1 = /home/hadoop/hui/test1/.*
a1.sources.r1.filegroups.f2 = /home/hadoop/hui/test2/.*
a1.sources.r1.fileHeader = true
a1.sources.r1.fileHeaderKey = file

a1.sinks.k1.type = file_roll
# Bind the sink to the channel so that it receives events
a1.sinks.k1.channel = c1
a1.sinks.k1.sink.directory = /home/hadoop/hui
a1.sinks.k1.sink.rollInterval = 0

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
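The filegroup values combine a literal directory with a regular expression that applies to the file name only. The Python sketch below illustrates that reading. It is an approximation rather than Flume's matcher: the function name match_filegroup is hypothetical, Python's regex dialect differs slightly from Java's, and the sketch assumes the regex itself contains no "/".

```python
import os
import re


def match_filegroup(filegroup, candidate_paths):
    """Return the candidates that sit directly in the filegroup's directory
    and whose file name fully matches the filegroup's regex."""
    parent_dir, name_regex = os.path.split(filegroup)
    pattern = re.compile(name_regex)
    matched = []
    for path in candidate_paths:
        directory, name = os.path.split(path)
        # The directory must match literally; only the file name is a regex.
        if directory == parent_dir and pattern.fullmatch(name):
            matched.append(path)
    return matched


if __name__ == "__main__":
    paths = [
        "/home/hadoop/hui/test1/hehe.txt",
        "/home/hadoop/hui/test2/messages.3",
        "/home/hadoop/hui/test2/test1/hehe.txt",
    ]
    print(match_filegroup("/home/hadoop/hui/test2/.*", paths))
```

In this sketch the match is not recursive, so a file in a subdirectory such as test2/test1 is not picked up by the f2 pattern.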

Layout of the test files:

.
├── messages.1
├── qiang
├── hui.txt
├── test1
│   ├── hehe.txt
│   └── messages.2
└── test2
    ├── messages.3
    ├── messages.4
    └── test1 → test2/test1
        ├── hehe.txt
        └── messages.2

Run Flume:

bin/flume-ng agent -c . -f conf/taildir.conf -n a1 -Dflume.root.logger=INFO,console

After startup, the sink generates the following file:

1489881718232-1 →
hello world hehe
hello world 3
hello world 4

Next, test renaming a file and appending new content:

mv test2/test1/hehe.txt test2/haha.txt
echo "hello china" >> test2/test1/hehe.txt

The test shows that Flume handles both the renamed file and the newly appended content correctly.
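One reason a tailer can cope with renames is that, on POSIX file systems, renaming a file within the same file system keeps its inode, and Flume's documentation describes the position file as recording the inode alongside the path and offset. The Python sketch below demonstrates only the inode behaviour, reusing the file names from the test above. It is not Flume code, and the function name inode_before_and_after_rename is hypothetical.

```python
import os
import tempfile


def inode_before_and_after_rename(src, dst):
    """Rename src to dst and return the inode seen before and after."""
    before = os.stat(src).st_ino
    os.rename(src, dst)
    after = os.stat(dst).st_ino
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "test2", "test1"))
        src = os.path.join(base, "test2", "test1", "hehe.txt")
        dst = os.path.join(base, "test2", "haha.txt")
        with open(src, "w", encoding="utf-8") as fh:
            fh.write("hello world hehe\n")
        before, after = inode_before_and_after_rename(src, dst)
        # Compare the two inode numbers to see whether the rename kept identity.
        print(before == after)
```

Whether a particular Flume build re-reads a renamed file depends on how it compares the stored inode and path, which this sketch does not model.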

Reposted from: http://xynoz.baihongyu.com/
