NG: the 1.x line (N = NEW). OG: the 0.9.x line, which we can ignore (O = OLD). Since I am running CDH 5.7.0, I chose the flume-ng-1.6.0-cdh5.7.0 release; note that this 1.6 differs from the Apache community 1.6.
A multi-sink agent (also very common):
# Edit the configuration file and add JAVA_HOME
[hadoop@hadoop001 app]$ cd ~/app/apache-flume-1.6.0-cdh5.7.0-bin
[hadoop@hadoop001 apache-flume-1.6.0-cdh5.7.0-bin]$ cp ~/app/apache-flume-1.6.0-cdh5.7.0-bin/conf/flume-env.sh.template ~/app/apache-flume-1.6.0-cdh5.7.0-bin/conf/flume-env.sh
[hadoop@hadoop001 apache-flume-1.6.0-cdh5.7.0-bin]$ vim ~/app/apache-flume-1.6.0-cdh5.7.0-bin/conf/flume-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_45

# Add the environment variables (in ~/.bash_profile)
export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin
export PATH=$FLUME_HOME/bin:$PATH
[hadoop@hadoop001 bin]$ source ~/.bash_profile
[hadoop@hadoop001 bin]$ which flume-ng
~/app/apache-flume-1.6.0-cdh5.7.0-bin/bin/flume-ng
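As an extra sanity check (not in the original post), the flume-ng script also has a version subcommand, which should report the CDH build installed above:

[hadoop@hadoop001 bin]$ flume-ng version
# expected to report Flume 1.6.0-cdh5.7.0 if the PATH change took effect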
memory channel: capacity => the maximum number of events (messages) the channel can hold; in production this should be at least 100,000. transactionCapacity => the maximum number of events taken in one transaction before a commit is forced; this must also be increased in production. logger: a sink that simply prints events to the console. Note 1: one source can be bound to multiple channels, but one sink can only be bound to one channel.
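As a quick illustration of these knobs, here is a minimal console agent; the agent name, the netcat port and the exact capacity numbers are only assumptions for the sketch, not values from the original post:

a1.sources = r1
a1.channels = c1
a1.sinks = k1
# netcat source listening on an arbitrary local port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# memory channel sized with production-style values discussed above
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000
# logger sink: prints events to the console
a1.sinks.k1.type = logger
# wiring: a source may feed several channels, but a sink reads exactly one channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1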
The production architecture is: log data => Flume => HDFS. Here we start with a simple Exec Source that collects data by running tail -F on a data file.
# example.conf: A single-node Flume configuration

# Name the components on this agent
exec-hdfs-agent.sources = exec-source
exec-hdfs-agent.sinks = hdfs-sink
exec-hdfs-agent.channels = memory-channel

# Describe/configure the source
exec-hdfs-agent.sources.exec-source.type = exec
exec-hdfs-agent.sources.exec-source.command = tail -F /home/hadoop/data/access_10000.log
exec-hdfs-agent.sources.exec-source.shell = /bin/sh -c

# Describe the sink
exec-hdfs-agent.sinks.hdfs-sink.type = hdfs
exec-hdfs-agent.sinks.hdfs-sink.hdfs.path = hdfs://hadoop001:9000/flume/exec
exec-hdfs-agent.sinks.hdfs-sink.hdfs.fileType = DataStream
exec-hdfs-agent.sinks.hdfs-sink.hdfs.writeFormat = Text

# Use a channel which buffers events in memory
exec-hdfs-agent.channels.memory-channel.type = memory
exec-hdfs-agent.channels.memory-channel.capacity = 1000
exec-hdfs-agent.channels.memory-channel.transactionCapacity = 100

# Bind the source and sink to the channel
exec-hdfs-agent.sources.exec-source.channels = memory-channel
exec-hdfs-agent.sinks.hdfs-sink.channel = memory-channel
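The post does not show how this agent is launched. Assuming the config above is saved as $FLUME_HOME/conf/exec-hdfs-agent.conf (the file name is an assumption), it can be started roughly like this:

flume-ng agent \
  --name exec-hdfs-agent \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/exec-hdfs-agent.conf \
  -Dflume.root.logger=INFO,console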
While writing to HDFS, the sink first creates a file with a .tmp suffix; once the file is finished (rolled), the .tmp suffix is removed.
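A quick way to watch this happen is to list the output directory configured in hdfs.path above (the exact file names will differ on every run):

hdfs dfs -ls /flume/exec          # files still being written carry a .tmp suffix
hdfs dfs -text /flume/exec/*      # inspect the collected text once files are rolled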
Drawbacks: although this tail-based approach does land the log data in HDFS, what happens if the tail -F process dies? Data will be lost, so it is not workable in production and gives no high availability. Moreover, the flow above does nothing about the large number of small files being generated, so it is not highly reliable either. Finally, tail can only watch a single file, while in production we usually need to watch a whole directory, so it does not meet the requirements. The next step is therefore a Spooling Directory Source, sketched below.
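The original post's Spooling Directory Source configuration did not survive in this copy, so the following is only a minimal sketch of such an agent (the agent name, spool directory and HDFS path are assumptions), included so the discussion below has something concrete to refer to:

spooldir-hdfs-agent.sources = spooldir-source
spooldir-hdfs-agent.sinks = hdfs-sink
spooldir-hdfs-agent.channels = memory-channel

# spooldir source: watches a directory; fully ingested files are renamed with .COMPLETED
spooldir-hdfs-agent.sources.spooldir-source.type = spooldir
spooldir-hdfs-agent.sources.spooldir-source.spoolDir = /home/hadoop/data/flume/spool/input
spooldir-hdfs-agent.sources.spooldir-source.fileSuffix = .COMPLETED

# HDFS sink
spooldir-hdfs-agent.sinks.hdfs-sink.type = hdfs
spooldir-hdfs-agent.sinks.hdfs-sink.hdfs.path = hdfs://hadoop001:9000/flume/spool
spooldir-hdfs-agent.sinks.hdfs-sink.hdfs.fileType = DataStream
spooldir-hdfs-agent.sinks.hdfs-sink.hdfs.writeFormat = Text

# memory channel and wiring
spooldir-hdfs-agent.channels.memory-channel.type = memory
spooldir-hdfs-agent.channels.memory-channel.capacity = 1000
spooldir-hdfs-agent.channels.memory-channel.transactionCapacity = 100
spooldir-hdfs-agent.sources.spooldir-source.channels = memory-channel
spooldir-hdfs-agent.sinks.hdfs-sink.channel = memory-channel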
Although a Spooling Directory Source configuration like the one above addresses the problems of too many small files and of monitoring multiple files, it still has the following issues.
Problem 1: it can monitor a directory, but it cannot monitor data in nested subdirectories (no recursion). Problem 2: if Flume dies mid-collection, there is no guarantee that on restart it will resume from the exact line it had already read. Because of these two problems, this approach is also unacceptable in production, which is where the Taildir Source configured below comes in: it records its read offsets in a position file so it can resume after a restart.
# example.conf: A single-node Flume configuration

# Name the components on this agent
taildir-hdfs-agent.sources = taildir-source
taildir-hdfs-agent.sinks = hdfs-sink
taildir-hdfs-agent.channels = memory-channel

# Describe/configure the source
taildir-hdfs-agent.sources.taildir-source.type = TAILDIR
taildir-hdfs-agent.sources.taildir-source.filegroups = f1
taildir-hdfs-agent.sources.taildir-source.filegroups.f1 = /home/hadoop/data/flume/taildir/input/.*
taildir-hdfs-agent.sources.taildir-source.positionFile = /home/hadoop/data/flume/taildir/taildir_position/taildir_position.json

# Describe the sink
taildir-hdfs-agent.sinks.hdfs-sink.type = hdfs
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.path = hdfs://hadoop001:9000/flume/taildir/%Y%m%d%H%M
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.fileType = CompressedStream
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.writeFormat = Text
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.codeC = gzip
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.filePrefix = wsk
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.rollInterval = 30
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.rollSize = 100000000
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.rollCount = 0

# Use a channel which buffers events in memory
taildir-hdfs-agent.channels.memory-channel.type = memory
taildir-hdfs-agent.channels.memory-channel.capacity = 1000
taildir-hdfs-agent.channels.memory-channel.transactionCapacity = 100

# Bind the source and sink to the channel
taildir-hdfs-agent.sources.taildir-source.channels = memory-channel
taildir-hdfs-agent.sinks.hdfs-sink.channel = memory-channel
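Assuming this config is saved as $FLUME_HOME/conf/taildir-hdfs-agent.conf (again, the file name is an assumption), the agent is launched the same way as before; on restart it resumes from the offsets recorded in taildir_position.json:

flume-ng agent \
  --name taildir-hdfs-agent \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/taildir-hdfs-agent.conf \
  -Dflume.root.logger=INFO,console

Note the roll settings: with rollInterval = 30, rollSize = 100000000 and rollCount = 0, files are rolled every 30 seconds or at roughly 100 MB, whichever comes first, and rolling by event count is disabled; this is what keeps the small-file problem under control.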
Reposted from: https://www.cnblogs.com/xuziyu/p/11004103.html