Gobblin
Download
Download the Gobblin release from the Release Page.
Extract:
tar -zxvf gobblin-distribution-…tar.gz
cd gobblin-dist
Environment setup
Gobblin job config directory:
Folder where job config files are stored. Environment variable: GOBBLIN_JOB_CONFIG_DIR
Also make sure the JAVA_HOME environment variable is set correctly.
Gobblin working directory:
Stores Gobblin's job output, locks, state store, and similar data. Environment variable: GOBBLIN_WORK_DIR
export GOBBLIN_JOB_CONFIG_DIR=/var/javaApps/gobblin-dist/job-conf
export GOBBLIN_WORK_DIR=/var/javaApps/gobblin-dist/work
export JAVA_HOME=
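The same setup can be wrapped in a small shell function that also creates both directories and warns when JAVA_HOME does not point at a Java installation. This is a minimal sketch assuming bash and the directory layout used in the exports; the function name `setup_gobblin_env` is hypothetical and is not part of Gobblin.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Gobblin: exports the two Gobblin
# directory variables under a base dir, creates the directories, and checks
# that JAVA_HOME points at a usable Java installation.
setup_gobblin_env() {
  local base="$1"

  # Folder holding the *.pull job config files
  export GOBBLIN_JOB_CONFIG_DIR="$base/job-conf"
  # Folder for job output, locks, and the state store
  export GOBBLIN_WORK_DIR="$base/work"
  mkdir -p "$GOBBLIN_JOB_CONFIG_DIR" "$GOBBLIN_WORK_DIR"

  # Gobblin runs on the JVM: warn and return non-zero if JAVA_HOME is unset
  # or does not contain bin/java
  if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "warning: JAVA_HOME is not set to a valid Java installation" >&2
    return 1
  fi
}

# Usage: run it in the current shell (e.g. after sourcing this file) so the
# exports persist:
# setup_gobblin_env /var/javaApps/gobblin-dist
```

Because the function only warns about JAVA_HOME instead of exiting, it is safe to source from an interactive shell.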
Job config file
vi $GOBBLIN_JOB_CONFIG_DIR/wikipedia.pull
job.name=PullFromWikipedia
job.group=Wikipedia
job.description=A getting started example for Gobblin
source.class=gobblin.example.wikipedia.WikipediaSource
source.page.titles=LinkedIn,Wikipedia:Sandbox
source.revisions.cnt=5
wikipedia.api.rooturl=https://en.wikipedia.org/w/api.php
wikipedia.avro.schema={"namespace": "example.wikipedia.avro","type": "record","name": "WikipediaArticle","fields": [{"name": "revid", "type": ["double", "null"]},{"name": "pageid", "type": ["double", "null"]},{"name": "title", "type": ["string", "null"]},{"name": "user", "type": ["string", "null"]},{"name": "anon", "type": ["string", "null"]},{"name": "userid", "type": ["double", "null"]},{"name": "timestamp", "type": ["string", "null"]},{"name": "size", "type": ["double", "null"]},{"name": "contentformat", "type": ["string", "null"]},{"name": "contentmodel", "type": ["string", "null"]},{"name": "content", "type": ["string", "null"]}]}
gobblin.wikipediaSource.maxRevisionsPerPage=10
converter.classes=gobblin.example.wikipedia.WikipediaConverter
extract.namespace=gobblin.example.wikipedia
writer.destination.type=HDFS
writer.output.format=AVRO
writer.partitioner.class=gobblin.example.wikipedia.WikipediaPartitioner
data.publisher.type=gobblin.publisher.BaseDataPublisher
Starting Gobblin
bin/gobblin-standalone.sh start
bin/gobblin-standalone.sh stop
The results are saved as files:
$GOBBLIN_WORK_DIR/job-output/gobblin/part.task_[]_<time>.avro
Avro tools
Download
curl -O https://repo1.maven.org/maven2/org/apache/avro/avro-tools/1.8.1/avro-tools-1.8.1.jar
Avro -> JSON
java -jar avro-tools-1.8.1.jar tojson --pretty [job_output].avro > output.json
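When a job writes several part files, the same `tojson` command can be run over each of them. A minimal sketch, assuming a POSIX-compatible shell, `avro-tools-1.8.1.jar` in the current directory, and output files under `$GOBBLIN_WORK_DIR/job-output`; the function name `avro_outputs_to_json` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts every *.avro file under a directory into a
# pretty-printed JSON file next to it (part.avro -> part.json) via avro-tools.
avro_outputs_to_json() {
  local dir="$1"
  # Path to the avro-tools jar; defaults to the version downloaded above
  local jar="${2:-avro-tools-1.8.1.jar}"

  # Walk the output tree; IFS= and -r keep spaces and backslashes in names
  find "$dir" -type f -name '*.avro' |
    while IFS= read -r f; do
      java -jar "$jar" tojson --pretty "$f" > "${f%.avro}.json"
    done
}

# Usage:
# avro_outputs_to_json "$GOBBLIN_WORK_DIR/job-output"
```

Writing each JSON file next to its source keeps the partition folders created by the job intact.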