PDFium Library Trial-and-Error Log - 4

respiro 2019. 12. 13. 17:16

Using it with GDAL (2)

Date written: December 13, 2019

Author: N3


0. Advantages of the new PDFium PDF engine

GDAL describes the advantages of the new PDFium engine as follows.
  • Significantly higher performance (compared to the previous PoDoFo and Poppler engines)
  • Support for larger PDF files with smaller memory footprint - even large AutoCAD plans or huge GeoPDFs can be processed efficiently now.
  • A non-restrictive BSD license! The copy-left GPL prevented the existence of applications supporting both PDF and MrSID/ECW formats for example.

1. GDAL version upgrade migration guide

MIGRATION GUIDE FROM GDAL 2.3 to GDAL 2.4
-----------------------------------------

1) Out-of-tree drivers: RawRasterBand() constructor changes

RawRasterBand now only accepts a VSILFILE* file. Consequently the void* fpRaw
argument has become a VSILFILE* one. And the bIsVSIL = FALSE argument has
been removed. The int bOwnsFP = FALSE has seen its default value suppressed,
and has seen its type changed to the RawRasterBand::OwnFP::YES/NO enumeration,
to detect places where your code must be changed.

Caution: code like RawRasterBand(..., bNativeOrder, TRUE) must be changed to
RawRasterBand(..., bNativeOrder, RawRasterBand::OwnFP::NO), the TRUE value
being the bIsVSIL value, and the default argument being bOwnsFP == FALSE.


MIGRATION GUIDE FROM GDAL 2.4 to GDAL 3.0
-----------------------------------------

- Unix Build: ./configure arguments --without-bsb, --without-grib,
  and --without-mrf have been renamed to --disable-driver-bsb,
  --disable-driver-grib and --disable-driver-mrf

- Substantial changes, sometimes backward incompatible, in coordinate reference
  system and coordinate transformations have been introduced per
  https://trac.osgeo.org/gdal/wiki/rfc73_proj6_wkt2_srsbarn
    * OSRImportFromEPSG() takes into account official axis order.
      Traditional GIS-friendly axis order can be restored with
      OGRSpatialReference::SetAxisMappingStrategy(OAMS_TRADITIONAL_GIS_ORDER);
    * Same for SetWellKnownGeogCS("WGS84") / SetFromUserInput("WGS84")
    * removal of OPTGetProjectionMethods(), OPTGetParameterList() and OPTGetParameterInfo()
      No equivalent.
    * removal of OSRFixup() and OSRFixupOrdering(): no longer needed since objects
      constructed are always valid
    * removal of OSRStripCTParms(). Use OSRExportToWktEx() instead with the
      FORMAT=SFSQL option
    * exportToWkt() outputs AXIS nodes
    * OSRIsSame(): now takes into account data axis to CRS axis mapping, unless
      IGNORE_DATA_AXIS_TO_SRS_AXIS_MAPPING=YES is set as an option to OSRIsSameEx()
    * ogr_srs_api.h: SRS_WKT_WGS84 macro is no longer declared by default since
      WKT without AXIS is too ambiguous. Preferred remediation: use SRS_WKT_WGS84_LAT_LONG.
      Or #define USE_DEPRECATED_SRS_WKT_WGS84 before including ogr_srs_api.h

Out-of-tree drivers:
* GDALDataset::GetProjectionRef() made non-virtual.
  Replaced by GetSpatialRef() virtual method.
  Compatibility emulation possible by defining:
    const char* _GetProjectionRef() override; // note leading underscore
    const OGRSpatialReference* GetSpatialRef() const override {
        return GetSpatialRefFromOldGetProjectionRef();
    }

* GDALDataset::SetProjection() made non-virtual.
  Replaced by SetSpatialRef() virtual method.
  Compatibility emulation possible by defining:
    CPLErr _SetProjection(const char*) override; // note leading underscore
    CPLErr SetSpatialRef(const OGRSpatialReference* poSRS) override {
        return OldSetProjectionFromSetSpatialRef(poSRS);
    }

* GDALDataset::GetGCPProjection() made non-virtual.
  Replaced by GetGCPSpatialRef() virtual method.
  Compatibility emulation possible by defining:
    const char* _GetGCPProjection() override; // note leading underscore
    const OGRSpatialReference* GetGCPSpatialRef() const override {
        return GetGCPSpatialRefFromOldGetGCPProjection();
    }

* GDALDataset::SetGCPs(..., const char* pszWKT) made non-virtual.
  Replaced by SetGCPs(..., const OGRSpatialReference* poSRS) virtual method.
  Compatibility emulation possible by defining:
    CPLErr _SetGCPs( int nGCPCount, const GDAL_GCP *pasGCPList,
                    const char *pszGCPProjection ) override; // note leading underscore
    CPLErr SetGCPs( int nGCPCountIn, const GDAL_GCP *pasGCPListIn,
                    const OGRSpatialReference* poSRS ) override {
        return OldSetGCPsFromNew(nGCPCountIn, pasGCPListIn, poSRS);
    }

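2. Patching GDAL to use PDFium

First, take the sources under gdal-3.1/frmts/pdf/ from the GDAL 3.1 git tree, which has been updated for the latest pdfium, and copy them over the same location in the current GDAL version.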

$ cd gdal-2.3.2/frmts/pdf

$ cp ~/gdal/gdal/frmts/pdf/* .


The file pdfcreatefromcomposition.cpp (the GDALPDFComposerWriter class) appears to be new.

To reduce the workload, remove this class from the build for now and add it back later.


GNUmakefile

OBJ     =       pdfdataset.o pdfio.o pdfobject.o pdfcreatecopy.o ogrpdflayer.o pdfwritabledataset.o pdfreadvectors.o pdfcreatefromcomposition.o

..
$(O_OBJ):       pdfobject.h pdfio.h pdfcreatecopy.h pdfcreatefromcomposition.h gdal_pdf.h ../../ogr/ogrsf_frmts/mem/ogr_mem.h
..


Try building.

./configure \

...

        --with-pdfium           \

        --with-pdfium-extra-lib-for-test="-lpthread -lm -lc -lstdc++ -lz -ljpeg -lopenjp2 -llcms2 -lpng " \

...


This assumes the libpdfium-devel and libpdfium RPM packages are already installed on the system (see the earlier packaging post).

checking if we have Poppler >= 0.20.0... yes

checking if we have Poppler >= 0.23.0... yes

checking for podofo... disabled

checking for pdfium... no

configure: error: pdfium requested but not found


Of course it does not work.


Fix the headers in the code that tests whether the pdfium library is present.

configure and configure.ac

     if test "x$with_pdfium_lib" = "x" ; then

         rm -f testpdfium.*

-        echo '#include <fpdfview.h>' > testpdfium.cpp

-        echo '#include <core/include/fpdfapi/fpdf_page.h>' >> testpdfium.cpp

+        echo '#include <public/fpdfview.h>' > testpdfium.cpp

+        echo '#include <core/fpdfapi/page/cpdf_page.h>' >> testpdfium.cpp

         echo 'int main(int argc, char** argv) { FPDF_InitLibrary(); FPDF_DestroyLibrary(); return 0; } ' >> testpdfium.cpp

         TEST_CXX_FLAGS="-std=c++0x"

         if test ! -z "`uname | grep Darwin`" ; then


Rebuilding shows that the configure test now passes. Run make.

The complicated Poppler version-check macros have been replaced by the POPPLER_MAJOR_VERSION and POPPLER_MINOR_VERSION defines (in frmts/pdf/).
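
As a rough illustration of what the two defines allow, a single comparison on POPPLER_MAJOR_VERSION and POPPLER_MINOR_VERSION can stand in for a separate POPPLER_x_y_OR_LATER macro per release. The sketch below is not taken from GDAL: PopplerAtLeast is a hypothetical helper, and the fallback values 0 and 26 are assumptions so that the snippet compiles without the build system.

```cpp
// Fallback values assumed for this sketch only; in a real build the two
// macros would be passed on the compiler command line by the makefiles.
#ifndef POPPLER_MAJOR_VERSION
#define POPPLER_MAJOR_VERSION 0
#endif
#ifndef POPPLER_MINOR_VERSION
#define POPPLER_MINOR_VERSION 26
#endif

// Hypothetical helper (not part of GDAL): true when the configured Poppler
// version is at least major.minor.
constexpr bool PopplerAtLeast(int major, int minor)
{
    return POPPLER_MAJOR_VERSION > major ||
           (POPPLER_MAJOR_VERSION == major && POPPLER_MINOR_VERSION >= minor);
}
```

With the fallback values, PopplerAtLeast(0, 20) is true and PopplerAtLeast(0, 58) is false, so one helper covers the cases that previously needed one macro each.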


Add the Poppler defines.

diff -urN gdal-2.3.2/GDALmake.opt.in gdal-2.3.2-pdf/GDALmake.opt.in

--- gdal-2.3.2/GDALmake.opt.in  2018-09-21 18:01:50.000000000 +0900

+++ gdal-2.3.2-pdf/GDALmake.opt.in      2019-12-12 13:24:28.223648374 +0900

@@ -468,6 +468,8 @@

 #


 HAVE_POPPLER = @HAVE_POPPLER@

+POPPLER_MAJOR_VERSION = @POPPLER_MAJOR_VERSION@

+POPPLER_MINOR_VERSION = @POPPLER_MINOR_VERSION@

 POPPLER_HAS_OPTCONTENT = @POPPLER_HAS_OPTCONTENT@

 POPPLER_BASE_STREAM_HAS_TWO_ARGS = @POPPLER_BASE_STREAM_HAS_TWO_ARGS@

 POPPLER_0_20_OR_LATER = @POPPLER_0_20_OR_LATER@

diff -urN gdal-2.3.2/configure gdal-2.3.2-pdf/configure

--- gdal-2.3.2/configure        2019-12-12 12:40:51.807185686 +0900

+++ gdal-2.3.2-pdf/configure    2019-12-12 13:13:35.928570492 +0900

@@ -663,6 +663,8 @@

 PODOFO_INC

 HAVE_PODOFO

 POPPLER_PLUGIN_LIB

+POPPLER_MINOR_VERSION

+POPPLER_MAJOR_VERSION

 POPPLER_INC

 POPPLER_0_58_OR_LATER

 POPPLER_0_23_OR_LATER

@@ -34381,6 +34383,8 @@



 HAVE_POPPLER=no

+POPPLER_MAJOR_VERSION=

+POPPLER_MINOR_VERSION=

 POPPLER_HAS_OPTCONTENT=no

 POPPLER_BASE_STREAM_HAS_TWO_ARGS=no

 POPPLER_0_20_OR_LATER=no

@@ -34538,8 +34542,21 @@

 $as_echo "disabled" >&6; }

 fi


+if test "$HAVE_POPPLER" = "yes"; then

+    POPPLER_VERSION=`$PKG_CONFIG --modversion poppler`

+    if test "$POPPLER_VERSION" != ""; then

+        HAVE_POPPLER=yes

+        POPPLER_MAJOR_VERSION=`expr $POPPLER_VERSION : '\([0-9]*\)'`

+        POPPLER_MINOR_VERSION=`expr $POPPLER_VERSION : '[0-9]*\.\([0-9]*\)'`

+    fi

+fi

+

 HAVE_POPPLER=$HAVE_POPPLER


+POPPLER_MAJOR_VERSION=$POPPLER_MAJOR_VERSION

+

+POPPLER_MINOR_VERSION=$POPPLER_MINOR_VERSION

+

 POPPLER_HAS_OPTCONTENT=$POPPLER_HAS_OPTCONTENT


 POPPLER_BASE_STREAM_HAS_TWO_ARGS=$POPPLER_BASE_STREAM_HAS_TWO_ARGS
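
The two expr patterns in the configure hunk take the leading digits of the pkg-config version string as the major version and the digits after the first dot as the minor version. The same split can be sketched in C++ as follows. SplitPopplerVersion and PopplerVersion are hypothetical names used only for illustration, and the results are kept as strings, as in the shell.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical container for the two captured fields.
struct PopplerVersion
{
    std::string major;  // leading digits, like expr '\([0-9]*\)'
    std::string minor;  // digits after the first dot, like expr '[0-9]*\.\([0-9]*\)'
};

// Returns the run of decimal digits that starts at position pos.
static std::string DigitsAt(const std::string& s, std::size_t pos)
{
    std::size_t end = pos;
    while (end < s.size() && std::isdigit(static_cast<unsigned char>(s[end])))
        ++end;
    return s.substr(pos, end - pos);
}

// Splits a version string such as "0.26.5" into major "0" and minor "26".
// The minor field stays empty when no dot follows the leading digits.
PopplerVersion SplitPopplerVersion(const std::string& version)
{
    PopplerVersion out;
    out.major = DigitsAt(version, 0);
    const std::size_t dot = out.major.size();
    if (dot < version.size() && version[dot] == '.')
        out.minor = DigitsAt(version, dot + 1);
    return out;
}
```

For a version string of "0.26.5" this gives major "0" and minor "26", which matches the POPPLER_OPTS values used later for packaging.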



Build again.

..

does not override

     virtual const char* _GetProjectionRef() override;

..

  GDALPamDataset::_SetProjection(pszWKTIn);

..


The errors come from the functions listed in the GDAL 2.4 to 3.0 migration guide above.

Find those functions and either change them back to the old style or fill in the corresponding code.


gdal_pdf.h

+#if 1
+    // GDAL 2.x shim: build an SRS from the legacy WKT getter.
+    // The returned object is heap-allocated; the caller must delete it.
+    OGRSpatialReference* GetSpatialRef() {
+       const char* pWKT = GetProjectionRef();
+       if( !pWKT || pWKT[0] == '\0')
+       {
+           return nullptr;
+       }
+       OGRSpatialReference *m_pSRS = new OGRSpatialReference();
+       if( m_pSRS->importFromWkt(pWKT) != OGRERR_NONE )
+       {
+           delete m_pSRS;  // avoid leaking the object when the WKT cannot be parsed
+           return nullptr;
+       }
+        return m_pSRS;
+    }
+
+    // GDAL 2.x shim: forward the SRS to the legacy WKT setter.
+    CPLErr SetSpatialRef(const OGRSpatialReference* poSRS) {
+       if( !poSRS )
+       {
+           return SetProjection("");
+       }
+       char* pWKT = nullptr;
+       if( poSRS->exportToWkt(&pWKT) != OGRERR_NONE )
+       {
+           CPLFree(pWKT);
+           return CE_Failure;
+       }
+       auto ret = SetProjection(pWKT);
+       CPLFree(pWKT);
+       return ret;
+    }
+#else
+    // Since GDAL 3.0
+    const OGRSpatialReference* GetSpatialRef() const override {
+        return GetSpatialRefFromOldGetProjectionRef();
+    }
+    CPLErr SetSpatialRef(const OGRSpatialReference* poSRS) override {
+        return OldSetProjectionFromSetSpatialRef(poSRS);
+    }
+#endif

+



+#if 0  // temporary
+    const OGRSpatialReference* GetGCPSpatialRef() const override {
+        return GetGCPSpatialRefFromOldGetGCPProjection();
+    }
+#endif

+#if 0 // temporary
+    using GDALPamDataset::SetGCPs;
+    CPLErr SetGCPs( int nGCPCountIn, const GDAL_GCP *pasGCPListIn,
+                    const OGRSpatialReference* poSRS ) override {
+        return OldSetGCPsFromNew(nGCPCountIn, pasGCPListIn, poSRS);
+    }
+#endif


Disable every call to the SetAxisMappingStrategy functions. (On 2.x it is safe to disable them.)

+    // OSRSetAxisMappingStrategy(hSRS, OAMS_TRADITIONAL_GIS_ORDER); // Since GDAL 3.0


+    // poSRS->SetAxisMappingStrategy(OAMS_TRADITIONAL_GIS_ORDER); // Since GDAL 3.0



Build again.

/home/respiro/rpmbuild/BUILD/gdal-2.3.2-fedora/.libs/libgdal.so: undefined reference to `CPDF_OCContext::CPDF_OCContext(CPDF_Document*, CPDF_OCContext::UsageType)'

/home/respiro/rpmbuild/BUILD/gdal-2.3.2-fedora/.libs/libgdal.so: undefined reference to `CPDF_RenderContext::AppendLayer(CPDF_PageObjectHolder*, CFX_Matrix const*)'

/home/respiro/rpmbuild/BUILD/gdal-2.3.2-fedora/.libs/libgdal.so: undefined reference to `CPDF_Document::GetPageDictionary(int)'

/home/respiro/rpmbuild/BUILD/gdal-2.3.2-fedora/.libs/libgdal.so: undefined reference to `CPDFPageFromFPDFPage(fpdf_page_t__*)'



Hmm!

Something went wrong when I built the libpdfium shared library.


[respiro@localhost shared]$ readelf -a libpdfium.so |grep CPDF_OCContext

   246: 000000000006bbb0   494 FUNC    LOCAL  DEFAULT   12 _ZNK14CPDF_OCContext8GetO

  3998: 000000000006af90   172 FUNC   LOCAL  HIDDEN    12 _ZNK23CPDF_OCContextInter

  3999: 000000000006b0c0    61 FUNC    LOCAL  HIDDEN    12 _ZN14CPDF_OCContextC2EP13

  4000: 00000000003f35e8    48 OBJECT  LOCAL  HIDDEN    18 _ZTV14CPDF_OCContext

  4001: 000000000006b0c0    61 FUNC    LOCAL  HIDDEN    12 _ZN14CPDF_OCContextC1EP13

  4002: 000000000006b100  1146 FUNC    LOCAL  HIDDEN    12 _ZNK14CPDF_OCContext22Loa

  4003: 000000000006b580   704 FUNC    LOCAL  HIDDEN    12 _ZNK14CPDF_OCContext12Loa

  4005: 000000000006b890    24 FUNC    LOCAL  HIDDEN    12 _ZN14CPDF_OCContextD2Ev

  4006: 000000000006b890    24 FUNC    LOCAL  HIDDEN    12 _ZN14CPDF_OCContextD1Ev

  4007: 000000000006b8b0    55 FUNC    LOCAL  HIDDEN    12 _ZN14CPDF_OCContextD0Ev

  4010: 000000000006ba60   323 FUNC    LOCAL  HIDDEN    12 _ZNK14CPDF_OCContext13Get

  4011: 000000000006bda0    19 FUNC    LOCAL  HIDDEN    12 _ZNK14CPDF_OCContext8GetO

  4012: 000000000006bdc0   730 FUNC    LOCAL  HIDDEN    12 _ZNK14CPDF_OCContext13Loa

  4013: 000000000006c0a0   177 FUNC    LOCAL  HIDDEN    12 _ZNK14CPDF_OCContext15Che

  4014: 00000000002496d0    26 OBJECT  LOCAL  HIDDEN    14 _ZTS23CPDF_OCContextInter

  4015: 00000000003fd1f8    24 OBJECT  LOCAL  HIDDEN    21 _ZTI23CPDF_OCContextInter

  4016: 00000000002496f0    17 OBJECT  LOCAL  HIDDEN    14 _ZTS14CPDF_OCContext

  4017: 00000000003fd210    24 OBJECT  LOCAL  HIDDEN    21 _ZTI14CPDF_OCContext


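Oops.

While digging around, I found that the classes in the pdfium library are not exported.


Their visibility is HIDDEN.

Find the corresponding option in the build options and remove it. Then rebuild the pdfium shared library, repackage it, and reinstall it.


This fix has been added to part 2 of this series.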
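
For reference, on GCC and Clang the HIDDEN column in the readelf output corresponds to ELF symbol visibility, which can be set per symbol with an attribute or for a whole build with a flag such as -fvisibility=hidden. The sketch below is not PDFium code. SKETCH_EXPORT, SKETCH_HIDDEN, and both functions are hypothetical names chosen for illustration.

```cpp
// Visibility attributes are a GCC/Clang extension; other compilers get
// empty macros so the sketch still compiles.
#if defined(__GNUC__)
#define SKETCH_EXPORT __attribute__((visibility("default")))
#define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
#define SKETCH_EXPORT
#define SKETCH_HIDDEN
#endif

// Stays in the dynamic symbol table of a shared library, so other
// binaries can link against it.
SKETCH_EXPORT int ExportedAnswer() { return 42; }

// Usable inside the library itself, but not resolvable from outside it,
// which leads to "undefined reference" errors in dependent binaries.
SKETCH_HIDDEN int HiddenAnswer() { return 7; }
```

Both functions can be called from code compiled into the same library. Only ExportedAnswer would be resolvable from another binary, which is the situation libgdal ran into with the PDFium classes.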


Build again.

/home/respiro/rpmbuild/BUILD/gdal-2.3.2-fedora/.libs/libgdal.so: undefined reference to `GDALPDFCreateFromCompositionFile(char const*, char const*)'

collect2: error: ld returned 1 exit status

make[1]: *** [gdalinfo] Error 1


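The error occurs because the class that was excluded from the build at the start cannot be found.

Now either add that source back and migrate it,

or temporarily disable the code below, as follows.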


pdfwritabledataset.cpp

..

GDALDataset* PDFWritableVectorDataset::Create( const char * pszName,

                                               int nXSize,

                                               int nYSize,

                                               int nBands,

                                               GDALDataType eType,

                                               char ** papszOptions )

{

    if( nBands == 0 && nXSize == 0 && nYSize == 0 && eType == GDT_Unknown )

    {

        const char* pszFilename = CSLFetchNameValue(papszOptions, "COMPOSITION_FILE");

        if( pszFilename )

        {

            //if( CSLCount(papszOptions) != 1 )

            {

                CPLError(CE_Warning, CPLE_AppDefined,

                         "All others options than COMPOSITION_FILE are ignored");

            }

            //return GDALPDFCreateFromCompositionFile(pszName, pszFilename);

        }

}

This feature was added in the 3.1 code, so disabling it should be harmless.

Judging from the code, migrating it would mean going through quite a lot of code, so let's skip it.


After rebuilding:

[respiro@localhost .libs]$ ldd libgdal.so.20.4.2  | grep pdf

        libpdfium.so => /lib64/libpdfium.so (0x00007ff21ee7d000)



If the applications are built as well, the result can be checked with the test programs under the apps folder.

Run the utilities generated in the apps folder.

[respiro@localhost apps]$ ./gdalinfo  --formats | grep PDF

  PDF -raster,vector- (rw+vs): Geospatial PDF


[respiro@localhost apps]$ ./gdalinfo --format PDF

Format Details:

  Short Name: PDF

  Long Name: Geospatial PDF

  Supports: Raster

  Supports: Vector

  Extension: pdf

  Help Topic: frmt_pdf.html

  Supports: Subdatasets

  Supports: Open() - Open existing dataset.

  Supports: Create() - Create writable dataset.

  Supports: CreateCopy() - Create dataset by copying another.

  Supports: Virtual IO - eg. /vsimem/

  Creation Datatypes: Byte

  Supports: Feature styles.


<CreationOptionList>

  <Option name="COMPRESS" type="string-select" description="Compression method for raster data" default="DEFLATE">

    <Value>NONE</Value>

    <Value>DEFLATE</Value>

    <Value>JPEG</Value>

    <Value>JPEG2000</Value>

  </Option>

  <Option name="STREAM_COMPRESS" type="string-select" description="Compression method for stream objects" default="DEFLATE">

    <Value>NONE</Value>

    <Value>DEFLATE</Value>

  </Option>

  <Option name="GEO_ENCODING" type="string-select" description="Format of geo-encoding" default="ISO32000">

    <Value>NONE</Value>

    <Value>ISO32000</Value>

    <Value>OGC_BP</Value>

    <Value>BOTH</Value>

  </Option>

  <Option name="NEATLINE" type="string" description="Neatline" />

  <Option name="DPI" type="float" description="DPI" default="72" />

  <Option name="WRITE_USERUNIT" type="boolean" description="Whether the UserUnit parameter must be written" />

  <Option name="PREDICTOR" type="int" description="Predictor Type (for DEFLATE compression)" />

  <Option name="JPEG_QUALITY" type="int" description="JPEG quality 1-100" default="75" />

  <Option name="JPEG2000_DRIVER" type="string" />

  <Option name="TILED" type="boolean" description="Switch to tiled format" default="NO" />

  <Option name="BLOCKXSIZE" type="int" description="Block Width" />

  <Option name="BLOCKYSIZE" type="int" description="Block Height" />

  <Option name="LAYER_NAME" type="string" description="Layer name for raster content" />

  <Option name="CLIPPING_EXTENT" type="string" description="Clipping extent for main and extra rasters. Format: xmin,ymin,xmax,ymax" />

  <Option name="EXTRA_RASTERS" type="string" description="List of extra (georeferenced) rasters." />

  <Option name="EXTRA_RASTERS_LAYER_NAME" type="string" description="List of layer names for the extra (georeferenced) rasters." />

  <Option name="EXTRA_STREAM" type="string" description="Extra data to insert into the page content stream" />

  <Option name="EXTRA_IMAGES" type="string" description="List of image_file_name,x,y,scale[,link=some_url] (possibly repeated)" />

  <Option name="EXTRA_LAYER_NAME" type="string" description="Layer name for extra content" />

  <Option name="MARGIN" type="int" description="Margin around image in user units" />

  <Option name="LEFT_MARGIN" type="int" description="Left margin in user units" />

  <Option name="RIGHT_MARGIN" type="int" description="Right margin in user units" />

  <Option name="TOP_MARGIN" type="int" description="Top margin in user units" />

  <Option name="BOTTOM_MARGIN" type="int" description="Bottom margin in user units" />

  <Option name="OGR_DATASOURCE" type="string" description="Name of OGR datasource to display on top of the raster layer" />

  <Option name="OGR_DISPLAY_FIELD" type="string" description="Name of field to use as the display field in the feature tree" />

  <Option name="OGR_DISPLAY_LAYER_NAMES" type="string" description="Comma separated list of OGR layer names to display in the feature tree" />

  <Option name="OGR_WRITE_ATTRIBUTES" type="boolean" description="Whether to write attributes of OGR features" default="YES" />

  <Option name="OGR_LINK_FIELD" type="string" description="Name of field to use as the URL field to make objects clickable." />

  <Option name="XMP" type="string" description="xml:XMP metadata" />

  <Option name="WRITE_INFO" type="boolean" description="to control whether a Info block must be written" default="YES" />

  <Option name="AUTHOR" type="string" />

  <Option name="CREATOR" type="string" />

  <Option name="CREATION_DATE" type="string" />

  <Option name="KEYWORDS" type="string" />

  <Option name="PRODUCER" type="string" />

  <Option name="SUBJECT" type="string" />

  <Option name="TITLE" type="string" />

  <Option name="OFF_LAYERS" type="string" description="Comma separated list of layer names that should be initially hidden" />

  <Option name="EXCLUSIVE_LAYERS" type="string" description="Comma separated list of layer names, such that only one of those layers can be ON at a time." />

  <Option name="JAVASCRIPT" type="string" description="Javascript script to embed and run at file opening" />

  <Option name="JAVASCRIPT_FILE" type="string" description="Filename of the Javascript script to embed and run at file opening" />

</CreationOptionList>



<LayerCreationOptionList />


<OpenOptionList>

  <Option name="RENDERING_OPTIONS" type="string-select" description="Which graphical elements to render" default="RASTER,VECTOR,TEXT" alt_config_option="GDAL_PDF_RENDERING_OPTIONS">

    <Value>RASTER,VECTOR,TEXT</Value>

    <Value>RASTER,VECTOR</Value>

    <Value>RASTER,TEXT</Value>

    <Value>RASTER</Value>

    <Value>VECTOR,TEXT</Value>

    <Value>VECTOR</Value>

    <Value>TEXT</Value>

  </Option>

  <Option name="DPI" type="float" description="Resolution in Dot Per Inch" default="72" alt_config_option="GDAL_PDF_DPI" />

  <Option name="USER_PWD" type="string" description="Password" alt_config_option="PDF_USER_PWD" />

  <Option name="LAYERS" type="string" description="List of layers (comma separated) to turn ON (or ALL to turn all layers ON)" alt_config_option="GDAL_PDF_LAYERS" />

  <Option name="LAYERS_OFF" type="string" description="List of layers (comma separated) to turn OFF" alt_config_option="GDAL_PDF_LAYERS_OFF" />

  <Option name="BANDS" type="string-select" description="Number of raster bands" default="3" alt_config_option="GDAL_PDF_BANDS">

    <Value>3</Value>

    <Value>4</Value>

  </Option>

  <Option name="NEATLINE" type="string" description="The name of the neatline to select" alt_config_option="GDAL_PDF_NEATLINE" />

</OpenOptionList>


  Other metadata items:

    HAVE_POPPLER=YES


What is this? config.log clearly says HAVE_PDFIUM='yes', and I confirmed that the library is linked in...

Where did HAVE_PDFIUM=YES go?


pdfdataset.cpp

#if defined(HAVE_PDFIUM) && defined(HAVE_POPPLER)

#define HAVE_MULTIPLE_PDF_BACKENDS

#elif defined(HAVE_PDFIUM) && defined(HAVE_PODOFO)

#define HAVE_MULTIPLE_PDF_BACKENDS

#elif defined(HAVE_POPPLER) && defined(HAVE_PODOFO)

#define HAVE_MULTIPLE_PDF_BACKENDS

#endif


#ifdef HAVE_MULTIPLE_PDF_BACKENDS

"  <Option name='PDF_LIB' type='string-select' description='Which underlying PDF library to use' "

#if defined(HAVE_PDFIUM)

  "default='PDFIUM'"

#elif defined(HAVE_POPPLER)

  "default='POPPLER'"

#elif defined(HAVE_PODOFO)

  "default='PODOFO'"

#endif  // ~ default PDF_LIB

  "alt_config_option='GDAL_PDF_LIB'>"

#if defined(HAVE_POPPLER)

"     <Value>POPPLER</Value>\n"

#endif  // HAVE_POPPLER

#if defined(HAVE_PODOFO)

"     <Value>PODOFO</Value>\n"

#endif  // HAVE_PODOFO

#if defined(HAVE_PDFIUM)

"     <Value>PDFIUM</Value>\n"

#endif  // HAVE_PDFIUM

"  </Option>"

#endif // HAVE_MULTIPLE_PDF_BACKENDS


It looks like HAVE_MULTIPLE_PDF_BACKENDS is not being defined properly at compile time. Strange.
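
The listing relies on two C++ features: adjacent string literals are concatenated by the compiler, and #if blocks decide which literals are present at all. Below is a reduced sketch of that mechanism, assuming for illustration that PDFium and Poppler are both enabled. The two #define lines and the name kPdfLibOption are assumptions, not GDAL code.

```cpp
#include <string>

// Assumed for this sketch; in GDAL these come from the build configuration.
#define HAVE_PDFIUM
#define HAVE_POPPLER

#if defined(HAVE_PDFIUM) && defined(HAVE_POPPLER)
#define HAVE_MULTIPLE_PDF_BACKENDS
#endif

// The preprocessor first drops the inactive branches, then the compiler
// joins the remaining adjacent literals into one string.
const std::string kPdfLibOption =
#ifdef HAVE_MULTIPLE_PDF_BACKENDS
    "<Option name='PDF_LIB' type='string-select' "
#if defined(HAVE_PDFIUM)
    "default='PDFIUM'"
#elif defined(HAVE_POPPLER)
    "default='POPPLER'"
#endif
    ">"
#if defined(HAVE_POPPLER)
    "<Value>POPPLER</Value>"
#endif
#if defined(HAVE_PDFIUM)
    "<Value>PDFIUM</Value>"
#endif
    "</Option>"
#endif
    "";  // keeps the initializer valid when no option is emitted
```

If the PDF_LIB option is missing from the driver metadata, either these macros were not defined when pdfdataset.cpp was compiled, or the binary being run is not using the library that was just built.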

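Force the define and rebuild.


Same result.


Silly me! The freshly built shared library has to be picked up through LD_LIBRARY_PATH; without it, gdalinfo was presumably loading the system-installed libgdal instead of the one in ../.libs. (This is exactly the kind of wasted effort this series is about.)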


[respiro@localhost apps]$ cd apps

[respiro@localhost apps]$ LD_LIBRARY_PATH=../.libs:$LD_LIBRARY_PATH  ./gdalinfo --format PDF

Format Details:


..

  <Option name="USER_PWD" type="string" description="Password" alt_config_option="PDF_USER_PWD" />

  <Option name="PDF_LIB" type="string-select" description="Which underlying PDF library to use" default="PDFIUM" default="POPPLER" alt_config_option="GDAL_PDF_LIB">

    <Value>POPPLER</Value>

    <Value>PDFIUM</Value>

  </Option>

  <Option name="LAYERS" type="string" description="TEST by KJI List of layers (comma separated) to turn ON (or ALL to turn all layers ON)" alt_config_option="GDAL_PDF_LAYERS" />

  <Option name="LAYERS_OFF" type="string" description="List of layers (comma separated) to turn OFF" alt_config_option="GDAL_PDF_LAYERS_OFF" />

  <Option name="BANDS" type="string-select" description="Number of raster bands" default="3" alt_config_option="GDAL_PDF_BANDS">

    <Value>3</Value>

    <Value>4</Value>

  </Option>

  <Option name="NEATLINE" type="string" description="The name of the neatline to select" alt_config_option="GDAL_PDF_NEATLINE" />

</OpenOptionList>


  Other metadata items:

    HAVE_PDFIUM=YES

    HAVE_POPPLER=YES


You can see that an open option, PDF_LIB, has been added for selecting the PDF backend engine when opening a file.

[respiro@localhost apps]$ LD_LIBRARY_PATH=../.libs:$LD_LIBRARY_PATH ./gdalinfo -oo PDF_LIB=POPPLER ../gdalautotest-2.3.2/gdrivers/data/adobe_style_geospatial.pdf

Driver: PDF/Geospatial PDF

Files: ../gdalautotest-2.3.2/gdrivers/data/adobe_style_geospatial.pdf

Size is 1275, 1650

Coordinate System is:

PROJCS["WGS_1984_UTM_Zone_20N",

    GEOGCS["GCS_WGS_1984",

        DATUM["WGS_1984",

            SPHEROID["WGS_84",6378137.0,298.257223563]],

        PRIMEM["Greenwich",0.0],

        UNIT["Degree",0.0174532925199433]],

    PROJECTION["Transverse_Mercator"],

    PARAMETER["False_Easting",500000.0],

    PARAMETER["False_Northing",0.0],

    PARAMETER["Central_Meridian",-63.0],

    PARAMETER["Scale_Factor",0.9996],

    PARAMETER["Latitude_Of_Origin",0.0],

    UNIT["Meter",1.0]]

Origin = (333274.616544058371801,4940391.759349998086691)

Pixel Size = (42.353069656601626,-42.392994002225727)

Metadata:

  CREATION_DATE=D:20101021125101-07

  CREATOR=ESRI ArcMap 10.0.0.2414

  NEATLINE=POLYGON ((338304.150126181 4896673.63942063,338304.177293829 4933414.79937582,382774.271384474 4933414.54626367,382774.767330031 4896674.27358034,338304.150126181 4896673.63942063))

Corner Coordinates:

Upper Left  (  333274.617, 4940391.759) ( 65d 6' 2.64"W, 44d35'51.19"N)

Lower Left  (  333274.617, 4870443.319) ( 65d 4'42.29"W, 43d58' 5.60"N)

Upper Right (  387274.780, 4940391.759) ( 64d25'14.13"W, 44d36'28.95"N)

Lower Right (  387274.780, 4870443.319) ( 64d24'19.77"W, 43d58'42.54"N)

Center      (  360274.698, 4905417.539) ( 64d45' 4.60"W, 44d17'18.91"N)

Band 1 Block=1275x1 Type=Byte, ColorInterp=Red

Band 2 Block=1275x1 Type=Byte, ColorInterp=Green

Band 3 Block=1275x1 Type=Byte, ColorInterp=Blue



[respiro@localhost apps]$ LD_LIBRARY_PATH=../.libs:$LD_LIBRARY_PATH ./gdalinfo -oo PDF_LIB=PDFIUM ../gdalautotest-2.3.2/gdrivers/data/adobe_style_geospatial.pdf
Driver: PDF/Geospatial PDF
Files: ../gdalautotest-2.3.2/gdrivers/data/adobe_style_geospatial.pdf
Size is 1275, 1650
Coordinate System is:
PROJCS["WGS_1984_UTM_Zone_20N",
    GEOGCS["GCS_WGS_1984",
        DATUM["WGS_1984",
            SPHEROID["WGS_84",6378137.0,298.257223563]],
        PRIMEM["Greenwich",0.0],
        UNIT["Degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["False_Easting",500000.0],
    PARAMETER["False_Northing",0.0],
    PARAMETER["Central_Meridian",-63.0],
    PARAMETER["Scale_Factor",0.9996],
    PARAMETER["Latitude_Of_Origin",0.0],
    UNIT["Meter",1.0]]
Origin = (333275.124066242307890,4940392.123364951461554)
Pixel Size = (42.352600157603341,-42.393311561151322)
Metadata:
  CREATION_DATE=D:20101021125101-07
  CREATOR=ESRI ArcMap 10.0.0.2414
  NEATLINE=POLYGON ((338304.285365684 4896674.10591548,338304.812551275 4933414.85396058,382774.246895812 4933414.85514894,382774.983309293 4896673.9572296,338304.285365684 4896674.10591548))
Corner Coordinates:
Upper Left  (  333275.124, 4940392.123) ( 65d 6' 2.61"W, 44d35'51.20"N)
Lower Left  (  333275.124, 4870443.159) ( 65d 4'42.27"W, 43d58' 5.60"N)
Upper Right (  387274.689, 4940392.123) ( 64d25'14.13"W, 44d36'28.96"N)
Lower Right (  387274.689, 4870443.159) ( 64d24'19.78"W, 43d58'42.54"N)
Center      (  360274.907, 4905417.641) ( 64d45' 4.59"W, 44d17'18.91"N)
Band 1 Block=1275x1 Type=Byte, ColorInterp=Red
  Overviews: 638x825, 319x413, 160x207
  Mask Flags: PER_DATASET ALPHA
  Overviews of mask band: 638x825, 319x413, 160x207
Band 2 Block=1275x1 Type=Byte, ColorInterp=Green
  Overviews: 638x825, 319x413, 160x207
  Mask Flags: PER_DATASET ALPHA
  Overviews of mask band: 638x825, 319x413, 160x207
Band 3 Block=1275x1 Type=Byte, ColorInterp=Blue
  Overviews: 638x825, 319x413, 160x207
  Mask Flags: PER_DATASET ALPHA
  Overviews of mask band: 638x825, 319x413, 160x207
Band 4 Block=1275x1 Type=Byte, ColorInterp=Alpha
  Overviews: 638x825, 319x413, 160x207


When the file is read through PDFium, even the alpha band is recognized.


Since poppler is the default GDAL backend on CentOS 7, only pdfium is tested here.


Test converting a GeoTIFF file to PDF with gdal_translate.

PDF creation has no option for selecting PDF_LIB. I don't know which backend it uses. Do I need to trace through the source?

Looking at the pdfdataset.cpp source...


if(bHasLib.count() != 1) {

        const char* pszDefaultLib =

                bHasLib.test(PDFLIB_PDFIUM) ? "PDFIUM" :

                bHasLib.test(PDFLIB_POPPLER) ? "POPPLER" : "PODOFO";

        const char* pszPDFLib = GetOption(poOpenInfo->papszOpenOptions, "PDF_LIB", pszDefaultLib );

        while( true )

        {

            if (EQUAL(pszPDFLib, "POPPLER"))

                bUseLib.set(PDFLIB_POPPLER);

            else if (EQUAL(pszPDFLib, "PODOFO"))

                bUseLib.set(PDFLIB_PODOFO);

            else if (EQUAL(pszPDFLib, "PDFIUM"))

                bUseLib.set(PDFLIB_PDFIUM);


            if(bUseLib.count() != 1 || (bHasLib & bUseLib) == 0)

            {

                CPLDebug("PDF", "Invalid value for GDAL_PDF_LIB config option: %s. Fallback to %s",

                        pszPDFLib, pszDefaultLib);

                pszPDFLib = pszDefaultLib;

                bUseLib.reset();

            }

            else

                break;

        }

    }


When several PDF libraries are available, the default library is chosen in the following order. Unless one is specified in the options, PDFIUM is used.


PDFIUM > POPPLER > PODOFO
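
The selection rule can be restated as a small standalone function. This is only a sketch of the behavior described above, not the GDAL implementation: ChoosePdfLib is a hypothetical name, the three flags stand in for the compiled-in backends, and the comparison is case-sensitive, whereas the EQUAL() call in the original code ignores case.

```cpp
#include <string>

// Hypothetical helper: returns the backend name that would be used for a
// given PDF_LIB request, falling back to PDFIUM > POPPLER > PODOFO.
std::string ChoosePdfLib(bool hasPdfium, bool hasPoppler, bool hasPodofo,
                         const std::string& requested)
{
    // Default follows the priority order of the compiled-in backends.
    const std::string defaultLib =
        hasPdfium ? "PDFIUM" : hasPoppler ? "POPPLER" : "PODOFO";

    // A request is honored only when that backend is actually available.
    if (requested == "PDFIUM" && hasPdfium)
        return requested;
    if (requested == "POPPLER" && hasPoppler)
        return requested;
    if (requested == "PODOFO" && hasPodofo)
        return requested;

    // Empty, unknown, or unavailable request: fall back to the default.
    return defaultLib;
}
```

With PDFium and Poppler both available, an empty request yields "PDFIUM", a request for "POPPLER" is honored, and a request for "PODOFO" falls back to "PDFIUM".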


$ gdal_translate utm.tif utm-poppler.pdf -of PDF





Test converting a GeoPDF file to GeoTIFF with gdal_translate.


[respiro@localhost data]$ cd ~/gdalautotest/gdriver/data

[respiro@localhost data]$ gdal_translate adobe_style_geospatial.pdf adobe_pdfium.tif -oo PDF_LIB=PDFIUM -of GTiff

Input file size is 1275, 1650

0...10...20...30...40...50...60...70...80...90...100 - done.


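Opening the result in a tool such as QGIS shows that the coordinates are recognized.


Now update the GDAL packaging.


The GDAL package currently used on my system is a modified version of the Fedora build at


https://koji.fedoraproject.org/koji/buildinfo?buildID=1186932


The pdfium patch I used and the changes to the SPEC file are as follows.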


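gdal-2.3.2 pdfium patch


If you use the latest GDAL git sources (3.x), the patch should be easy to apply with minor adjustments, and the pdfcreatefromcomposition.cpp source file that was commented out above should also be usable.


Because of the size of the source RPM, only the spec file changes are posted.


Changes to gdal.spec:

%global with_pdfium 1   // added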


...


Patch12:        %{name}-2.3.2-pdfium.patch    // added


...

%if 0%{?with_pdfium}

BuildRequires:  libpdfium-devel   // see the pdfium RPM packaging post

BuildRequires:  lcms2-devel

BuildRequires:  libjpeg-devel

BuildRequires:  libpng-devel

BuildRequires:  zlib-devel

%endif



%if 0%{?with_pdfium}
Requires:       libpdfium
Requires:       lcms2
Requires:       libjpeg
Requires:       libpng
Requires:       zlib
%endif

%setup ..
..
%patch12 -p1 -b .pdfium

%configure \
..
%if 0%{?with_pdfium}
        --with-pdfium           \
        --with-pdfium-extra-lib-for-test="-lpthread -lm -lc -lstdc++ -lz -ljpeg -lopenjp2 -llcms2 -lpng " \
%endif
..

## Check the additional variables
POPPLER_OPTS="POPPLER_MAJOR_VERSION=0 POPPLER_MINOR_VERSION=26 POPPLER_0_20_OR_LATER=yes POPPLER_0_23_OR_LATER=yes POPPLER_BASE_STREAM_HAS_TWO_ARGS=yes"


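SPEC file for CentOS 7 RPM packaging

(Adjust the proj_somaj value to match the name of the Proj library you use. For example, if the library is /usr/lib64/libproj.so.15, use 15.)


I confirmed that the basic functionality of GDAL with pdfium works well.


As always, if the free time that never seems to come does come, let's backport the function that was commented out earlier.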