Re: sqlContext fails to discover parquet partition

moon
Administrator
Thanks for sharing both the problem and the solution!

Best,
moon

On Tue, Jun 23, 2015 at 10:45 PM Wush Wu <[hidden email]> wrote:
Dear all,

I found the reason.

After setting "spark.sql.parquet.useDataSourceApi" to "true" on the sqlContext, Parquet partition discovery works correctly.

Example code:

```
// Enable the Parquet data source API before loading the partitioned path
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "true")
val ecrtb20150622 = sqlContext.parquetFile("hdfs:///bwlogs/beta/archive/EC.RTB/_year=2015/_month=06/_day=22")
```
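
For readers unfamiliar with the layout, here is a minimal sketch of the `key=value` directory convention that partition discovery relies on. It is plain Scala and does not call Spark; `PartitionPathSketch` and `partitionValues` are hypothetical names made up for illustration, and this is not how Spark itself implements discovery. The sketch keeps every value as a string.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object PartitionPathSketch {

  // Collect the key=value directory segments of a path, in order.
  // Segments without "=" (scheme, plain directories) are skipped.
  def partitionValues(path: String): Seq[(String, String)] =
    path.split("/").toSeq.flatMap { segment =>
      segment.split("=", 2) match {
        case Array(key, value) if key.nonEmpty && value.nonEmpty =>
          Some(key -> value)
        case _ =>
          None
      }
    }

  def main(args: Array[String]): Unit = {
    val path = "hdfs:///bwlogs/beta/archive/EC.RTB/_year=2015/_month=06/_day=22"
    // Prints one "key = value" line per partition directory in the path.
    partitionValues(path).foreach { case (k, v) => println(s"$k = $v") }
  }
}
```

For the path above, the segments `_year=2015`, `_month=06`, and `_day=22` are the ones that encode partition columns; everything before them is the table location.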

Hope this might help others in the future.

Best,
Wush

2015-06-23 10:00 GMT+08:00 Wush Wu <[hidden email]>:
Dear all,

Today we tried to load a partitioned Parquet file as described in <https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#partition-discovery>:

```
sqlContext.parquetFile("hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
```

but we got `java.lang.IllegalArgumentException: Could not find Parquet metadata at path hdfs://bwhdfscluster/bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11`

However, if I create a HiveContext myself:

```
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.parquetFile("hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
```

It works.

Is this a bug? Or did I make a mistake in configuring my HDFS cluster?

Thanks,
Wush
