df = sqlContext.sql("select d1.a, d1.b, d1.c as aaa, d2.d, d2.e, d2.f, d2.g, d2.h, d2.i, d2.j as length, '{1}' as month_end from df1 d1 join df2 d2 on concat(substr(upper(trim(d1.a)),0,d1.j),' ') = substr(upper(trim(d2.j)),0,(d2.j+1)) and upper(trim(d1.c)) = upper(trim(d2.f)) where length(upper(trim(d2.i))) > d2.j and length(upper(trim(d1.a))) = (d1.j+3)".format(dataBase, month_end))
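In the query string, `'{1}'` is filled in by `str.format`, which picks the second positional argument (`month_end`). `dataBase` is passed but never used, because `format` silently ignores extra positional arguments. A quick check with hypothetical placeholder values:

```python
# Hypothetical values; in the question they come from the surrounding script.
dataBase = "my_db"
month_end = "2018-01-31"

# Shortened version of the query template from the question.
template = "select d1.a, '{1}' as month_end from df1 d1"
query = template.format(dataBase, month_end)
print(query)  # select d1.a, '2018-01-31' as month_end from df1 d1
```

In a DataFrame version of the query, this constant column corresponds to `lit(month_end).alias("month_end")`.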

Can someone help me convert the join above from a SQL join into a DataFrame join?

What I tried:

joinDf = df1.join(df2,on=[(concat(substring(upper(trim(df1["a"])),0,df1["j"]),' ')) == substring(upper(trim(df2["j"])),0,(df2["j"]+1)) and upper(trim(df1["c"])) == upper(trim(df2["f"]))])

(I have not added the select yet.)

I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p2667.3017/lib/spark/python/pyspark/sql/functions.py", line 1180, in substring
    return Column(sc._jvm.functions.substring(_to_java_column(str), pos, len))
  File "/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p2667.3017/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 798, in __call__
  File "/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p2667.3017/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 785, in _get_args
  File "/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p2667.3017/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_collections.py", line 512, in convert
TypeError: 'Column' object is not callable