多表查询(left join)时简单高效去掉重复列的尝试

2023-07-15 20:55| 来源: 网络整理| 查看: 265

需求背景：数据库中有主表A,有其他表B、C、D、E、F、G…主表A与其他表使用字段report_id进行关联，现将各个表连接起来做成宽表。

基本做法：A表左连接left join其余表。

以三个表连接举例，代码仅选择部分字段：

SELECT * FROM ( SELECT distinct t1.report_id, t1.credit_type, t1.name, t1.certificate_no, t1.create_time, t1.file_path FROM gears.credit_base_info t1 where t1.report_id is not null ) base # 投影的别名 left JOIN ( SELECT distinct t2.report_id, t2.pay_status, t2.month_fee_amt, t2.personal_pay_rate FROM gears.pcredit_house_fund_record t2 where t2.info_update_date is not null ) house_found ON base.report_id = house_found.report_id left JOIN ( SELECT distinct t3.report_id, t3.court_name, t3.cause, t3.register_date, t3.result_type, IF(t3.case_status = '列入失信名单', '1','0') dishonest_status, t3.case_end_date, t3.apply_execution_object, t3.executed_object FROM gears.pcredit_force_execution_record t3 ) force_execution ON base.report_id = force_execution.report_id where base.report_id is not null

因为使用SELECT ，查询出来的表中有三个一模一样的report_id列*，如何去除重复列？

目前未找到可以直接删除重复的列的方法，因此“曲线救国”探索些方法加快写列名的速度。

1. 显示指定select的列名，即只要base.report_id，其他两个投影的report_id不选:

SELECT base.report_id,base.credit_type,base.name,base.certificate_no,base.create_time, base.file_path,house_found.pay_status,house_found.month_fee_amt,house_found.personal_pay_rate,house_found.company_pay_rate, force_execution.register_date,force_execution.result_type,force_execution.dishonest_status,force_execution.case_end_date,force_execution.apply_execution_object,force_execution.executed_object FROM .......

当列名特别多时，写出所有列名的过程要耗费很多时间，大大降低写代码的效率，因此尝试将这个写列名过程变简单快速一些。即想办法快速写出Select表中除了某个字段的所有字段（方法在2中）然后对于不重复的列名不需指定表名，即：

SELECT base.report_id,credit_type,name,certificate_no,create_time, file_path,pay_status,month_fee_amt,personal_pay_rate,company_pay_rate, register_date,result_type,dishonest_status,case_end_date,apply_execution_object,executed_object FROM .......

2. 关于简单写出查询除了某一列的其他列的SQL语句

SELECT CONCAT(' select ',GROUP_CONCAT(COLUMN_NAME),' from ', TABLE_NAME,' ;') FROM information_schema.COLUMNS WHERE table_name = 'xxxx' AND TABLE_SCHEMA = 'ccms_xxx' and COLUMN_NAME != 'id';

执行上面sql得出一条新的sql语句。如图中快速写出了选择除了id这一列的其他列的语句。在这里插入图片描述如此一来，我们select一个表的多个字段的代码就快了很多，生成上述语句后，可去掉其他不要的字段，仅对与其他表重复的字段使用的写法即可。

用此方法左连接的某一个表的代码可快速写出:

left JOIN ( select report_id,sex,birthday,marriage_status,mobile_no,work_tel,home_tel,education,jhi_degree,communication_address, residence_address,spouse_name,spouse_certificate_type,spouse_certificate_no,spouse_work_unit from person_info ) person #投影的别名

最后代码：

#稍微简单的一版 product_db=pymysql.connect(host='trans.magfin.cn', port=6690,user="pd_query", password="8dytjn3s", database="gears", charset='utf8' ) sql1=''' SELECT base.report_id,credit_type,name,certificate_no,create_time, file_path,pay_status,month_fee_amt,personal_pay_rate, court_name,cause,register_date,result_type,dishonest_status,case_end_date,apply_execution_object,executed_object FROM ( SELECT distinct report_id,credit_type,name,certificate_no,create_time,file_path FROM gears.credit_base_info where report_id is not null ) base # 投影的别名 left JOIN ( SELECT distinct report_id,pay_status,month_fee_amt,personal_pay_rate FROM gears.pcredit_house_fund_record where info_update_date is not null ) house_found ON base.report_id = house_found.report_id left JOIN ( SELECT distinct report_id,court_name,cause,register_date,result_type, IF(case_status = '列入失信人名单', '1','0') dishonest_status, case_end_date,apply_execution_object,executed_object FROM gears.pcredit_force_execution_record ) force_execution ON base.report_id = force_execution.report_id where base.report_id is not null ''' credit_model1_df=pd.read_sql(sql1,product_db) credit_model1_df.shape #(7307, 20) credit_model1_df.head(20) credit_model1_df.describe() credit_model1_df.info()

【本文地址】

公司简介

联系我们