Information


Run Li
1037 Luoyu Road, Wuhan, China 430074
(+86)181 6074 2624
root999@hust.edu.cn / root999@aliyun.com

Education


Huazhong University of Science and Technology
M.S. in Software Engineering
GPA: 3.79/4.00
Rank: 4/145

Nanchang University
B.S. in Software Engineering
TA in Data Structures & Algorithms

Research Experiences


Contrastive Learning and Multi-grained Interactivity of Cross Domain Few Shot FAQ

Alibaba DAMO Academy First Author   Jun 2021 - Sep 2021
Situation:

FAQ matching is the core business scenario in intelligent customer-service dialogue systems. Building on the success of its ToC customer-service bot, AliXiaomi has gradually evolved into a customer-service robot platform that offers SaaS/PaaS services to companies on Alibaba Cloud. In the ToB setting, however, B-end client data is scarce and of low quality, so FAQ performance needs to be improved.
Task:

Models with good performance already exist in other source domains. The goal is to obtain better representations in the target domain by introducing these mature source-domain models to assist few-shot learning in the target domain. The key lies in exploiting information within and across domains to mine the differences and connections between samples.
Action:

1. Multi-grained Interactive Learning: Few-shot learning aims to learn stronger representations from a small number of samples. Samples are drawn with the N-way-K-shot scheme to form episodes (the episodic analogue of a batch) for training.
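The N-way-K-shot episodic sampling described above can be sketched as follows; the function name and data layout (a list of (text, label) pairs) are illustrative assumptions, not the actual project code.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=5, q_queries=5, seed=None):
    """Sample one N-way-K-shot episode (support + query sets).

    dataset: list of (text, label) pairs; labels are hashable.
    Returns (support, query), each a list of (text, label) pairs.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append(text)

    # Keep only classes with enough examples for K support + Q query items.
    eligible = [c for c, xs in by_label.items() if len(xs) >= k_shot + q_queries]
    classes = rng.sample(eligible, n_way)

    support, query = [], []
    for c in classes:
        xs = rng.sample(by_label[c], k_shot + q_queries)
        support += [(x, c) for x in xs[:k_shot]]   # K shots per class
        query += [(x, c) for x in xs[k_shot:]]     # Q held-out queries per class
    return support, query
```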

2. Contrastive Learning: Within a sampled episode, a SimCSE-style random-mask strategy augments each query with a nearest neighbor t, and both are represented under the mature multi-source-domain embeddings and the target-domain embedding being learned. The contrastive objective is twofold. First, the similarity between representations of the same query in different domains should exceed the similarity between the query's mature-domain representation and the target-domain representation of its neighbor t, which strengthens the cohesion of the target-domain representation. Second, within the mature source domain, the similarity between the query and its neighbor t should exceed the similarity between the query in the mature source domain and t in the target domain, which strengthens the model's ability to distinguish the target domain from the mature source domain.
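The two ranking constraints above can be sketched as hinge losses over cosine similarities; the embedding inputs, margin value, and function names are hypothetical stand-ins for the actual model, shown only to make the constraints concrete.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def cross_domain_contrastive_loss(q_src, q_tgt, t_src, t_tgt, margin=0.1):
    """Hinge losses encoding the two ranking constraints:
    (a) sim(q_src, q_tgt) > sim(q_src, t_tgt)  -- target-domain cohesion
    (b) sim(q_src, t_src) > sim(q_src, t_tgt)  -- source/target discrimination
    q_*: embeddings of the same query in the source / target domain;
    t_*: embeddings of the query's mask-augmented neighbor t.
    """
    loss_a = max(0.0, margin - (cos(q_src, q_tgt) - cos(q_src, t_tgt)))
    loss_b = max(0.0, margin - (cos(q_src, t_src) - cos(q_src, t_tgt)))
    return loss_a + loss_b
```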

Result:

With Store-Xiaomi (e-commerce) and Government-Affairs-Xiaomi (government services) as source domains, performance on Cloud-Xiaomi's B-end data improved by about 2% over the MLMAN method and by about 8% over transfer learning. A paper titled "Contrastive Learning and Multi-grained Interactivity of Cross Domain Few Shot FAQ" was also completed.


Short Term Load Forecasting for Power System based on Graph Hash Sampling Attention and Contrastive Learning
First Author May 2022 - present
Situation:

Power load forecasting is a major topic for keeping the entire power grid running efficiently and stably. It supports operation planning, revenue prediction, electricity-price design, and energy trading in power systems. Short-term load forecasting (a few minutes or hours ahead) mainly assists real-time energy scheduling, and is of great research value for modern power systems whose generation and consumption sides carry uncertainty, most notably from renewable energy. Accurate short-term load forecasting, however, is challenging.

Task:

The power network is modeled as a graph of nodes and edges, a non-Euclidean structure, which means graph-neural-network methods can be applied to it. In particular, load forecasting for a specific node can be treated as a time-series prediction problem: predict the node's load at time t from the state set up to time t-1 and the input at time t. This work uses graph attention and contrastive learning to model the spatial-temporal graph of the power grid.

Action:

1. Graph Hash Sampling Attention: Self-attention models long-range sequence correlations well, but computing dot-product similarity is expensive, especially on spatial-temporal graphs. Following the SDIM method, we use SimHash to approximate the attention mechanism; SimHash has been shown to model long-term user-behavior sequences well at a fraction of the cost. Because a graph is unstructured data with a complex internal topology, the capacity of a plain graph-attention mechanism is too limited to achieve the expected results, so a SimHash structure approximates the correlations of the spatial-temporal graph. This strengthens the graph representation for modeling the connections and differences that power-network data exhibits over long time spans and wide spatial extents.
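A minimal sketch of random-hyperplane SimHash used to bucket queries and keys so attention degrades from an O(N^2) dot product to per-bucket aggregation. The bucketing scheme, function names, and mean-pooling fallback are illustrative assumptions, not the SDIM implementation.

```python
import numpy as np

def simhash_signatures(x, n_bits=8, seed=0):
    """Random-hyperplane SimHash: sign pattern of projections -> integer bucket id."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((x.shape[1], n_bits))
    bits = (x @ planes) > 0                     # (N, n_bits) sign bits
    return bits @ (1 << np.arange(n_bits))      # pack bits into one integer per row

def hash_attention(queries, keys, values, n_bits=8):
    """Approximate attention: each query attends only to keys sharing its
    SimHash bucket, replacing pairwise similarities with per-bucket averaging."""
    qs = simhash_signatures(queries, n_bits)
    ks = simhash_signatures(keys, n_bits)
    out = np.zeros((queries.shape[0], values.shape[1]))
    for i, sig in enumerate(qs):
        mask = ks == sig
        # Fall back to global mean pooling when a bucket is empty.
        out[i] = values[mask].mean(axis=0) if mask.any() else values.mean(axis=0)
    return out
```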

2. Spatial-Temporal Graph Contrastive Learning: Contrastive learning is a common metric-learning method in deep learning. It strengthens the model's representations by pulling related features closer in the embedding space and pushing unrelated features apart, making the model more informative (closer to maximum entropy). We design spatial-temporal contrastive learning over graph nodes. The temporal term: the similarity between a specific node and its adjacent node N at time t should be greater than their similarity at other time instants. The spatial term: the similarity between a specific node at time t and itself at non-t instants should be greater than the similarity between the node at time t and its neighbor at non-t instants. Together these terms strengthen the node representations and make predictions more accurate.
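The two contrastive terms can be sketched as hinge losses over a hypothetical node-embedding tensor Z of shape (nodes, time, dims); the layout, margin, and names are assumptions for illustration only.

```python
import numpy as np

def st_contrastive_terms(Z, node, nbr, t, margin=0.1):
    """Z: (N_nodes, T, D) node embeddings over time (hypothetical layout).

    Temporal term: sim(node_t, nbr_t) should exceed sim(node_t, nbr_t') for t' != t.
    Spatial term: sim(node_t, node_t') should exceed sim(node_t, nbr_t') for t' != t.
    Returns the two summed hinge losses.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    anchor = Z[node, t]
    temporal = spatial = 0.0
    for tp in range(Z.shape[1]):
        if tp == t:
            continue
        # Same neighbor: now vs other times.
        temporal += max(0.0, margin - (cos(anchor, Z[nbr, t]) - cos(anchor, Z[nbr, tp])))
        # Other times: the node itself vs its neighbor.
        spatial += max(0.0, margin - (cos(anchor, Z[node, tp]) - cos(anchor, Z[nbr, tp])))
    return temporal, spatial
```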

Result:

Compared with the 2021 SOTA method for short-term load forecasting (Ada-GWN), the MAE is reduced by about 0.13 and the MAPE by about 1.8%. We are currently tuning system parameters and designing further ablation experiments.

Honors & Awards


Excellent Graduate Student of Huazhong University of Science and Technology 2022
National Scholarship 2021
First-class Academic Scholarship of Huazhong University of Science and Technology 2019, 2020, 2021
Merit Postgraduate Student 2020, 2021
First-class 'Zhixing' Scholarship of Huazhong University of Science and Technology 2020
Shenzhen Stock Exchange Scholarship 2020
The ‘Zhihui Cup’ University Geek Challenge by SPDB & Baidu
National Second Award 2020
The Tianchi-Digital China Intelligent Ocean Construction algorithm Competition
National Rank: 16/3275 2020
The First National College Students’ Artificial Intelligence Innovation Competition
National Winning Award 2018
The Sixth China Software Cup College Students Software Design Competition Undergraduate Group
National Third Award 2017
The 2017 'Huameng' National Open Source Software Creative Competition for College Students
National First Award 2017
The 8th 'Lanqiao Cup' Jiangxi Division C/C++ Programming Group A
Provincial Second Award 2017
The 3rd Jiangxi Internet+ College Students Innovation and Entrepreneurship Competition
Provincial Bronze Award 2017
Special-class Scholarship of Nanchang University 2017

Internship Experiences


Alibaba DAMO Academy Jun 2021 - Sep 2021

1. Served as a natural language processing algorithm engineer, responsible for the ToB Cloud-Xiaomi few-shot dialogue task.

2. Proposed a cross-domain few-shot learning framework based on contrastive learning that assists target-domain learning when multiple source domains are available. By deepening the sampling used in few-shot learning, the framework enforces that the similarity between representations of the same query in different domains exceeds the similarity between the mature-source-domain representation and the target-domain representation of the query's nearest neighbor t (generated by randomly masking the query); likewise, within a sampled episode, the similarity between the query and t in the mature source domain should exceed the similarity between the query in the mature domain and t in the target domain. The framework improves the target-domain representation under cold-start, few-sample conditions. In real business, the onboarding time for Cloud-Xiaomi B-end customers dropped from two weeks to three days.
Tencent Lightspeed & Quantum Studios Apr 2021 - Jun 2021


1. Served as a natural language processing algorithm engineer, responsible for multi-style user-nickname generation for PUBG.

2. Proposed and efficiently implemented a multi-style nickname generation model based on a GPT pre-trained model, and compared it with the Style Transformer, multi-style transfer models based on CycleGAN and StarGAN, and the disentanglement-replacement method commonly used in speech style transfer. The model decouples the style task from the nickname task; once converged, it can generate about 800 million nicknames in the specified styles.