Education
Huazhong University of Science and Technology
M.S. in Software Engineering
GPA: 3.79/4.00
Rank: 4/145
Nanchang University
B.S. in Software Engineering
TA in Data Structures & Algorithms
Research Experiences
Situation:
In intelligent customer service dialogue systems, FAQ is the core business scenario. Building on the success of its ToC customer service robot, AliXiaomi has gradually transformed into a customer-service-robot platform that provides SaaS/PaaS services to companies on Alibaba Cloud. However, in the ToB setting, B-end client data is quite limited and of low quality, so the quality of the FAQ needs to be improved.
Task:
Well-performing models already exist in other source domains, and the goal is to obtain better representations in the target domain by introducing these mature source-domain models to assist few-shot learning in the target domain. The key lies in exploiting intra- and inter-domain information to mine the differences and connections between samples.
Action:
1. Multi-grained Interactive Learning: Few-shot learning aims to learn stronger representations from a small number of samples. Samples are drawn via the N-way K-shot method to form episodes (the counterpart of batches) for training.
2. Contrastive Learning: Within a sampled episode, a SimCSE-style random-mask strategy is applied to the query to augment its nearest neighbor t, and both are expressed in the mature multi-source-domain embeddings and in the target-domain embedding to be learned. Contrastive learning is reflected in two ordering constraints. First, the similarity between expressions of the same query in different domains should be greater than the similarity between the query's mature-domain expression and its nearest neighbor t in the target domain, which enhances the cohesion of the target-domain representation. Second, within the sampled episode, the similarity between the query and its nearest neighbor t in the mature source domain should be greater than the similarity between the query in the mature source domain and the nearest neighbor t in the target domain, which enhances the ability to distinguish the target domain from the mature source domain.
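The two ordering constraints above can be sketched as hinge losses over cosine similarities. This is a minimal illustration under assumed names (`q_src`/`q_tgt` for the query embedded in the mature source domain and the target domain, `t_src`/`t_tgt` for the augmented nearest neighbor t in each domain), not the paper's actual implementation:

```python
import math

def cos(u, v):
    """Cosine similarity between two plain-list vectors."""
    du = math.sqrt(sum(x * x for x in u))
    dv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (du * dv)

def episode_contrastive_loss(q_src, q_tgt, t_src, t_tgt, margin=0.2):
    """Hinge losses encoding the two ordering constraints:
    sim(q_src, q_tgt) > sim(q_src, t_tgt)   # target-domain cohesion
    sim(q_src, t_src) > sim(q_src, t_tgt)   # source/target separation
    """
    cohesion = max(0.0, margin - (cos(q_src, q_tgt) - cos(q_src, t_tgt)))
    separation = max(0.0, margin - (cos(q_src, t_src) - cos(q_src, t_tgt)))
    return cohesion + separation

# When both constraints hold with margin to spare, the loss is zero.
loss = episode_contrastive_loss([1, 0], [1, 0], [1, 0], [0, 1])
```

In practice the same idea would be applied batch-wise over the episode's embedding matrices; the margin value here is an arbitrary placeholder.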
Result:
With the help of store-Xiaomi in the e-commerce field and government-affairs Xiaomi in the government-service field, performance on Cloud Xiaomi's B-end data improved by about 2% over the MLMAN method and by about 8% over transfer learning. A paper titled “Contrastive Learning and Multi-grained Interactivity of Cross Domain Few Shot FAQ” was also completed.
Short Term Load Forecasting for Power System based on
Graph Hash Sampling Attention and Contrastive Learning
First Author May 2022 - present
Situation:
In order to maintain highly efficient operation and increase the stability of power supply across the entire grid, power load forecasting is a major topic. It supports operation planning, revenue prediction, electricity-price design, and energy trading in the power system. Short-term load forecasting (a few minutes or hours ahead) is mainly used to assist real-time energy scheduling and is of great research value to modern power systems, where new energy sources introduce uncertainty at both the generation and consumption ends. However, accurate short-term load forecasting is challenging.
Task:
The power network is modelled as a graph of nodes and edges, a non-Euclidean structure, which means that graph-neural-network methods can be applied to the power system network. In particular, load forecasting for a specific node at time t can be treated as a time-series prediction problem: predicting the node's load at time t from the state set of previous times up to t-1 and the input at the current time t. This work models the spatio-temporal graph of the power grid with graph attention and contrastive learning.
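As an illustration of this time-series framing, a sliding-window helper (hypothetical, not from the paper) can turn one node's load history into supervised (history, next-load) pairs:

```python
def make_windows(series, w):
    """Build supervised pairs for short-term forecasting:
    inputs[i] = series[i:i+w]   (the previous w load readings)
    targets[i] = series[i+w]    (the load to predict at time t)
    """
    inputs = [series[i:i + w] for i in range(len(series) - w)]
    targets = [series[i + w] for i in range(len(series) - w)]
    return inputs, targets

# e.g. a toy load series with a window of 2 past steps
inputs, targets = make_windows([1, 2, 3, 4, 5], 2)
# inputs  -> [[1, 2], [2, 3], [3, 4]]
# targets -> [3, 4, 5]
```

The actual model additionally conditions each node's prediction on its graph neighbors; this sketch shows only the per-node temporal slicing.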
Action:
1. Graph Hash Sampling Attention: Self-attention models long-range sequence correlations well, but the attention mechanism spends substantial resources computing dot-product similarity, especially on spatio-temporal graphs. Following the SDIM method, we use SimHash to approximate the attention mechanism; SimHash has been shown to model long-term user behavior sequences well at little computational cost. Because a spatio-temporal graph is unstructured data with complex internal topology, the capacity of the plain graph attention mechanism is too limited to achieve the expected results, so a SimHash structure is used to approximate correlations on the spatio-temporal graph. This strengthens the graph representation's ability to model the connections and differences that power-network data exhibits over long time spans and wide spatial ranges.
2. Spatio-Temporal Graph Contrastive Learning: In deep learning research, contrastive learning is a commonly used metric-learning method. It strengthens the model's representation ability by pulling related features closer in embedding space and pushing unrelated features farther apart, making the model more informative (closer to maximum entropy). We design spatio-temporal contrastive learning over graph nodes. Temporal contrastive learning: the similarity between a specific node and its adjacent node n at time t should be greater than at other time instants. Spatial contrastive learning: the similarity of a specific node between time t and non-t instants should be greater than the similarity between that node at time t and its neighbors at non-t instants. Through these constraints, we strengthen the model's node representations and make predictions more accurate.
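The SimHash idea behind step 1 can be sketched as follows: vectors whose signs agree against a set of random hyperplanes fall into the same bucket, and attention need only be computed within a bucket. The function names and plane count below are assumptions for illustration, not the SDIM implementation:

```python
import random

def simhash(vec, planes):
    """Sign pattern of `vec` against each hyperplane, packed into a bucket id."""
    bits = 0
    for plane in planes:
        dot = sum(a * b for a, b in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def bucketize(vectors, n_planes=4, seed=0):
    """Group vector indices by SimHash bucket; similar vectors tend to collide,
    so dot-product attention can be restricted to within-bucket pairs."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    buckets = {}
    for idx, v in enumerate(vectors):
        buckets.setdefault(simhash(v, planes), []).append(idx)
    return buckets

# Identical node embeddings always share a bucket; dissimilar ones rarely do.
buckets = bucketize([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
```

This replaces the O(n²) all-pairs similarity with per-bucket computation, which is the resource saving the paragraph above refers to.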
Result:
Compared with the 2021 SOTA method for short-term power-system load forecasting (Ada-GWN), the MAE is reduced by about 0.13 and the MAPE by about 1.8%. We are currently tuning system parameters and designing further ablation experiments.
Honors & Awards
Internship Experiences
1. Served as a natural language processing algorithm engineer, responsible for the ToB Cloud-Xiaomi few-shot dialogue task.
2. Proposed a few-shot cross-domain learning framework based on contrastive learning. The framework assists learning in the target domain when multiple source domains are available. By further deepening the sampling in few-shot learning, the similarity between expressions of the same query in different domains is made greater than the similarity between the mature-source-domain expression and the query's nearest neighbor t (generated by randomly masking the query) in the target domain. Similarly, for the sampled episode, the similarity between the query and its nearest neighbor t in the mature source domain is made greater than that between the query in the mature source domain and the nearest neighbor t in the target domain. In summary, the framework improves target-domain representation under cold start with few samples. In real business, the onboarding time for Cloud Xiaomi B-end customers was reduced from two weeks to three days.
Tencent Lightspeed & Quantum Studios Apr 2021 - Jun 2021
1. Served as a natural language processing algorithm engineer, responsible for multi-style user-nickname generation for PUBG.
2. Proposed and efficiently implemented a multi-style nickname generation model based on a GPT pre-trained model, and compared it with the Style Transformer, multi-style transfer models based on CycleGAN and StarGAN, and the disentanglement-replacement method commonly used in speech style transfer. The model decouples the style task from the nickname task; once converged, it can generate a total of about 800 million nicknames in the specified styles.