一枚NLPer小菜鸡

1
%matplotlib inline

Sequence-to-Sequence Modeling with nn.Transformer and TorchText

This is a tutorial on how to train a sequence-to-sequence model
that uses the
nn.Transformer <https://pytorch.org/docs/master/nn.html?highlight=nn%20transformer#torch.nn.Transformer> module.
PyTorch 1.2 release includes a standard transformer module based on the
paper Attention is All You Need <https://arxiv.org/pdf/1706.03762.pdf>
The transformer model
has been proved to be superior in quality for many sequence-to-sequence
problems while being more parallelizable. The nn.Transformer module
relies entirely on an attention mechanism (another module recently
implemented as
nn.MultiheadAttention <https://pytorch.org/docs/master/nn.html?highlight=multiheadattention#torch.nn.MultiheadAttention>)
to draw global dependencies between input and output. The nn.Transformer module is now highly modularized such that a single component (like nn.TransformerEncoder <https://pytorch.org/docs/master/nn.html?highlight=nn%20transformerencoder#torch.nn.TransformerEncoder>in this tutorial) can be easily adapted/composed.

阅读全文 »

1
2
3
4
5
6
import torch.nn as nn
import torch
import numpy as np
from torch.autograd import Variable
import math
import torch.nn.functional as F

注意力计算公式

阅读全文 »

NLP中如何发掘模型的可解释性

可解释性在AI的模型设计中十分重要。需要防止模型存在偏见和缺陷带来的伦理问题,并且帮助决策者理解如何正确地使用我们的模型。越是严苛的场景,越需要模型提供证明它们是如何运作且避免错误的证据。如实时性较强的无人驾驶领域,黑盒模型无法让人们信服其工作的安全性。

通常深度学习模型就像一个黑匣子,它能预测出很好的结果,但是你并不知道它为什么会预测出这样的结果。想知道它是如何工作的,那么得尝试打开这个黑匣子,解释模型的意义十分必要。

阅读全文 »

PDFMiner是一个可以从PDF文档中提取信息的工具。与其他PDF相关的工具不同,它注重的完全是获取和分析文本数据。PDFMiner允许你获取某一页中文本的准确位置和一些诸如字体、行数的信息。它包括一个PDF转换器,可以把PDF文件转换成HTML等格式。它还有一个扩展的PDF解析器,可以用于除文本分析以外的其他用途。官方主页

其特征有:1、完全使用python编写。(适用于2.4或更新版本)2、解析,分析,并转换成PDF文档。3、PDF-1.7规范的支持。(几乎)4、中日韩语言和垂直书写脚本支持。5、各种字体类型(Type1、TrueType、Type3,和CID)的支持。6、基本加密(RC4)的支持。7、PDF与HTML转换。8、纲要(TOC)的提取。9、标签内容提取。10、通过分组文本块重建原始的布局。
如果你的Python有安装pip模块,就可以通过命令“python pip install pdfminer”,自动安装pdfminer。

阅读全文 »

Build Graph

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
coordination_source = """
{name:'兰州', geoCoord:[103.73, 36.03]},
{name:'嘉峪关', geoCoord:[98.17, 39.47]},
{name:'西宁', geoCoord:[101.74, 36.56]},
{name:'成都', geoCoord:[104.06, 30.67]},
{name:'石家庄', geoCoord:[114.48, 38.03]},
{name:'拉萨', geoCoord:[102.73, 25.04]},
{name:'贵阳', geoCoord:[106.71, 26.57]},
{name:'武汉', geoCoord:[114.31, 30.52]},
{name:'郑州', geoCoord:[113.65, 34.76]},
{name:'济南', geoCoord:[117, 36.65]},
{name:'南京', geoCoord:[118.78, 32.04]},
{name:'合肥', geoCoord:[117.27, 31.86]},
{name:'杭州', geoCoord:[120.19, 30.26]},
{name:'南昌', geoCoord:[115.89, 28.68]},
{name:'福州', geoCoord:[119.3, 26.08]},
{name:'广州', geoCoord:[113.23, 23.16]},
{name:'长沙', geoCoord:[113, 28.21]},
//{name:'海口', geoCoord:[110.35, 20.02]},
{name:'沈阳', geoCoord:[123.38, 41.8]},
{name:'长春', geoCoord:[125.35, 43.88]},
{name:'哈尔滨', geoCoord:[126.63, 45.75]},
{name:'太原', geoCoord:[112.53, 37.87]},
{name:'西安', geoCoord:[108.95, 34.27]},
//{name:'台湾', geoCoord:[121.30, 25.03]},
{name:'北京', geoCoord:[116.46, 39.92]},
{name:'上海', geoCoord:[121.48, 31.22]},
{name:'重庆', geoCoord:[106.54, 29.59]},
{name:'天津', geoCoord:[117.2, 39.13]},
{name:'呼和浩特', geoCoord:[111.65, 40.82]},
{name:'南宁', geoCoord:[108.33, 22.84]},
//{name:'西藏', geoCoord:[91.11, 29.97]},
{name:'银川', geoCoord:[106.27, 38.47]},
{name:'乌鲁木齐', geoCoord:[87.68, 43.77]},
{name:'香港', geoCoord:[114.17, 22.28]},
{name:'澳门', geoCoord:[113.54, 22.19]}
"""
1
re.findall("[\d\.]+","{name:'澳门', geoCoord:[113.54, 22.19]}")
['113.54', '22.19']

Get data from source using regular expression

阅读全文 »

1
2
3
4
5
6
7
8
9
10
simple_grammar ="""
sentence => noun_phrase verb_phrase
noun_phrase => Article Adj* noun
Adj* => null | Adj Adj*
verb_phrase => verb noun_phrase
Article => 一个 | 这个
noun => 女人 | 篮球 | 桌子 | 小猫
verb => 看着 | 坐着 | 听见 | 看着
Adj => 蓝色的 | 好看的 | 小小的
"""
1
2
3
another_grammar = """
#
"""
1
import random
1
2
3
def adj():return random.choice('蓝色的|好看的|小小的'.split('|'))
def adj_star():
return random.choice([lambda : '',lambda : adj()+adj_star()])()
1
adj_star()
'蓝色的好看的小小的好看的小小的蓝色的'
阅读全文 »

python

  最近在学python,断断续续的,感觉学的慢,就顺便写写代码,加强对python的感觉。
虽然学了半天还是啥都不会,但是还是要写写博客啥的,激励自己不老懒惰,不能因为眼前的

困难而放弃进步。所以,下面我要贴代码了….

阅读全文 »