Revisiting NLP: Notes

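A while back, I forget how long ago, I briefly used a word segmentation tool. It was fast, segmenting a few hundred characters in a few seconds, but the results were far from ideal. Recently I wanted to start a new project that will probably need word segmentation, so I went digging through articles again, and there is still no way around it. Now that AI has come a fair way, it seems you can lean on AI in addition to the traditional segmentation methods (the poison is still tasty, so the road to ruin was paved long ago). So, time to use AI!? And then... even getting hold of it was a struggle. AI really does make you wail "ai, ai"...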

Working environment:
    Windows 10 Home
    Anaconda
        base: Python 3.8
        paddle
        jieba

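At first, Anaconda kept reporting that it could not find the paddle package. Installing from the Tsinghua University mirror in mainland China had produced an error, so I went looking for another source. A command I found on a website finally got paddlenlp installed, with no need to downgrade (many articles say you have to):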

pip install --upgrade paddlenlp -i https://pypi.org/simple

I thought that was it, but when I tested it, the "Paddle not found" message still appeared, so I had to look for an install channel yet again. It looked like it would be easy:

conda install paddlepaddle
PackagesNotFoundError: The following packages are not available from current channels:

- paddlepaddle 

Search Anaconda: plenty of sources turn up, so pick one.

anaconda search -t conda paddlepaddle
InsecureRequestWarning: Unverified HTTPS request is being made to host 'api.anaconda.org'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
warnings.warn(
Packages:
Name | Version | Package Types | Platforms | Builds
------------------------- | ------ | --------------- | --------------- | ----------
Esri/paddlepaddle-gpu | 2.1.0 | conda | linux-64, win-64 | py38_gpu_cuda11.2_many_linux, py39_gpu_cuda11.2_many_linux, py39_gpu_cuda11.2_windows, py38_gpu_cuda11.2_windows, py36_gpu_cuda11.2_many_linux, py37_gpu_cuda11.2_many_linux, py36_gpu_cuda11.2_windows, py37_gpu_cuda11.2_windows
: an easy-to-use, efficient, flexible and scalable deep learning platform
Paddle/paddlepaddle | 2.1.2 | conda | linux-64, win-64, osx-64 | py35_mac, py36_cpu_windows, py39_mac, py27_cpu_windows, py39_cpu_many_linux, py35_cpu_many_linux, py36_cpu_many_linux, py36_mac, py39_cpu_windows, py27_cpu_many_linux, py35_cpu_windows, py37_cpu_many_linux, py38_cpu_many_linux, py38_mac, py27_mac, py37_mac, py37_cpu_windows, py38_cpu_windows
: an easy-to-use, efficient, flexible and scalable deep learning platform
Paddle/paddlepaddle-gpu | 2.1.2 | conda | linux-64, win-64 | py37_gpu_cuda10.0_windows, py39_gpu_cuda11.2_many_linux, py37_gpu_cuda10.0_many_linux, py35_gpu_cuda10.0_windows, py35_gpu_cuda11.0_windows, py36_gpu_cuda11.0_many_linux, py39_gpu_cuda10.2_many_linux, py35_gpu_cuda9.0_windows, py36_gpu_cuda11.0_windows, py38_gpu_cuda9.0_many_linux, py39_gpu_cuda11.2_windows, py35_gpu_cuda10.2_windows, py37_gpu_cuda9.0_many_linux, py38_gpu_cuda10.0_many_linux, py35_gpu_cuda10.1_many_linux, py27_gpu_cuda9.0_many_linux, py37_gpu_cuda10.1_many_linux, py35_gpu_cuda10.2_many_linux, py36_gpu_cuda10.2_windows, py36_gpu_cuda9.0_windows, py37_gpu_cuda10.2_windows, py38_gpu_cuda11.0_windows, py37_gpu_cuda11.2_windows, py27_gpu_cuda10.0_many_linux, py27_gpu_cuda10.0_windows, py36_gpu_cuda11.2_windows, py39_gpu_cuda10.1_windows, py36_gpu_cuda10.1_many_linux, py27_gpu_cuda10.1_many_linux, py36_gpu_cuda11.2_many_linux, py39_gpu_cuda10.1_many_linux, py27_gpu_cuda10.1_windows, py37_gpu_cuda10.2_many_linux, py38_gpu_cuda10.1_many_linux, py37_gpu_cuda11.2_many_linux, py38_gpu_cuda11.2_windows, py27_gpu_cuda10.2_many_linux, py38_gpu_cuda10.2_windows, py38_gpu_cuda9.0_windows, py36_gpu_cuda10.1_windows, py37_gpu_cuda11.0_many_linux, py27_gpu_cuda9.0_windows, py35_gpu_cuda9.0_many_linux, py38_gpu_cuda11.2_many_linux, py35_gpu_cuda10.1_windows, py38_gpu_cuda10.0_windows, py38_gpu_cuda11.0_many_linux, py35_gpu_cuda11.0_many_linux, py36_gpu_cuda10.2_many_linux, py39_gpu_cuda10.2_windows, py38_gpu_cuda10.2_many_linux, py36_gpu_cuda9.0_many_linux, py37_gpu_cuda11.0_windows, py38_gpu_cuda10.1_windows, py37_gpu_cuda10.1_windows, py36_gpu_cuda10.0_windows, py36_gpu_cuda10.0_many_linux, py35_gpu_cuda10.0_many_linux, py37_gpu_cuda9.0_windows
: an easy-to-use, efficient, flexible and scalable deep learning platform
baomengxue/paddlepaddle | 1.5.2 | conda | osx-64 | py37_mac
: an easy-to-use, efficient, flexible and scalable deep learning platform
baomengxue/paddlepaddle-gpu | | conda | [] |
: an easy-to-use, efficient, flexible and scalable deep learning platform
sangjinchao/paddlepaddle | 1.8.1 | conda | linux-64, osx-64, win-64 | py27_cpu_windows, py36_cpu_windows, py35_mac, py36_cpu_many_linux, py36_mac, py27_mac, py27_cpu_many_linux, py35_cpu_many_linux, py37_cpu_many_linux, py35_cpu_windows, py37_mac, py37_cpu_windows
: an easy-to-use, efficient, flexible and scalable deep learning platform
sangjinchao/paddlepaddle-gpu | 1.8.1 | conda | linux-64, win-64 | py27_gpu_cuda9.0_windows, py37_gpu_cuda10.0_windows, py36_gpu_cuda10.0_windows, py37_gpu_cuda10.0_many_linux, py36_gpu_cuda9.0_windows, py37_gpu_cuda9.0_many_linux, py35_gpu_cuda10.0_windows, py27_gpu_cuda9.0_many_linux, py27_gpu_cuda10.0_many_linux, py36_gpu_cuda9.0_many_linux, py27_gpu_cuda10.0_windows, py35_gpu_cuda9.0_windows, py35_gpu_cuda9.0_many_linux, py36_gpu_cuda10.0_many_linux, py35_gpu_cuda10.0_many_linux, py37_gpu_cuda9.0_windows
: an easy-to-use, efficient, flexible and scalable deep learning platform
xieyunshen/paddlepaddle | 2.0.0rc0 | conda | osx-64 | py38_mac, py35_mac, py37_mac, py27_mac, py36_mac
: an easy-to-use, efficient, flexible and scalable deep learning platform
xieyunshen_private/paddlepaddle-gpu | 2.0.0rc1 | conda | linux-64, win-64 | py27_gpu_cuda9.0_many_linux, py35_gpu_cuda10.2_windows, py27_gpu_cuda10.0_many_linux
: an easy-to-use, efficient, flexible and scalable deep learning platform
xieyunshen_private_1/paddlepaddle-gpu | 1.8.3 | conda | linux-64 | py37_gpu_cuda10.0_many_linux, py36_gpu_cuda10.0_many_linux, py35_gpu_cuda10.0_many_linux, py27_gpu_cuda10.0_many_linux
: an easy-to-use, efficient, flexible and scalable deep learning platform
Found 10 packages

Run 'anaconda show <USER/PACKAGE>' to get installation details 
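With that many rows, picking a source by eye is tedious. Below is a minimal sketch of filtering the table for a build that matches this machine (Python 3.8, CPU, Windows). The function name `matching_packages` is made up for illustration, the rows are abridged copies of the output above, and the build tag `py38_cpu_windows` is taken from the build names in the listing.

```python
def matching_packages(rows, build_tag):
    """Return (name, version) for every row whose Builds column lists build_tag."""
    matches = []
    for row in rows:
        cols = [c.strip() for c in row.split("|")]
        if len(cols) != 5:
            continue  # not a "Name | Version | Types | Platforms | Builds" row
        name, version, _types, _platforms, builds = cols
        if build_tag in [b.strip() for b in builds.split(",")]:
            matches.append((name, version))
    return matches


# Abridged rows from the `anaconda search` output (build lists shortened).
rows = [
    "Paddle/paddlepaddle | 2.1.2 | conda | linux-64, win-64, osx-64 | py38_mac, py37_cpu_windows, py38_cpu_windows",
    "baomengxue/paddlepaddle | 1.5.2 | conda | osx-64 | py37_mac",
    "sangjinchao/paddlepaddle | 1.8.1 | conda | linux-64, osx-64, win-64 | py36_cpu_windows, py37_cpu_windows",
]

print(matching_packages(rows, "py38_cpu_windows"))
```

On these abridged rows only Paddle/paddlepaddle 2.1.2 offers a `py38_cpu_windows` build.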

Following the hint anaconda gives, pick one to install:

anaconda show Paddle/paddlepaddle
Using Anaconda API: https://api.anaconda.org
C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'api.anaconda.org'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  warnings.warn(
Name:    paddlepaddle
Summary: an easy-to-use, efficient, flexible and scalable deep learning platform
Access:  public
Package Types:  conda
Versions:
   + 1.5.0
   + 1.5.1
   + 1.5.2
   + 1.6.0rc0
   + 1.6.0
   + 1.6.1
   + 1.6.2
   + 1.6.3
   + 1.7.0
   + 1.7.1
   + 1.7.2
   + 1.8.0
   + 1.8.1
   + 1.8.2
   + 1.8.3
   + 1.8.4
   + 1.8.5
   + 2.0.0rc0
   + 2.0.0rc1
   + 2.0.0
   + 2.0.1
   + 2.0.2
   + 2.1.0
   + 2.1.1
   + 2.1.2

To install this package with conda run:
     conda install --channel https://conda.anaconda.org/Paddle paddlepaddle
 

Taking anaconda's reply, I just typed it in exactly as given:

conda install --channel https://conda.anaconda.org/Paddle paddlepaddle
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

environment location: C:\ProgramData\Anaconda3

added / updated specs:
- paddlepaddle


The following packages will be downloaded:

package | build
---------------------------|-----------------
astor-0.8.1 | py38haa95532_0 47 KB
gast-0.3.3 | py_0 14 KB
libprotobuf-3.17.2 | h23ce68f_1 1.9 MB
paddlepaddle-2.1.2 | py38_cpu_windows 48.2 MB Paddle
protobuf-3.17.2 | py38hd77b12b_0 257 KB
------------------------------------------------------------
Total: 50.4 MB

The following NEW packages will be INSTALLED:

astor pkgs/main/win-64::astor-0.8.1-py38haa95532_0
gast pkgs/main/noarch::gast-0.3.3-py_0
libprotobuf pkgs/main/win-64::libprotobuf-3.17.2-h23ce68f_1
paddlepaddle Paddle/win-64::paddlepaddle-2.1.2-py38_cpu_windows
protobuf pkgs/main/win-64::protobuf-3.17.2-py38hd77b12b_0


Proceed ([y]/n)?


Downloading and Extracting Packages
astor-0.8.1 | 47 KB | ############################################################################ | 100%
protobuf-3.17.2 | 257 KB | ############################################################################ | 100%
paddlepaddle-2.1.2 | 48.2 MB | ############################################################################ | 100%
gast-0.3.3 | 14 KB | ############################################################################ | 100%
libprotobuf-3.17.2 | 1.9 MB | ############################################################################ | 100%
Preparing transaction: done
Verifying transaction: failed

EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
environment location: C:\Somewhere\Anaconda# 

Next, a test:

from paddlenlp.datasets import load_dataset

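Here we go again: it can't find six. No way, is there really a package with that name? Turns out there is, and installing it takes care of that: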

conda install six

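It still couldn't find paddle. Going back over the installation messages, I saw that the install had not actually succeeded, most likely a permissions problem (the EnvironmentNotWritableError above). Open cmd as administrator and run the install again in Anaconda (env: base):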

conda activate
conda install --channel https://conda.anaconda.org/Paddle paddlepaddle
...blablabla... 
done 
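The permissions problem could have been caught before downloading 50 MB. Here is a rough sketch of checking whether the current user can write to the environment directory, by trying to create a temporary file there. The helper name `env_is_writable` is made up for this note, and it assumes the script runs with the Python of the environment in question, so that `sys.prefix` points at it. conda may need to write to other places as well, so treat a `True` as a hint rather than a guarantee.

```python
import sys
import tempfile


def env_is_writable(prefix=sys.prefix):
    """Return True if a temporary file can be created inside prefix."""
    try:
        # The file is removed automatically when the with-block exits.
        with tempfile.TemporaryFile(dir=prefix):
            pass
        return True
    except OSError:
        # Covers PermissionError as well as a missing directory.
        return False


print(sys.prefix, "-> writable" if env_is_writable() else "-> NOT writable")
```

If this reports "NOT writable" for the base environment, opening the prompt as administrator (as done above) is one way around it.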

Test again. This time an error complains that some things have not been imported, so import them:

from paddlenlp.datasets import load_dataset
import img, sys, os 

No error messages: for now, it counts as installed.
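Rather than discovering missing packages one error at a time, they can all be checked in one go. A small sketch using the standard library's `importlib.util.find_spec`. The helper name `check_packages` is made up here, and the package list is simply the ones that caused trouble in this post. Note that this only confirms Python can locate each module, not that importing it will succeed.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(["paddle", "paddlenlp", "jieba", "six"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```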

A quick trial run:

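import jieba
import paddle

# Test sentence (a traffic report); kept in Chinese because it is the text being segmented
str1 = '車禍發生,一輛自用轎車自撞分隔島翻車,造成1人死亡、一人受困,1名乘客搶救中,現場車輛無法行進,請駕駛人繞道行駛,並請勿圍觀,以利現場消防隊人員進行救護工作'
paddle.enable_static()  # Paddle 2.x defaults to dynamic-graph mode; switch to static mode first
jieba.enable_paddle()   # turn on jieba's Paddle mode
seg_list = jieba.cut(str1, use_paddle=True)  # returns a generator of tokens
print('Paddle mode segmentation result: ' + '/'.join(seg_list))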

 

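import jieba

# Same test sentence, this time with search-engine mode (no Paddle needed)
str1 = '車禍發生,一輛自用轎車自撞分隔島翻車,造成1人死亡、一人受困,1名乘客搶救中,現場車輛無法行進,請駕駛人繞道行駛,並請勿圍觀,以利現場消防隊人員進行救護工作'
seg_list = jieba.cut_for_search(str1)  # returns a generator of tokens
print('Search-engine mode segmentation result: ' + '/'.join(seg_list))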
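Since Paddle was the hard part to get installed, a small wrapper that tries Paddle mode and quietly falls back to jieba's default mode might be handy. This is only a sketch under assumptions: the function name `segment` and the `prefer_paddle` parameter are made up for this note, and jieba itself is assumed to be installed. The jieba and paddle calls are the same ones used in the trial run above.

```python
def segment(text, prefer_paddle=True):
    """Segment text with jieba, using Paddle mode only when paddle can be imported."""
    import jieba  # imported here so the helper can be defined before jieba is installed

    if prefer_paddle:
        try:
            import paddle
        except ImportError:
            paddle = None  # Paddle is missing: fall through to the default mode
        if paddle is not None:
            paddle.enable_static()  # same preparation as in the trial run above
            jieba.enable_paddle()
            return list(jieba.cut(text, use_paddle=True))
    # Default (precise) mode, no Paddle required.
    return list(jieba.cut(text))
```

Usage would look like `print('/'.join(segment(str1)))`. Returning a list instead of jieba's generator means the result can be printed and reused more than once.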

Phew~~~ a close call, but no harm done!


That's a wrap!

 

----------------------------------- Some reference material --------------------------------------

Example: https://aistudio.baidu.com/aistudio/projectdetail/1968542

Installing with conda: https://www.paddlepaddle.org.cn/documentation/docs/zh/install/conda/windows-conda.html

Installing with conda (when you need to search for a package): https://blog.csdn.net/ALZFterry/article/details/108326563

conda cheatsheet (PDF): https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf

Example: https://iter01.com/586897.html
