Pitfalls I Ran Into During My Internship - 白红宇

Published: 2019-05-25

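1. Uploading files with the rz command:

When uploading a fairly large file, or one that contains control characters, SecureCRT reports:

zmodem transfer canceled by remote side

Fix: use rz -e, which escapes control characters during the transfer.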

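2. MapReduce

With 500,000+ mappers and the number of reducers set to 3000, which was probably too few, 10% of the reducers failed and about 300 parts were lost.

I suspect each reduce task was too large and failed for that reason, so I plan to rerun the job with 20000 reducers.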

3. Hadoop tar archives: files that a job depends on can be shipped with it. Use -cacheArchive for a tar archive already on HDFS, which is unpacked on each node under the link name given after the #, and -file to upload a single local file such as the mapper script:

./hadoop streaming \
    -D mapred.job.map.capacity=50 \
    -D mapred.job.reduce.capacity=2000 \
    -D mapred.reduce.tasks=10000 \
    -D mapred.job.queue.name=vsearch \
    -D mapred.job.priority=HIGH \
    -D mapred.job.tracker=yq01-heng-job.dmop.baidu.com:54311 \
    -D mapred.job.name=sunpeng13_face_feature \
    -input www..../user/mms/mmss/sunpeng13/filter_simid_final/* \
    -output www...../user/mms/mmss/sunpeng13/face_feature_final/ \
    -cacheArchive www.../user/mms/merge_by_sunpeng/mms-hpc-devel.tar#mms \
    -cacheArchive www...../user/mms/merge_by_sunpeng/proto.tar#proto \
    -mapper "mms/mms-hpc-devel/bin/python face_extract_final.py" \
    -reducer "cat" \
    -file ../../code/face_extract_final.py \
    -outputformat org.apache.hadoop.mapred.lib.SuffixMultipleTextOutputFormat

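Hadoop streaming multiple outputs: on top of the standard output, hadoop-v2-u7 introduced multiple outputs for reduce, which lets one reducer write to several part-xxxxx-X files, where X is one of the letters A-Z. To use it, append the two-byte suffix "#X" to the value when emitting a key, value pair. Pairs with different suffixes are written to different part-xxxxx-X files, and the "#X" suffix is removed from the value automatically.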
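A minimal sketch of how a Python streaming reducer might apply the "#X" suffix convention described above. The routing rule in `pick_suffix` (keys starting with a digit go to A, everything else to B) and the sample lines are hypothetical; only the suffix convention itself comes from the notes, and it presumably relies on the SuffixMultipleTextOutputFormat output format used in the command above.

```python
def pick_suffix(key):
    """Hypothetical routing rule: digit-leading keys go to file A, the rest to file B."""
    return "A" if key[:1].isdigit() else "B"


def tag_line(line):
    """Append the two-byte "#X" suffix to the value of one "key<TAB>value" line."""
    key, _, value = line.rstrip("\n").partition("\t")
    return "{}\t{}#{}".format(key, value, pick_suffix(key))


# In a real reducer the lines would come from sys.stdin; these are sample inputs.
sample_lines = ["123\tfoo\n", "abc\tbar\n"]
for line in sample_lines:
    print(tag_line(line))
```

With this rule, the first pair would land in part-xxxxx-A and the second in part-xxxxx-B, each with the suffix stripped by the framework.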

4. TensorFlow fine-tuning:

import tensorflow as tf

all_vars = tf.global_variables()
print(all_vars)
# resnet_v2_50/logits/weights:0
# resnet_v2_50/logits/biases:0
with tf.variable_scope("resnet_v2_50"):
    with tf.variable_scope("logits", reuse=True):
        weights = tf.get_variable(name='weights')
        biases = tf.get_variable(name='biases')
# Drop the logits layer so it is not restored from the checkpoint.
all_vars.remove(weights)
all_vars.remove(biases)
saver = tf.train.Saver(var_list=all_vars)  # var_list=[]

When restoring variables with a saver, pass a variable list when creating the saver to select exactly which variables should be restored.

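5. Sharing variables in TensorFlow

result1 = my_image_filter(image1)
result2 = my_image_filter(image2)
# Raises ValueError(... conv1/weights already exists ...)

As you can see, tf.get_variable() checks that existing variables are not shared by accident. If you do want to share them, say so by calling reuse_variables(), as follows:

with tf.variable_scope("image_filters") as scope:
    result1 = my_image_filter(image1)
    scope.reuse_variables()
    result2 = my_image_filter(image2)

This is a good way to share variables: lightweight and safe.

For multi-GPU training, reuse is off for the first GPU, so the variables are created there; for every later GPU reuse is on, so all GPUs use the same set of variables.

The principle: with reuse=False, get_variable creates the variable and raises an error if it already exists.

With reuse=True, it looks up an existing variable and raises an error if it does not exist.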
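The create-or-look-up rule can be illustrated with a toy model. The sketch below is not TensorFlow code: `ToyVariableStore` is a hypothetical dictionary-backed stand-in that only mimics the reuse=False / reuse=True behaviour described above, applied to three imaginary GPU towers.

```python
class ToyVariableStore:
    """Toy stand-in for a variable scope; not TensorFlow's implementation."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, reuse=False):
        if reuse:
            # reuse=True: look up only; a missing variable is an error.
            if name not in self._vars:
                raise ValueError("Variable %s does not exist" % name)
            return self._vars[name]
        # reuse=False: create only; an existing variable is an error.
        if name in self._vars:
            raise ValueError("Variable %s already exists" % name)
        self._vars[name] = object()
        return self._vars[name]


store = ToyVariableStore()
towers = []
for gpu_index in range(3):
    # The first GPU creates the weights; later GPUs reuse them.
    towers.append(store.get_variable("conv1/weights", reuse=(gpu_index > 0)))

print(all(tower is towers[0] for tower in towers))
```

All three towers end up holding the same object, which is the point of switching reuse on after the first tower.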

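6. TensorFlow: reading the value of a variable defined in an inner function from the enclosing function:

print(tf.global_variables())
gr = tf.get_default_graph()
# .eval() needs a default session, e.g. inside a "with tf.Session():" block.
array = gr.get_tensor_by_name("cluster_centers:0").eval()
num_clusters = gr.get_tensor_by_name("num_clusters:0").eval()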

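7. The nohup command runs a command so that it survives hangups and writes its output to nohup.out, where it can be watched in real time. This is convenient for training networks.

Purpose: run a command immune to hangups.

Syntax: nohup Command [ Arg ... ] [ & ]

Description: nohup runs the command given by the Command parameter and any related Arg parameters, ignoring all hangup (SIGHUP) signals. Use nohup to keep a background program running after you log out. To run the nohup command in the background, append & (the ampersand) to the end of the command.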
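When a training job is launched from a Python script rather than a shell, a similar effect can be approximated with the standard library. This is a sketch under assumptions, not a replacement for nohup: `run_detached` is a hypothetical helper, it is POSIX only, and it imitates nohup by ignoring SIGHUP in the child and appending the output to a log file.

```python
import signal
import subprocess
import sys


def _ignore_sighup():
    # Runs in the child before exec; an ignored signal stays ignored across exec.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


def run_detached(command, log_path="nohup.out"):
    """Start command in the background with SIGHUP ignored, appending output to log_path."""
    log_file = open(log_path, "ab")
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=log_file,
        stderr=subprocess.STDOUT,
        preexec_fn=_ignore_sighup,
    )
    log_file.close()  # the child keeps its own copy of the file descriptor
    return process
```

A call such as `run_detached([sys.executable, "train.py"])` (the script name is hypothetical) returns immediately, and the log can be followed with tail -f just like nohup.out.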

8. Image data augmentation methods:

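9. In Python, how do I determine the shape of a tensor?

In TensorFlow, a tensor has both a static shape and a dynamic shape. The static shape can be read with the tf.Tensor.get_shape() method: it is inferred from the operations used to create the tensor and may be only partially complete. If the static shape is not fully defined, the dynamic shape of a tensor t can be obtained by evaluating tf.shape(t).

What is the difference between x.set_shape() and x = tf.reshape(x)?

The tf.Tensor.set_shape() method updates the static shape of a Tensor object. It is typically used to supply extra shape information when the static shape cannot be inferred directly. It does not change the dynamic shape of the tensor.

The tf.reshape() operation creates a new tensor with a different dynamic shape.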

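10. Running TensorFlow on the GPU of the dev machine fails because libcublas.so.8.0 cannot be found. Add the CUDA 8.0 library directory to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=~/apps/mms-hpc-devel/lib64:/home/opt/cuda-8.0/lib64/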

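11. Linux: wait for one process to finish, then start another.

while [[ `ps ax | awk '{print $1}' | grep -x PID | wc -l` == 1 ]]; do sleep 5m; done ; python ....

Replace PID with the ID of the process to wait for. grep -x matches the whole line, so that PID 123 does not also match 1234.

grep filters for the target process.

awk formats the output, keeping only the first (PID) column.

ps ax lists all processes.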
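The same polling loop can be sketched in Python for cases where the follow-up job is started from a script. The helper names `pid_alive` and `wait_for_pid` are hypothetical, the code is POSIX only, and it uses signal 0 as an existence check instead of parsing ps output. Note that a finished child of the calling process still counts as alive until it has been reaped, so this is meant for waiting on an unrelated process.

```python
import errno
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True


def wait_for_pid(pid, poll_seconds=300):
    """Block until the process is gone, polling every poll_seconds (5 minutes by default)."""
    while pid_alive(pid):
        time.sleep(poll_seconds)
```

After `wait_for_pid(pid)` returns, the next command can be launched, for example with `subprocess.call`.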

12. In TensorFlow, feed_dict copies data from CPU (host) memory to the GPU, which is very time-consuming for large amounts of data.

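13. Encoding conversion:

Python generally represents strings internally as Unicode, while the default encoding of string literals in code matches the encoding of the source file itself. Encoding conversion therefore normally goes through Unicode as the intermediate form: first decode the string from its original encoding to Unicode, then encode the Unicode into the target encoding. The str and unicode method names below are those of Python 2.

decode converts a string in some other encoding to Unicode, e.g. name.decode("GB2312") converts the GB2312-encoded string name to Unicode.

encode converts Unicode to a string in some other encoding, e.g. name.encode("GB2312") converts the Unicode string name to GB2312.

So before converting, you must know which encoding name is in, then decode it to Unicode, and finally encode it to the encoding you need. If name is already Unicode, no decode step is needed; encode it directly to the target encoding.

Note: decoding a Unicode string and encoding a str are both mistakes.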
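The decode-then-encode round trip can be shown in a few lines. This sketch uses Python 3, where text is `str` (Unicode) and encoded data is `bytes`, so the roles of decode and encode are the same as described above but the type names differ. The sample text and variable names are made up for illustration.

```python
# Stand-in for bytes read from a GB2312-encoded source ("\u4e2d\u6587" is "中文").
name_gb2312 = "\u4e2d\u6587".encode("gb2312")

# Step 1: decode from the known source encoding to Unicode.
name_unicode = name_gb2312.decode("gb2312")

# Step 2: encode the Unicode text to the target encoding.
name_utf8 = name_unicode.encode("utf-8")

print(name_unicode)
print(len(name_gb2312), len(name_utf8))
```

The byte lengths differ because GB2312 uses two bytes per Chinese character here while UTF-8 uses three, even though both decode to the same text.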

14. Counting the lines of a large file on Hadoop

./hadoop streaming \
    -D mapred.job.map.capacity=2000 \
    -D mapred.reduce.tasks=1 \
    -D stream.memory.limit=50 \
    -input hdfs://yq01-heng-hdfs.dmop.baidu.com:54310/app/ecom/aries/vs/mmsg/log/wrench_sample/1018/sample-img \
    -output hdfs://yq01-heng-hdfs.dmop.baidu.com:54310/app/ecom/aries/vs/mmsg/log/wrench_sample/1018/wc_l \
    -mapper "wc -l" \
    -reducer "cat"
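Because each map task runs wc -l over its own input split and the reducer is just cat, the single output file holds one count per map task rather than a grand total, so the counts still have to be added up. A minimal sketch of that last step follows; `total_lines` is a hypothetical helper and the sample lines are invented, on the assumption that each output line starts with one integer count.

```python
def total_lines(count_lines):
    """Sum per-mapper wc -l outputs; the first field of each line is one count."""
    total = 0
    for line in count_lines:
        fields = line.split()
        if fields:  # skip blank lines
            total += int(fields[0])
    return total


# Hypothetical contents of the single part file produced by the job.
sample_output = ["1200\t\n", "987\t\n", "\n", "15\t\n"]
print(total_lines(sample_output))
```

The same function could be fed an open file handle for the downloaded part file, since it only iterates over lines.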

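15. If Python data was written to a file without calling json.dumps() first, the file holds a Python literal rather than JSON, so use eval() on the file contents to parse it when reading it back.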
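The difference between the two serializations can be seen in a short sketch. Two things here are substitutions rather than what the note describes: the `record` dictionary is made-up sample data, and `ast.literal_eval` is used in place of `eval()` because it parses Python literals without executing arbitrary code.

```python
import ast
import json

record = {"name": "img_001", "labels": [1, 2], "ok": True}

as_json = json.dumps(record)  # real JSON: double quotes, lowercase true
as_repr = str(record)         # Python literal: single quotes, True

# JSON text round-trips through json.loads.
assert json.loads(as_json) == record

# A str(dict) dump needs a Python-literal parser instead.
assert ast.literal_eval(as_repr) == record

try:
    json.loads(as_repr)
except ValueError:
    print("str(dict) output is not valid JSON")
```

Writing with json.dumps() in the first place avoids the problem, since the file can then be read with json.loads() from any language.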

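16. Caffe Scale layer, when it has two inputs:

It computes the elementwise product of the two inputs, broadcasting the second input to match the shape of the first. In other words, the second input is tiled and the elementwise product is then taken. The original English description reads: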

Computes a product of two input Blobs, with the shape of the latter Blob “broadcast” to match the shape of the former. Equivalent to tiling the latter Blob, then computing the elementwise product.

The second input may be omitted, in which case it’s learned as a parameter of the layer.
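A small pure-Python sketch of this broadcast, not Caffe code. It assumes a hypothetical case in which the first input is a C x H x W blob and the second input holds one value per channel; `scale_channels` and the sample numbers are invented for illustration.

```python
def scale_channels(blob, scale):
    """Multiply a C x H x W nested list by one scale value per channel."""
    if len(blob) != len(scale):
        raise ValueError("need one scale value per channel")
    return [
        [[value * s for value in row] for row in channel]
        for channel, s in zip(blob, scale)
    ]


blob = [
    [[1, 2], [3, 4]],  # channel 0
    [[5, 6], [7, 8]],  # channel 1
]
scale = [10, 0.5]      # second input, broadcast over H and W

print(scale_channels(blob, scale))
```

Each scale value is effectively tiled across its whole channel before the elementwise product is taken.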

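17. Caffe Interp layer:

zoom_factor: enlarges the feature map by a factor of N. The feature map can also be shrunk.

Caffe Eltwise layer:

The Eltwise layer supports three operations: product (elementwise multiplication), sum (addition and subtraction) and max (elementwise maximum). sum is the default.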
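The three Eltwise operations can be sketched on flat Python lists. This is an illustration rather than Caffe code: `eltwise` is a hypothetical function, and the `coeffs` argument mirrors the per-input coefficients that let the sum operation subtract as well as add.

```python
def eltwise(inputs, operation="SUM", coeffs=None):
    """Combine equally sized flat lists elementwise with PROD, SUM or MAX."""
    columns = list(zip(*inputs))
    if operation == "PROD":
        result = []
        for column in columns:
            product = 1
            for value in column:
                product *= value
            result.append(product)
        return result
    if operation == "SUM":
        coeffs = coeffs or [1] * len(inputs)
        return [sum(c * v for c, v in zip(coeffs, column)) for column in columns]
    if operation == "MAX":
        return [max(column) for column in columns]
    raise ValueError("unknown operation: %s" % operation)


a, b = [1, 5, 3], [4, 2, 6]
print(eltwise([a, b]))                          # SUM is the default
print(eltwise([a, b], "SUM", coeffs=[1, -1]))   # subtraction via coefficients
print(eltwise([a, b], "PROD"))
print(eltwise([a, b], "MAX"))
```

Passing coefficients of 1 and -1 turns the sum into a subtraction of the second input from the first.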
