香蕉与打火机

The era of machine learning and AI has arrived

Latest Posts

Problems using Scala's foreach on Java's List type

When calling Scala's foreach on a Java List type, IDEA reports an error.

Error when Scala's foreach is used on a Java data type

It turns out Scala's foreach cannot be used directly on Java collection types. The code was written rather poorly: the Java habits should be dropped entirely and Scala's own data types used when building data structures, so that Scala's language features can be exploited properly.

Workaround:

Add one line:

import scala.collection.JavaConversions._

Still, using Scala's own data types directly is recommended.
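As a quick illustration, here is a minimal sketch (the list contents are made up) of how the implicit conversion makes foreach available once that import is in scope:

import java.util.{ArrayList => JArrayList}
import scala.collection.JavaConversions._  // implicit java.util.List -> Scala Buffer wrapping

object JavaListForeach {
  def main(args: Array[String]): Unit = {
    val javaList = new JArrayList[String]()
    javaList.add("spark")
    javaList.add("scala")

    // Without the import, foreach does not exist on java.util.List;
    // with it, the list is wrapped implicitly and Scala collection methods work.
    javaList.foreach(println)
  }
}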

Reference: http://alvinalexander.com/scala/converting-java-collections-to-scala-list-map-array

Running a Scala program automatically with sbt

sbt run

cd your-project-pwd

sbt 'run-main your-main' > out.txt
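For reference, a minimal sketch of what the main object behind run-main might look like (the name your.pkg.YourMain is hypothetical; its fully qualified name is what gets passed to run-main):

// Hypothetical entry point launched by: sbt 'run-main your.pkg.YourMain'
package your.pkg

object YourMain {
  def main(args: Array[String]): Unit = {
    println("batch job started")
    // actual batch logic goes here
  }
}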

cron table

0 6 * * * /usr/batch/test1

Runs every day at 06:00.

The sbt run command can simply be added to the test1 script.


Example:

#!/bin/sh
#
# ---------------------------------------------------------------------
# Auto parser install script.
# ---------------------------------------------------------------------
#


# Make the project path available to login shells (cron does not read /etc/profile itself).
echo 'export RDB_PARSER_PROJECT_PATH=your-project-path' >> /etc/profile
. /etc/profile

# Register the parser jobs in the system crontab (daily at 01:00, 07:00 and 14:00).
# Double quotes are used so the project path expands when the entries are written.
echo "0 1 * * * root $RDB_PARSER_PROJECT_PATH/auto_parser_ff" >> /etc/crontab
echo "0 7 * * * root $RDB_PARSER_PROJECT_PATH/auto_parser_tf" >> /etc/crontab
echo "0 14 * * * root $RDB_PARSER_PROJECT_PATH/auto_parser_mf" >> /etc/crontab


Upgrading system software with rpm

Reference (Vbird's Linux guide): http://linux.vbird.org/linux_basic/0520rpm_and_srpm.php#rpmmanager_update

Upgrading with RPM is really simple: just use -Uvh or -Fvh, and the options and parameters that -Uvh and -Fvh accept are the same as for install. However, -U and -F do not mean quite the same thing; the basic difference is:

-Uvh: if the package that follows has never been installed, the system installs it directly; if an older version is already installed, the system upgrades it to the new version automatically.
-Fvh: if the package that follows is not yet installed on your Linux system, it will not be installed; in other words, only software already present on your system gets "upgraded".


Removing software: rpm -e

If dependencies prevent removal, do not force-delete; checking with yum is the better option.


Java concurrency: using thread pools with executor.shutdown() and executor.awaitTermination(7, TimeUnit.DAYS)

Reference: http://www.cnblogs.com/dolphin0520/p/3932921.html

A few key points to understand

  1. When the pool runs a class that implements Runnable, its worker thread simply calls that object's run() method directly (an ordinary interface call, not reflection, and no new thread is created per task).
  2. After all tasks have been submitted, the pool can be shut down (no new tasks can be added) and then waited on until every task has finished before exiting, using the two calls below (see the sketch after this list):
         executor.shutdown()
         executor.awaitTermination(7, TimeUnit.DAYS)
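A minimal, self-contained sketch of this shutdown pattern (written in Scala on top of java.util.concurrent; the pool size and task count are arbitrary):

import java.util.concurrent.{Executors, TimeUnit}

object PoolShutdownDemo {
  def main(args: Array[String]): Unit = {
    val executor = Executors.newFixedThreadPool(4)

    // Submit a handful of Runnable tasks; a worker thread calls run() on each one.
    for (i <- 1 to 10) {
      executor.execute(new Runnable {
        override def run(): Unit = println(s"task $i on ${Thread.currentThread().getName}")
      })
    }

    executor.shutdown()                          // stop accepting new tasks
    executor.awaitTermination(7, TimeUnit.DAYS)  // block until all queued tasks finish (or the timeout elapses)
  }
}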

Modifying the ulimit configuration

Edit the configuration file /etc/security/limits.conf:

*    soft    nofile    32768
*    hard    nofile    65536

The asterisk means the limit applies to all users; to target a specific user, replace the asterisk with that user name, for example:

weblogic      soft    nproc   2048
weblogic      hard    nproc   16384
weblogic      soft    nofile  8192
weblogic      hard    nofile  65536

Full description (from the comments in the file):

#Each line describes a limit for a user in the form:
#
#<domain>        <type>  <item>  <value>
#
#Where:
#<domain> can be:
#        - an user name
#        - a group name, with @group syntax
#        - the wildcard *, for default entry
#        - the wildcard %, can be also used with %group syntax,
#                 for maxlogin limit
#
#<type> can have the two values:
#        - "soft" for enforcing the soft limits
#        - "hard" for enforcing hard limits
#
#<item> can be one of the following:
#        - core - limits the core file size (KB)
#        - data - max data size (KB)
#        - fsize - maximum filesize (KB)
#        - memlock - max locked-in-memory address space (KB)
#        - nofile - max number of open files
#        - rss - max resident set size (KB)
#        - stack - max stack size (KB)
#        - cpu - max CPU time (MIN)
#        - nproc - max number of processes
#        - as - address space limit (KB)
#        - maxlogins - max number of logins for this user
#        - maxsyslogins - max number of logins on the system
#        - priority - the priority to run user process with
#        - locks - max number of file locks the user can hold
#        - sigpending - max number of pending signals
#        - msgqueue - max memory used by POSIX message queues (bytes)
#        - nice - max nice priority allowed to raise to values: [-20, 19]
#        - rtprio - max realtime priority
#
#<domain>      <type>  <item>         <value>
#


Loading and reading Parquet files with Spark (Scala edition)

Preface

It has been a long time.

I recently needed to read and test Parquet-format files. Hive, Impala, Spark and other frameworks all support Parquet.

This post is a simple hello world using Spark through its Scala interface.

Entering the spark-shell environment

[root@hadoop01 ~]# su - spark
[spark@hadoop01 ~]$ spark-
spark-class   spark-shell   spark-sql     spark-submit  
[spark@hadoop01 ~]$ spark-shell 
15/10/10 10:24:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/10 10:24:12 INFO SecurityManager: Changing view acls to: spark
15/10/10 10:24:12 INFO SecurityManager: Changing modify acls to: spark
15/10/10 10:24:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
15/10/10 10:24:12 INFO HttpServer: Starting HTTP Server
15/10/10 10:24:12 INFO Server: jetty-8.y.z-SNAPSHOT
15/10/10 10:24:12 INFO AbstractConnector: Started SocketConnector@0.0.0.0:48566
15/10/10 10:24:12 INFO Utils: Successfully started service 'HTTP class server' on port 48566.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/10 10:24:17 INFO SparkContext: Running Spark version 1.3.1
[…A lot of spark log output…]
15/10/10 10:24:28 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/10/10 10:24:29 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala>

Notes:

  1. The output shows that Spark 1.3.1 is being used here.
  2. spark-shell automatically registers a Spark context for you under the name sc; that sc object is used directly for configuration below.
  3. Once the scala> prompt appears, you are ready to start experimenting.

Creating and configuring the SQLContext

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@1530f74e

scala> sqlContext.setConf("spark.sql.parquet.binaryAsString","true")

scala>

Steps:

  1. Create the SQLContext from the sc (Spark context) object.
  2. Configure the SQLContext with setConf.
  3. The configurable parameters are listed at: http://spark.apache.org/docs/1.3.1/sql-programming-guide.html#parquet-files
  4. Many more configurable parameters were added in the latest 1.5.1 release.

Loading the Parquet file

Load the file:

scala> val parquetFile = sqlContext.parquetFile("/tmp/me.parquet")
15/10/10 10:47:12 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
parquetFile: org.apache.spark.sql.DataFrame = [CONTRACTID: string, TDATETIME: string, CONTRACTNAME: string, LASTPX: double, HIGHPX: double, LOWPX: double, CQ: double, TQ: double, LASTQTY: double, INITOPENINTS: double, OPENINTS: double, INTSCHG: double, TURNOVER: double, RISELIMIT: double, FALLLIMIT: double, PRESETTLE: double, PRECLOSE: double, OPENPX: double, CLOSEPX: double, SETTLEMENTPX: double, LIFELOW: double, LIFEHIGH: double, AVGPX: double, BIDIMPLYQTY: double, ASKIMPLYQTY: double, SIDE: string, S1: double, B1: double, SV1: double, BV1: double, S5: double, S4: double, S3: double, S2: double, B2: double, B3: double, B4: double, B5: double, SV5: double, SV4: double, SV3: double, SV2: double, BV2: double, BV3: double, BV4: double, BV5: double, PREDELTA: double, CURRDELTA: double, CHG...

Print the Parquet file's schema:

scala> parquetFile.printSchema()
root
 |-- CONTRACTID: string (nullable = false)
 |-- TDATETIME: string (nullable = false)

Note: if sqlContext.setConf("spark.sql.parquet.binaryAsString", "false") were used instead, the column data type would remain the original binary; with the setting made earlier, the conversion to string happens automatically.

Registering the Parquet file as a temporary table

scala> parquetFile.registerTempTable("parquetFile")

DML operations on the Parquet table

Run SQL:

scala> val tdatetime = sqlContext.sql("SELECT TDATETIME FROM parquetFile")
tdatetime: org.apache.spark.sql.DataFrame = [TDATETIME: string]

Iterate over the result:

scala> tdatetime.map(t => "TDATETIME: " + t(0)).collect().foreach(println)
15/10/10 10:17:56 INFO MemoryStore: ensureFreeSpace(223942) called with curMem=276682, maxMem=15558896517
[…A lot of spark log output…]
15/10/10 10:17:56 INFO DAGScheduler: Stage 1 (collect at <console>:26) finished in 0.265 s
15/10/10 10:17:56 INFO DAGScheduler: Job 1 finished: collect at <console>:26, took 0.284977 s
TDATETIME: 2015-05-04 08:45:35.223
TDATETIME: 2015-05-04 08:46:36.067
TDATETIME: 2015-05-04 08:47:36.940
[……]
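The same column can also be pulled out with the DataFrame API instead of SQL; a minimal sketch, assuming the same sqlContext and parquetFile as above (output omitted):

// Equivalent to the SELECT above, using DataFrame operations available in Spark 1.3
val tdatetimeDF = parquetFile.select("TDATETIME")
tdatetimeDF.show(3)  // prints the first few rows as a formatted table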


Generating test data in MySQL with a stored procedure

Declaring the stored procedure

# drop any previous version so the script can be re-run
drop procedure if exists insert_parquet;

# declare the procedure
delimiter @
create procedure insert_parquet(in item integer)
begin
declare counter int;
set counter = item;
while counter >= 1 do
insert into parquet values(counter,concat('company',counter),counter+0.1,CURTIME());
set counter = counter - 1;
end while;
end
@
delimiter ;

Testing it

mysql> truncate table parquet;
Query OK, 0 rows affected

mysql> call insert_parquet(1000000);
Query OK, 1 row affected (50 min 18.30 sec)

That generated 1,000,000 rows; fairly quick, finishing in under an hour.


Displaying a remote desktop on RedHat with VNC

Install vncserver

[root@test~]# yum install vnc-server


Configure VNC

[root@test ~]# vim /etc/sysconfig/vncservers 

 VNCSERVERS="2:root"
 VNCSERVERARGS[2]="-geometry 800x600 -nolisten tcp -localhost"


Start VNC

[root@abmdev01 ~]# vncserver 

You will require a password to access your desktops.

Password:
Verify:
Passwords don't match - try again
Password:
Verify:

New 'abmdev01:1 (root)' desktop is abmdev01:1

Starting applications specified in /root/.vnc/xstartup
Log file is /root/.vnc/abmdev01:1.log

Connect with a client

[Screenshot: connecting with a VNC client]


Close VNC sessions

[root@test~]# vncserver -list

TigerVNC server sessions:

X DISPLAY #     PROCESS ID
:1              6069
:2              6487
[root@test~]# vncserver -kill :1
Killing Xvnc process ID 6069
[root@test~]# vncserver -kill :2
Killing Xvnc process ID 6487

Stop the VNC service

[root@test~]# service vncserver stop