Github Link
需求
现有2000+文件夹,每个文件夹下有若干excel文件,现在要将这些excel文件中的数据导入mysql。
每个excel文件的第一行是无效数据。
除了excel文件中已有的数据,还要添加一列,名为“at_company”,值为821。
流程
(1)获取excel文件列表,并根据excel文件名确定之后需要创建的table名;
(2)连接mysql
(3)创建table
(4)插入数据
(5)断开连接
依赖模块
1. xlrd # to read excel files 2. mysql-connector-python # to work with Mysql
源代码
1 #-*- coding: utf-8 -*-
2 import os,sys,datetime
3 import mysql.connector
4 import xlrd
5
6 '''
7 the main function to import data
8 username: username of mysql database
9 password: password for username
10 database: a specific database in mysql
11 datapath: the absolute path or relative path of data folder
12 '''
13 def importDataHelper(username, password, database, datapath):
14 '''import data helper'''
15 '''
16 Step 0: Validate input database parameters
17 '''
18 try:
19 conn = mysql.connector.connect(user=username, password=password, database=database, use_unicode=
True)
20 except mysql.connector.errors.ProgrammingError as e:
21 print e
22 return -1
23 '''
24 Step 1: Traverse files in datapath, store file paths and corresponding table names in lists
25 lists[0] is the list of files paths
26 lists[1] is the list of table names
27 '''
28 lists =
getFilesList(datapath)
29 nfiles =
len(lists[0])
30 '''
31 Step 2: Store data in mysql via a for-loop
32 '''
33 cursor =
conn.cursor()
34 for file_idx
in xrange(0, nfiles):
35 file_path =
lists[0][file_idx]
36 print "processing file(%d/%d):[ %s ]"%(file_idx+1
, nfiles, file_path)
37 table_name = lists[1
][file_idx]
38 num =
storeData(file_path, table_name, cursor)
39 if num >=
0:
40 print "[ %d ] data have been stored in TABLE:[ %s ]"%
(num, table_name)
41 conn.commit()
42 cursor.close()
43 '''
44 Step 3: Close connection
45 '''
46 conn.close()
47
48 '''
49 get files list in the dir, including the files in its sub-folders
50 the return list contain two elements, the first element is a file names list
51 and the second element is a table names list(will be used for creating tables in database),
52 '''
53 def getFilesList(dir):
54 path_list =
[]
55 table_list =
[]
56 file_name_list =
os.listdir(dir)
57 for file_name
in file_name_list:
58 path =
os.path.join(dir, file_name)
59 if os.path.isdir(path):
60 '''get the files in sub folder recursively'''
61 tmp_lists =
getFilesList(path)
62 path_list.extend(tmp_lists[0])
63 table_list.extend(tmp_lists[1
])
64 else:
65 path_list.append(path)
66 '''convert file name to mysql table name'''
67 file_name = file_name.split(
'.')[0]
#remove .xls
68 # file_name = file_name.split('from')[0] #remove characters after 'from'
69 file_name = file_name.strip()
#remove redundant space at both ends
70 file_name = file_name.replace(
' ',
'_')
#replace ' ' with '_'
71 file_name = file_name.replace(
'-',
'_')
#replace ' ' with '_'
72 file_name = file_name.lower()
#convert all characters to lowercase
73 table_list.append(file_name)
74 return [path_list, table_list]
75
76 '''
77 store the data of file file_path in table table_name
78 file_path: file location
79 table_name: name of the table that will be created in database
80 cursor: a mysql cursor
81 '''
82 def storeData(file_path, table_name, cursor):
83 ret =
0
84 '''open an excel file'''
85 file =
xlrd.open_workbook(file_path)
86 '''get the first sheet'''
87 sheet =
file.sheet_by_index(0)
88 '''get the number of rows and columns'''
89 nrows =
sheet.nrows
90 ncols =
sheet.ncols
91 '''get column names'''
92 col_names =
[]
93 for i
in range(0, ncols):
94 title = sheet.cell(1
, i).value
95 title =
title.strip()
96 title = title.replace(
' ',
'_')
97 title =
title.lower()
98 col_names.append(title)
99 '''create table in mysql'''
100 sql =
'create table '\
101 +table_name+
' (' \
102 +
'id int NOT NULL AUTO_INCREMENT PRIMARY KEY, ' \
103 +
'at_company varchar(10) DEFAULT \'821\', '
104
105 for i
in range(0, ncols):
106 sql = sql + col_names[i] +
' varchar(150)'
107 if i != ncols-1
:
108 sql +=
','
109 sql = sql +
')'
110 try:
111 cursor.execute(sql)
112 except mysql.connector.errors.ProgrammingError as e:
113 print e
114 # return -1
115
116 '''insert data'''
117 #construct sql statement
118 sql =
'insert into '+table_name+
'('
119 for i
in range(0, ncols-1
):
120 sql = sql + col_names[i] +
', '
121 sql = sql + col_names[ncols-1
]
122 sql +=
') values ('
123 sql = sql +
'%s,'*(ncols-1
)
124 sql +=
'%s)'
125 #get parameters
126 parameter_list =
[]
127 for row
in xrange(2
, nrows):
128 for col
in range(0, ncols):
129 cell_type =
sheet.cell_type(row, col)
130 cell_value =
sheet.cell_value(row, col)
131 if cell_type ==
xlrd.XL_CELL_DATE:
132 dt_tuple =
xlrd.xldate_as_tuple(cell_value, file.datemode)
133 meta_data = str(datetime.datetime(*
dt_tuple))
134 else:
135 meta_data =
sheet.cell(row, col).value
136 parameter_list.append(meta_data)
137 # cursor.execute(sql, parameter_list)
138 try:
139 cursor.execute(sql, parameter_list)
140 parameter_list =
[]
141 ret += 1
142 except mysql.connector.errors.ProgrammingError as e:
143 print e
144 # return -1
145 return ret
146
147
148
149 if __name__ ==
"__main__":
150 if len(sys.argv)<5
:
151 print "Missing Parameters"
152 sys.exit()
153 elif len(sys.argv)>5
:
154 print "Too Many Parameters"
155 sys.exit()
156 username = sys.argv[1
]
157 password = sys.argv[2
]
158 database = sys.argv[3
]
159 datapath = sys.argv[4
]
160 importDataHelper(username, password, database, datapath)
Readme文件(帮导师做的,所以用英文写的文档)
There are two dependency modules need to be installed.
1. xlrd
# to read excel files
2. mysql-connector-python
# to work with Mysql
Directory Structure:
data_path: test files
ImportDataProgram.py: the main program
Procedure:
(1) Get all the paths
and names of the files need to be stored
(2
) Connect MySQL
(3) Create tables
for each file
(4
) Insert data into each table
Usage:
0. create a new database in mysql
For example, after logging in mysql
in terminal, you can use the the following command
"create database test_database" to create a database named
'test_database',
you can replace "test_database" with any other names you like.
1. set username, password, database(created
in step 0)
and datapath
in the tail of ImportDataProgram.py
2
. run ImportDataProgram.py with the following command
python ImportDataProgram.py [username] [password] [database] [datapath]
# username: your username in your mysql
# password: the corresponding password
# database: the database you specific
# datapath: the directory of excel files
e.g.
python ImportDataProgram.py root root test_database data_path
PS:
(1
) The Length of Data In Table
I am not sure the maximum length of data, so I set the
length of data in mysql tables
is 150
characters (you can find
the code in function storeData(file_path, table_name, cursor), the code
is
" sql = sql + col_names[i] + ' varchar(150)' "), you can adjust it according
to your requirements.
(2
)Table Name:
You can set the rules of table name, the code is following the comment code:
'''convert file name to mysql table name''' in function getFilesList(dir).
遇到的坑以及填坑方法:
(1)Python中argv参数用法
(2)Python使用xlrd读取excel文件的方法[1(比较简要)][2(比较详细)]
(3)Python使用xlrd读取excel文件中日期类型变为浮点数[stack overflow][2中文博客]
(4)Python遍历目录下的文件[1]
(5)Python连接MySQL[1]
(6)Python中print格式化输出(%),取消默认换行(,)[print用法]
(7)Python字符串连接[字符串操作]
(8)Python连接list[连接list]
(9)Python字符串替换[字符串替换]
转载于:https://www.cnblogs.com/CheeseZH/p/5050394.html
相关资源:数据结构—成绩单生成器