Python - openpyxl - Use openpyxl to get number of rows that contain a specific value
I'm new to Python. I'm using openpyxl for an SEO project for my brother, and I'm trying to get the number of rows that contain a specific value.
I have a spreadsheet that looks something like this:
I want to write a program that will get the keywords and group them into a string by state, like so:
Missouri = "search item 1, search item 2, search item 5, search item 6"
Illinois = "search item 3, search item 4"
I have thus far created a program like this:
#first, import openpyxl
import openpyxl
#next, give location of file
path = "testExcel.xlsx"
#Open workbook by creating object
wb_object = openpyxl.load_workbook(path)
#Get workbook active sheet object
sheet_object = wb_object.active
#Getting the value of maximum rows
#and column
row = sheet_object.max_row
column = sheet_object.max_column
print("Total Rows:", row)
print("Total Columns:", column)
#printing the value of fourth column, state
#Loop will print all values
#of fourth column
print("\nValue of fourth column")
for i in range(4, row + 1):
    cell_object = sheet_object.cell(row=i, column=4)
    split_item_test = cell_object.value.split(",")
    split_item_test_result = split_item_test[0]
    state = split_item_test_result
    print(state)
    if (state == 'Missouri'):
        print(state.count('Missouri'))
print("All good")
The problem is that after doing this, I see that it prints 1 repeatedly, but not a total number for Missouri. I would like a total number of mentions of each state, and then eventually get it into a string with each search keyword.
Is this possible with openpyxl? Or will I need a different library?
OK, another option.
This will create a dictionary 'state_dict' in the format from your question:
Missouri = "search item 1, search item 2, search item 5, search item 6"
Illinois = "search item 3, search item 4"
...
print("\nValue of fourth column")
state_dict = {}
for row in sheet_object.iter_rows(min_row=2, max_row=sheet_object.max_row):
    # Fourth column holds "City, State"; take the part after the comma
    k = row[3].value.split(',')[1].strip()
    # First column holds the keyword
    v = row[0].value
    if k in state_dict:
        state_dict[k] += [v]
    else:
        state_dict[k] = [v]
### Print values
for key, value in state_dict.items():
    print(f'{key} = Total {len(value)}', end='; ')
    for v in value:
        print(f'{v}', end=', ')
    print('')
This creates the dictionary 'state_dict' like so:
'Missouri' = {list: 4} ['search item 1', 'search item 2', 'search item 5', 'search item 6']
'Illinois' = {list: 2} ['search item 3', 'search item 4']
'Alabama' = {list: 1} ['search item 7']
'Colorado' = {list: 1} ['search item 8']
Print output:
Value of fourth column
Missouri = Total 4; search item 1, search item 2, search item 5, search item 6,
Illinois = Total 2; search item 3, search item 4,
Alabama = Total 1; search item 7,
Colorado = Total 1; search item 8,
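If the end goal is one comma-separated string per state, as in the question, the lists can be joined with str.join. The following is only a minimal sketch, not the code above: it swaps the if/else for collections.defaultdict, and both the helper name keywords_by_state and the sample_rows data are hypothetical stand-ins for the sheet (keyword in the first column, "City, State" in the fourth, as the code above assumes).

```python
from collections import defaultdict


def keywords_by_state(rows):
    """Group keywords by state and join each group into one string.

    Each row is assumed to be (keyword, other, rank, "City, State").
    """
    grouped = defaultdict(list)
    for row in rows:
        state = row[3].split(",")[1].strip()  # text after the comma
        grouped[state].append(row[0])
    return {state: ", ".join(items) for state, items in grouped.items()}


# Hypothetical rows standing in for the spreadsheet contents.
sample_rows = [
    ("search item 1", None, 1, "Kansas City, Missouri"),
    ("search item 2", None, 2, "St. Louis, Missouri"),
    ("search item 3", None, 1, "Chicago, Illinois"),
]

strings_by_state = keywords_by_state(sample_rows)
for state, joined in strings_by_state.items():
    print(f'{state} = "{joined}"')
```

With a real workbook, tuples of this shape should be available from sheet_object.iter_rows(min_row=2, values_only=True), which yields plain cell values instead of cell objects.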
### Additional Information ###
Updated the state_dict to include the rank details for each item.
The output now shows the items for each state in rank order. You have two options for restricting the data:
Maximum rank to show: the variable rank_max (for example, rank_max = 100) sets the highest rank the output will display, so if it's set to 5, only ranks 1, 2, 3, 4 and 5 are displayed, provided the state has items with those ranks.
Number of ranks to show: total_ranks_to_display sets how many ranks are displayed, starting from rank 1, regardless of the rank_max value.
Example: if rank_max is set to 10 and that would show 8 rows for Missouri, then setting total_ranks_to_display to 4 means only the top 4 ranks are shown.
You can use either or both to achieve what you need, I think.
...
print("\nValue of fourth column")
state_dict = {}
for row in sheet_object.iter_rows(min_row=2, max_row=sheet_object.max_row):
    k = row[3].value.split(',')[1].strip()  # state
    v = row[0].value  # keyword
    r = row[2].value  # rank
    if k in state_dict:
        if r in state_dict[k]:
            state_dict[k][r] += [v]
        else:
            state_dict[k].update({r: [v]})
    else:
        state_dict[k] = {r: [v]}
rank_max = 10
total_ranks_to_display = 4
for key, value in state_dict.items():
    print(f'{key}')
    top_count = 0
    # rank_max + 1 so that rank_max itself is included
    for i in range(1, rank_max + 1):
        if i in state_dict[key]:
            top_count += 1
            cur_rank = state_dict[key][i]
            total = len(cur_rank)
            print(f'Rank: {i} Total: {total} {cur_rank}')
            if top_count == total_ranks_to_display:
                break
Example output
Max rank is 10 and the total ranks to display is 4:
Value of fourth column
Missouri
Rank: 1 Total: 1 ['search item 19']
Rank: 2 Total: 3 ['search item 5', 'search item 13', 'search item 18']
Rank: 3 Total: 2 ['search item 14', 'search item 20']
Rank: 4 Total: 1 ['search item 22']
Alabama
Rank: 1 Total: 2 ['search item 26', 'search item 28']
Rank: 2 Total: 2 ['search item 3', 'search item 12']
Rank: 3 Total: 1 ['search item 6']
Rank: 5 Total: 1 ['search item 15']
Illinois
Rank: 1 Total: 2 ['search item 11', 'search item 17']
Rank: 2 Total: 1 ['search item 24']
Rank: 3 Total: 1 ['search item 4']
Rank: 6 Total: 1 ['search item 23']
Colorado
Rank: 2 Total: 1 ['search item 21']
Rank: 3 Total: 3 ['search item 25', 'search item 27', 'search item 29']
Rank: 4 Total: 1 ['search item 30']
Rank: 6 Total: 2 ['search item 8', 'search item 16']
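The same two limits can also be applied by sorting the ranks a state actually has and slicing, rather than probing every number up to rank_max. This is just a sketch of that alternative: the helper top_ranks and the sample_ranks data are hypothetical, and it assumes every rank is an integer.

```python
def top_ranks(ranks, rank_max, total_ranks_to_display):
    """Return up to total_ranks_to_display (rank, items) pairs, best rank first.

    Only ranks from 1 to rank_max inclusive are considered.
    """
    eligible = sorted(r for r in ranks if 1 <= r <= rank_max)
    return [(r, ranks[r]) for r in eligible[:total_ranks_to_display]]


# Hypothetical per-state data in the same shape as state_dict[key].
sample_ranks = {
    5: ["search item 15"],
    1: ["search item 26", "search item 28"],
    12: ["search item 9"],
    3: ["search item 6"],
    2: ["search item 3", "search item 12"],
    7: ["search item 2"],
}

selected = top_ranks(sample_ranks, rank_max=10, total_ranks_to_display=4)
for rank, items in selected:
    print(f"Rank: {rank} Total: {len(items)} {items}")
```

Here rank 12 is dropped by rank_max and rank 7 is dropped by total_ranks_to_display, so only ranks 1, 2, 3 and 5 are printed.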
ranemirusG is right, there are several ways to obtain the same result. Here's another option. I attempted to preserve your thought process. Good luck.
print("\nValue of fourth column")
missouri_list = [] # empty list
illinois_list = [] # empty list
for i in range(2, row+1): # It didn't look like "4, row+1" captured the full sheet, try (2, row+1)
    cell_object = sheet_object.cell(row=i, column=4)
    keyword = sheet_object.cell(row=i, column=1)
    keyword_fmt = keyword.value # Captures values in Keyword column
    split_item_test = cell_object.value.split(",")
    split_item_test_result = split_item_test[1] # 1 captures states
    state = split_item_test_result
    print(state)
    # simple if statement to capture results in a list
    if 'Missouri' in state:
        missouri_list.append(keyword_fmt)
    if 'Illinois' in state:
        illinois_list.append(keyword_fmt)
print(missouri_list)
print(len(missouri_list)) # Counts the number of occurrences
print(illinois_list)
print(len(illinois_list)) # Counts the number of occurrences
print("All good")
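If you only need the totals and don't want one list per state, collections.Counter from the standard library can tally every state at once. The sketch below also shows why the original code printed 1 each time: str.count counts occurrences inside a single string, not across rows. The locations list is hypothetical data standing in for the fourth column.

```python
from collections import Counter

# Hypothetical values from the fourth column ("City, State").
locations = [
    "Kansas City, Missouri",
    "Chicago, Illinois",
    "St. Louis, Missouri",
    "Springfield, Missouri",
]

# str.count looks inside one string, so this is 1 for every Missouri row.
single_row_count = "Missouri".count("Missouri")

# Counter tallies the state names across all rows instead.
state_counts = Counter(loc.split(",")[1].strip() for loc in locations)

print(single_row_count)
print(state_counts["Missouri"])
print(state_counts["Illinois"])
```

A Counter returns 0 for a state that never appears, so no membership check is needed before reading a total.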
Yes, it's possible with openpyxl. To achieve your real goal, try something like this:
states_and_keywords = {}
for i in range(4, row + 1):
    cell_object = sheet_object.cell(row=i, column=4)
    split_item_test = cell_object.value.split(",")
    split_item_test_result = split_item_test[1] #note that the element should be 1 for the state
    state = split_item_test_result.strip(" ") #trim whitespace (after comma)
    keyword = cell_object.offset(0,-3).value #this gets the value of the keyword for that row
    if state not in states_and_keywords:
        states_and_keywords[state] = [keyword]
    else:
        states_and_keywords[state].append(keyword)
print(states_and_keywords)
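One thing to watch for with this approach: an empty cell has a value of None, so calling .split on it raises AttributeError, and a value with no comma raises IndexError at index 1. A small guard can skip such rows. The helper below is a hypothetical sketch (extract_state is not part of openpyxl) and assumes the state is whatever follows the first comma.

```python
def extract_state(cell_value):
    """Return the text after the first comma, or None if it cannot be found."""
    if not isinstance(cell_value, str):
        return None  # empty cells (None) or numeric cells
    parts = cell_value.split(",")
    if len(parts) < 2:
        return None  # no comma, so no state part
    state = parts[1].strip()
    return state or None  # treat "City, " as missing


print(extract_state("Kansas City, Missouri"))
print(extract_state(None))
print(extract_state("Missouri"))
```

In the loop above, a row would then be skipped with continue whenever extract_state(cell_object.value) returns None.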