
Python - openpyxl - Use openpyxl to get number of rows that contain a specific value

I'm new to Python. I'm using openpyxl for an SEO project for my brother, and I'm trying to count the rows that contain a specific value.

I have a spreadsheet that looks something like this: [image: sample spreadsheet]

I want to write a program that collects the keywords and joins them into one string per state, like this: Missouri = "search item 1, search item 2, search item 5, search item 6" Illinois = "search item 3, search item 4"

So far I have written this program:

    #first, import openpyxl
    import openpyxl

    #next, give location of file
    path = "testExcel.xlsx"

    #Open workbook by creating object
    wb_object = openpyxl.load_workbook(path)

    #Get workbook active sheet object
    sheet_object = wb_object.active

    #Getting the value of maximum rows
    #and column
    row = sheet_object.max_row
    column = sheet_object.max_column
    print("Total Rows:", row)
    print("Total Columns:", column)

    #printing the value of the fourth column, state
    #Loop will print all values
    #of the fourth column
    print("\nValue of fourth column")
    for i in range(4, row + 1):
        cell_object = sheet_object.cell(row=i, column=4)
        split_item_test = cell_object.value.split(",")
        split_item_test_result = split_item_test[0]
        state = split_item_test_result
        print(state)
        if (state == 'Missouri'):
            print(state.count('Missouri'))
    print("All good")

The problem is that it prints 1 repeatedly instead of a total for Missouri. I would like the total number of mentions of each state, and eventually a string per state containing its search terms.

Is this possible with openpyxl? Or will I need a different library?

OK, another option.
This creates a dictionary 'state_dict' in the format from your question:

Missouri = "search item 1, search item 2, search item 5, search item 6"
Illinois = "search item 3, search item 4"

...
print("\nValue of fourth column")
state_dict = {}
for row in sheet_object.iter_rows(min_row=2, max_row=sheet_object.max_row):
    k = row[3].value.split(',')[1].strip()
    v = row[0].value
    if k in state_dict:
        state_dict[k] += [v]
    else:
        state_dict[k] = [v]

### Print values
for key, value in state_dict.items():
    print(f'{key} = Total {len(value)}', end='; ')
    for v in value:
        print(f'{v}', end=', ')
    print('')

This creates the dictionary 'state_dict' as follows:

'Missouri' = {list: 4} ['search item 1', 'search item 2', 'search item 5', 'search item 6']
'Illinois' = {list: 2} ['search item 3', 'search item 4']
'Alabama' = {list: 1} ['search item 7']
'Colorado' = {list: 1} ['search item 8']

Print output:

Value of fourth column
Missouri = Total 4; search item 1, search item 2, search item 5, search item 6, 
Illinois = Total 2; search item 3, search item 4, 
Alabama = Total 1; search item 7, 
Colorado = Total 1; search item 8, 
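
The same grouping can also be written with collections.defaultdict, which removes the need for the if/else branch. The following is only a sketch under stated assumptions: the rows list is made-up sample data (the city names are hypothetical) standing in for the tuples that sheet_object.iter_rows(min_row=2, values_only=True) would yield, and keywords_by_state is a helper name introduced here for illustration. It joins each state's keywords into the single string asked for in the question.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the sheet: keyword in column 1,
# rank in column 3, "City, State" in column 4. With openpyxl, tuples like
# these could come from sheet_object.iter_rows(min_row=2, values_only=True).
rows = [
    ("search item 1", None, 1, "St. Louis, Missouri"),
    ("search item 2", None, 2, "Kansas City, Missouri"),
    ("search item 3", None, 1, "Chicago, Illinois"),
]


def keywords_by_state(rows):
    """Group keywords by state and join each group into one string."""
    grouped = defaultdict(list)
    for row in rows:
        keyword = row[0]
        # Assumes the location cell looks like "City, State".
        state = row[3].split(",")[1].strip()
        grouped[state].append(keyword)
    return {state: ", ".join(words) for state, words in grouped.items()}


result = keywords_by_state(rows)
for state, keywords in result.items():
    print(f'{state} = "{keywords}"')
```

The per-state total is still available before joining, as len(grouped[state]).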

###--------------Additional Information -----------------------###
Updated state_dict to include the rank of each item.
The output now shows the items for each state in rank order. There are two ways to restrict what is displayed:
Maximum rank to show: the variable rank_max (for example, rank_max = 100) sets the highest rank the output will display. If it is set to 5, only ranks 1, 2, 3, 4 and 5 are displayed, provided the state has items with those ranks.
Number of ranks to show: total_ranks_to_display caps how many ranks are displayed, counting up from rank 1, regardless of the rank_max value. For example, if rank_max is 10 and that would show 8 rows for Missouri, setting total_ranks_to_display to 4 means only the top 4 ranks are shown.
You can use either or both to get what you need.

...
print("\nValue of fourth column")
state_dict = {}
for row in sheet_object.iter_rows(min_row=2, max_row=sheet_object.max_row):
    k = row[3].value.split(',')[1].strip()
    v = row[0].value
    r = row[2].value
    if k in state_dict:
        if r in state_dict[k]:
            state_dict[k][r] += [v]
        else:
            state_dict[k].update({r: [v]})
    else:
        state_dict[k] = {r: [v]}


rank_max = 10
total_ranks_to_display = 4
for key, value in state_dict.items():
    print(f'{key}')
    top_count = 0
    for i in range(1, rank_max + 1):  # + 1 so that rank_max itself is included
        if i in state_dict[key]:
            top_count += 1
            cur_rank = state_dict[key][i]
            total = len(cur_rank)
            print(f'Rank: {i} Total: {total} {cur_rank}')
        if top_count == total_ranks_to_display:
            break

Example output
Max rank is 10 and the total ranks to display is 4.

Value of fourth column
Missouri
Rank: 1 Total: 1 ['search item 19']
Rank: 2 Total: 3 ['search item 5', 'search item 13', 'search item 18']
Rank: 3 Total: 2 ['search item 14', 'search item 20']
Rank: 4 Total: 1 ['search item 22']
Alabama
Rank: 1 Total: 2 ['search item 26', 'search item 28']
Rank: 2 Total: 2 ['search item 3', 'search item 12']
Rank: 3 Total: 1 ['search item 6']
Rank: 5 Total: 1 ['search item 15']
Illinois
Rank: 1 Total: 2 ['search item 11', 'search item 17']
Rank: 2 Total: 1 ['search item 24']
Rank: 3 Total: 1 ['search item 4']
Rank: 6 Total: 1 ['search item 23']
Colorado
Rank: 2 Total: 1 ['search item 21']
Rank: 3 Total: 3 ['search item 25', 'search item 27', 'search item 29']
Rank: 4 Total: 1 ['search item 30']
Rank: 6 Total: 2 ['search item 8', 'search item 16']
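
The two limits can also be isolated in a small function, which makes them easier to try out without a spreadsheet. This is a sketch, not the code above: top_ranks is a hypothetical helper name, the illinois dictionary is sample data copied from the Illinois output above, and ranks are assumed to be positive integers.

```python
def top_ranks(ranks, rank_max, total_ranks_to_display):
    """Return up to total_ranks_to_display (rank, items) pairs, lowest rank first.

    Only ranks less than or equal to rank_max are considered.
    """
    eligible = sorted(rank for rank in ranks if rank <= rank_max)
    return [(rank, ranks[rank]) for rank in eligible[:total_ranks_to_display]]


# Sample data shaped like state_dict['Illinois']: rank -> list of keywords.
illinois = {
    3: ["search item 4"],
    1: ["search item 11", "search item 17"],
    6: ["search item 23"],
    2: ["search item 24"],
}

for rank, items in top_ranks(illinois, rank_max=10, total_ranks_to_display=3):
    print(f"Rank: {rank} Total: {len(items)} {items}")
```

Sorting the ranks that are actually present avoids looping over every number up to rank_max, which matters only if rank_max is very large.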

ranemirusG is right, there are several ways to obtain the same result. Here's another option. I tried to preserve your thought process. Good luck.

print("\nValue of fourth column")

missouri_list = [] # empty list
illinois_list = [] # empty list

for i in range(2, row+1): # It didn't look like "4, row+1" captured the full sheet, try (2, row+1)
    cell_object = sheet_object.cell(row=i, column=4)
    keyword = sheet_object.cell(row=i, column=1)
    keyword_fmt = keyword.value # Captures values in Keyword column
    split_item_test = cell_object.value.split(",")
    split_item_test_result = split_item_test[1] # 1 captures states
    state = split_item_test_result
    print(state)

    # simple if statement to capture results in a list
    if 'Missouri' in state:
        missouri_list.append(keyword_fmt)
    if 'Illinois' in state:
        illinois_list.append(keyword_fmt)
print(missouri_list)
print(len(missouri_list)) # Counts the number of occurrences
print(illinois_list)
print(len(illinois_list)) # Counts the number of occurrences
print("All good")
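
A note on why the loop in the question printed 1 on every pass: str.count reports how many times a substring occurs inside one string, and state holds the text of a single cell, so the answer for a matching row is always 1. A tally across rows is needed instead. The following is a minimal sketch using collections.Counter from the standard library; the states list is made-up sample data standing in for the state names read from column 4.

```python
from collections import Counter

# str.count looks inside ONE string, so for a single cell value it can
# only report how often the name occurs in that cell.
print("Missouri".count("Missouri"))  # 1

# Hypothetical state names, one per spreadsheet row.
states = ["Missouri", "Missouri", "Illinois", "Missouri"]

# Counter tallies every distinct value across the whole list.
state_totals = Counter(states)
print(state_totals["Missouri"])  # 3
print(state_totals["Illinois"])  # 1
```

This covers any number of states without a separate list and if statement for each one.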

Yes, it's possible with openpyxl. To achieve your real goal, try something like this:

states_and_keywords = {}
for i in range(2, row + 1): #start at row 2 so the first data rows are not skipped
    cell_object = sheet_object.cell(row=i, column=4)
    split_item_test = cell_object.value.split(",")
    split_item_test_result = split_item_test[1] #note that the element should be 1 for the state
    state = split_item_test_result.strip(" ") #trim whitespace (after comma)
    keyword = cell_object.offset(0,-3).value #this gets the value of the keyword for that row
    if state not in states_and_keywords:
        states_and_keywords[state] = [keyword]
    else:
        states_and_keywords[state].append(keyword) 
print(states_and_keywords)
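
All of the loops above call .split(",") on the cell value and take element 1, which raises an error if a cell is empty (openpyxl returns None for empty cells) or contains no comma. A small guard avoids that. This is a sketch under the assumption that the location column holds text of the form "City, State"; extract_state is a hypothetical helper name, not part of openpyxl.

```python
def extract_state(cell_value):
    """Return the state from a 'City, State' string, or None if it cannot be parsed."""
    if not isinstance(cell_value, str):
        return None  # empty cells are read as None; numbers are not locations
    parts = [part.strip() for part in cell_value.split(",")]
    if len(parts) < 2 or not parts[1]:
        return None  # no comma, or nothing after the comma
    return parts[1]


print(extract_state("St. Louis, Missouri"))
print(extract_state(None))
```

Inside any of the loops, rows where extract_state returns None can then be skipped with continue.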
