Looping through a list and inserting each item into str.contains (and counting rows of a DataFrame where multiple items are present) using Python

My goal is to design a function that takes two arguments, a list of people playing poker and a list of possible actions (e.g. call, raise), and uses str.contains on a column to find out how often each player performs each action.

The DataFrame df has a few columns, but I want to apply the function only to the column titled "entry", which is a log of all actions that took place at an online poker table (each row in the column is a string).

This is what the column "entry" looks like (each line is a string):

-- ending hand #174 --
"Prof @ ZY_G_5ZOve" gained 100
"tom_thumb @ g1PBaozt7k" folds
"Prof @ ZY_G_5ZOve" calls with 50
"tom_thumb @ g1PBaozt7k" checks
river: 9♦, 5♣, Q♥, 7♠ [K♠]
"Prof @ ZY_G_5ZOve" checks
"tom_thumb @ g1PBaozt7k" checks
turn: 9♦, 5♣, Q♥ [7♠]
"Prof @ ZY_G_5ZOve" checks
"tom_thumb @ g1PBaozt7k" checks
flop:  [9♦, 5♣, Q♥]
"Prof @ ZY_G_5ZOve" checks
"tom_thumb @ g1PBaozt7k" calls with 50
"Bob T. @ fjZTXUGV2G" folds
"danny G @ tNE1_lEFYv" folds
"Prof @ ZY_G_5ZOve" posts a big blind of 50
"tom_thumb @ g1PBaozt7k" posts a small blind of 25
-- starting hand #174  (Texas Hold'em) (dealer: "Bob T. @ fjZTXUGV2G") --
-- ending hand #173 --
"tom_thumb @ g1PBaozt7k" gained 475
"danny G @ tNE1_lEFYv" folds
"Prof @ ZY_G_5ZOve" folds
"tom_thumb @ g1PBaozt7k" raises with 356
flop:  [4♥, A♠, 6♠]
"danny G @ tNE1_lEFYv" calls with 150
"Prof @ ZY_G_5ZOve" calls with 150
"tom_thumb @ g1PBaozt7k" raises with 150
"Bob T. @ fjZTXUGV2G" folds
"danny G @ tNE1_lEFYv" calls with 50
"Prof @ ZY_G_5ZOve" calls with 50
"tom_thumb @ g1PBaozt7k" posts a big blind of 50
"Bob T. @ fjZTXUGV2G" posts a small blind of 25
-- starting hand #173  (Texas Hold'em) (dealer: "danny G @ tNE1_lEFYv") --

Here is some sample code I have tried:

player_list = ['danny G', 'Jane', 'Prof', 'spn', 'tim', 'Bob T.', 'joon', 'tom_thumb']
action_list = ['call', 'fold']

def action_amount(df, player_list, action):
    for player in player_list:
        action_number =len(df[df['entry'].str.contains('(player).*(action)', regex=True)])
        print(f'{player} {action}ed {action_number} times.')

action_amount(df, player_list, 'call')

Right now the output formatting is right, but I can't get the loop items into str.contains, so this is the result:

danny G called 0 times.
Jane called 0 times.
Prof called 0 times.
spn called 0 times.
tim called 0 times.
Bob T. called 0 times.
joon called 0 times.
tom_thumb called 0 times.

For the sample df['entry'] information above, it should return:

danny G called 2 times.
Jane called 0 times.
Prof called 3 times.
spn called 0 times.
tim called 0 times.
Bob T. called 0 times.
joon called 0 times.
tom_thumb called 1 times.

Notably, len(df[df['entry'].str.contains('(danny G).*(call)', regex=True)]) returns the correct value (I am using regex because the two words I am looking for are in the same line with a bunch of different characters in between).
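The difference between the hard-coded pattern and the looped one can be checked on a few lines of the log. The sketch below is only an illustration: `entries` is a hypothetical stand-in for df['entry'], and `re.escape` is an addition not present in the original attempt. Inside plain quotes, `player` and `action` are ordinary text, so the pattern looks for the words "player" and "action"; an f-string inserts the variable values instead.

```python
import re
import pandas as pd

# Hypothetical stand-in for df['entry'], using lines from the sample log.
entries = pd.Series([
    '"danny G @ tNE1_lEFYv" calls with 150',
    '"Prof @ ZY_G_5ZOve" folds',
    '"danny G @ tNE1_lEFYv" folds',
])

player, action = 'danny G', 'call'

# Inside plain quotes the names are literal text, not variables, so this
# searches for the word "player" followed later by the word "action".
literal_hits = entries.str.contains('(player).*(action)', regex=True).sum()

# An f-string substitutes the values; re.escape keeps any regex
# metacharacters in a name (such as the "." in "Bob T.") literal.
pattern = f'{re.escape(player)}.*{re.escape(action)}'
fstring_hits = entries.str.contains(pattern, regex=True).sum()

print(literal_hits, fstring_hits)
```

pandas may warn that the first pattern contains match groups; the warning is harmless here and goes away if the parentheses are dropped.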

The issue seems related to trying to loop values into the string pattern of str.contains. How do I loop through the list and print each name along with the number of times that person performed a given action?

Ideally, I would want to loop through both lists at the top of the code at the same time.
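One possible way to cover every player and every action in a single pass, without looping over either list, is to parse the player name and the verb out of each line and then tabulate them. This is a rough sketch under an assumption about the log: every action line has the shape "name @ id" followed by a verb, as in the sample above. The names `parsed` and `table` and the small `df` are hypothetical. The columns hold the verbs as logged ("calls", "folds"), not the stems "call" and "fold".

```python
import pandas as pd

# Hypothetical sample standing in for the real df.
df = pd.DataFrame({'entry': [
    '"Prof @ ZY_G_5ZOve" calls with 50',
    '"tom_thumb @ g1PBaozt7k" checks',
    '-- ending hand #174 --',
    '"danny G @ tNE1_lEFYv" calls with 150',
    '"Prof @ ZY_G_5ZOve" calls with 150',
    '"Bob T. @ fjZTXUGV2G" folds',
]})

# Assumed line shape: "<name> @ <id>" <verb> ...
# Lines that do not fit (hand markers, flop/turn/river) produce NaN.
parsed = df['entry'].str.extract(r'^"(?P<player>.+?) @ [^"]+" (?P<action>\w+)')
parsed = parsed.dropna()

# Rows are players, columns are verbs, cells are counts.
table = pd.crosstab(parsed['player'], parsed['action'])
print(table)
```

Players from player_list who never appear in the log are absent from the table; `table.reindex(player_list, fill_value=0)` would add them back with zero counts.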

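In your attempt the pattern is the literal text '(player).*(action)', so the loop variables are never substituted into it. Building the pattern with an f-string, and escaping names such as "Bob T." with re.escape, should work:

import re

def action_amount(df, player_list, action_list):
    for player in player_list:
        for action in action_list:
            # re.escape keeps characters such as the "." in "Bob T." literal
            pattern = f'{re.escape(player)}.*{action}'
            matching_rows = df[df['entry'].str.contains(pattern, regex=True)]
            action_number = len(matching_rows)
            print(f'{player} {action}ed {action_number} times.')

action_amount(df, player_list, action_list)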
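If the counts are needed for further work rather than only printed, a variant could return them as a table. This is a minimal sketch, not a definitive implementation: `count_actions` is a hypothetical helper, the small `df` is hand-made from the sample log, and `na=False` is an added assumption so that missing entries count as non-matches.

```python
import re
import pandas as pd

def count_actions(df, player_list, action_list):
    """Return a table of counts: one row per player, one column per action."""
    counts = {}
    for action in action_list:
        counts[action] = {
            # Count rows where the player's name is followed by the action.
            player: int(
                df['entry']
                .str.contains(f'{re.escape(player)}.*{re.escape(action)}',
                              regex=True, na=False)
                .sum()
            )
            for player in player_list
        }
    return pd.DataFrame(counts)

# Hypothetical sample built from lines of the log above.
df = pd.DataFrame({'entry': [
    '"Prof @ ZY_G_5ZOve" calls with 50',
    '"tom_thumb @ g1PBaozt7k" calls with 50',
    '"Bob T. @ fjZTXUGV2G" folds',
    '"danny G @ tNE1_lEFYv" folds',
    '-- starting hand #174  (Texas Hold\'em) (dealer: "Bob T. @ fjZTXUGV2G") --',
]})

result = count_actions(df, ['danny G', 'Prof', 'Bob T.', 'tom_thumb'],
                       ['call', 'fold'])
print(result)
```

Returning a DataFrame keeps the printing separate from the counting, so the same result can be sorted, plotted, or merged with other statistics.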
