I need to iterate through a pandas df and select only specific rows with a specific value in the first column and then select a value from that row

Question

I have a dataframe that looks like this (it has many more rows and columns, but this is how it is set up):

col1   col2     col3           col4  col5  col6  col7  col8
 MSH     a        b             e     e     r     a      d 
 PID     c   6002324^^^WAMT     d     s   PickB   x     
 OBR     e      pickC               PickD   v     z      q
 OBX     g        h             e           s     y       
 ORC     i        j             p     p     p     m      y
  \n   none      none         none  none  none  none   none
 MSH     a        b             e     e     r     a      d 
 PID     c    ^^^WAMT           d     s   PickF   x      o
 OBX     g        h             e     z     s     y      p 
 ORC     i        j                   p     p     m      y
 OBR     e      pickE               PickG   v     z      q
 OBX     g        h             e           s            t
 OBX     i        j             p     p     p     m      t
 OBX     g        h             e           s     y       
 OBX     i        j             p     p     p     m      y
  \n   none     none          none  none  none  none   none
 MSH     a        b             e     e     r     a      d 
 PID     c  43222346^^^WAMT     d     s   PickH   x      e
 OBX     g        h             e     z     s     y      p 
 ORC     i        j                   p     p     m      y
 OBR     e      pickI               PickJ   v     z      q
  \n   none      none         none  none  none  none   none
 MSH     a        b             e     e     r     a      d 
 PID     c    ^^^WAMT           d     s   PickK   x      o
 OBR     e      pickL               PickM   v     z      q
 OBX     g        h             e           s     y

The expected output dataframe would look like this:

col1       col2     col3    col4
^^^WAMT    PickF    PickE   PickG
^^^WAMT    PickK    PickL   PickM

Here is the data as a DataFrame constructor:

import pandas as pd

d = {'col1': ['MSH', 'PID', 'OBR', 'OBX', 'ORC', '\n', 'MSH', 'PID', 'OBX', 'ORC', 'OBR', 'OBX', 'OBX', 'OBX', 'OBX', '\n', 'MSH', 'PID', 'OBX', 'ORC', 'OBR', '\n', 'MSH', 'PID', 'OBR', 'OBX'], 'col2': ['a', 'b', 'e', 'g', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'g', 'i', 'g', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'none', 'a', 'c', 'e', 'g'], 'col3': ['b', '6002324^^^WAMT', 'pickC', 'h', 'j', 'none', 'b', '^^^WAMT', 'h', 'j', 'PickE', 'h', 'j', 'h', 'j', 'none', 'b', '43222346^^^WAMT', 'h', 'j', 'PickI', 'none', 'b', '^^^WAMT', 'PickL', 'h'], 'col4': ['e', 'd', '', 'e', 'p', 'none', 'e', 'd', 'e', '', '', 'e', 'p', 'e', 'p', 'none', 'e', 'd', 'e', '', '', 'none', 'e', 'd', '', 'e'], 'col5': ['e', 's', 'PickD', '', 'p', 'none', 'e', 's', 'z', 'p', 'PickG', '', 'p', '', 'p', 'none', 'e', 's', 'z', 'p', 'PickJ', 'none', 'e', 's', 'PickM', ''], 'col6': ['r', 'PickB', 'v', 's', 'p', 'none', 'r', 'PickF', 's', 'p', 'v', 's', 'p', 's', 'p', 'none', 'r', 'PickH', 's', 'p', 'v', 'none', 'r', 'PickK', 'v', 's'], 'col7': ['a', 'b', 'e', '', 'i', 'none', 'a', 'c', 'g', 'i', '', 'g', 'i', 'g', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'none', 'a', 'c', 'e', 'g'], 'col8': ['a', 'b', 'e', 'g', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'g', 'i', '', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'none', 'a', 'c', 'e', '']}
df = pd.DataFrame(d)

I need to iterate through each row in the df and check whether the first column is equal to PID. For each PID row, I then need to check whether the field containing ^^^WAMT has any numbers in front of the ^^^WAMT. If it has none, I want to take the ^^^WAMT and PickF from the PID row and PickE and PickG from the OBR row, and put them in a new df. However, if PID column 3 has a number in front of the ^^^WAMT, I do not want to add the PID or the OBR fields to the new df.

So I don't know whether it would be easier to pull out all of the PID and OBR rows first and then iterate through them afterwards to do the check (whether PickA has a value), or whether it could all be done together. I also do not know the best way to iterate through the rows and columns the way I would like.

So far I have tried to iterate through the df using this code, but it does not seem to work:

for row, index in range(len(df)):
   if df.loc[df[row] == 'MSH']:
      if df.loc[df[index] == 0]:
         # this is where i would have the pick this column value but I am not sure how to write this

Any help would be appreciated.

Answer 1

You can build a group to split on PID, then use a list comprehension to extract the data and feed it to a DataFrame constructor:

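# Label each row with a running count of the PID rows seen so far, so every
# PID row starts a new group (rows before the first PID get label 0)
group = df['col1'].eq('PID').cumsum().values

out = pd.DataFrame(
    [
        # (PID col3, OBR col3, PID col6, OBR col5)
        (g.loc['PID', 'col3'], g.loc['OBR', 'col3'], g.loc['PID', 'col6'], g.loc['OBR', 'col5'])
        for i, g in df.set_index('col1').groupby(group)
        # skip group 0 (no PID row) and keep only PIDs with nothing before ^^^WAMT
        if i and g.loc['PID', 'col3'] == '^^^WAMT'
    ],
    columns=['A', 'B', 'C', 'D'],
)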

print(out)

Output:

         A      B      C      D
0  ^^^WAMT  PickE  PickF  PickG
1  ^^^WAMT  PickL  PickK  PickM
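
To see why the grouping step works, here is a minimal sketch on a made-up miniature of the frame. The name `demo` and its values are assumptions for illustration only, not part of the question's data. It prints the group label each row receives and which groups pass the `'^^^WAMT'` check.

```python
import pandas as pd

# Hypothetical miniature of the question's frame (illustrative values only)
demo = pd.DataFrame({
    'col1': ['MSH', 'PID', 'OBR', 'MSH', 'PID', 'OBR'],
    'col3': ['b', '123^^^WAMT', 'pickC', 'b', '^^^WAMT', 'PickE'],
})

# eq('PID') is True on PID rows; cumsum() bumps the label at each True,
# so each PID row opens a new group that runs until the next PID row
labels = demo['col1'].eq('PID').cumsum()
print(labels.tolist())  # [0, 1, 1, 1, 2, 2]

# Keep only the groups whose PID row has nothing in front of ^^^WAMT
kept = [
    int(label)
    for label, g in demo.set_index('col1').groupby(labels.values)
    if label and g.loc['PID', 'col3'] == '^^^WAMT'
]
print(kept)  # [2]
```

Note that the MSH row of the next message lands in the previous PID's group, which is harmless here because only the PID and OBR rows are read. The lookups assume exactly one OBR row per group; if a group held several, `g.loc['OBR', 'col3']` would return a Series instead of a single value.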

1 answer

solution1
1 ACCEPTED 2022-01-25 21:10:57