简体   繁体   中英

How to write the values to another column of dataframe based on Row_id column and value exist in match column?

I have a dataframe like this which is having ROW_ID and Matches columns. Based on the value in each row of Matches column I should write in Result column. For example, in first row, we have ; ALL MATCH -3 , so in the new column Result , this ; ALL MATCH should be present in ROW_ID 3. In 8th ROW_ID , we have ; ALL MATCH -9; Diff in# -10 . So in our Result column ; ALL MATCH should be present in ROW_ID 9 and ; Diff in# should be present in ROW_ID 10

ROW_ID Matches
1 ; ALL MATCH -3
2
3
4
5 ; ALL MATCH -6
6
7
8 ; ALL MATCH -9; Diff in# -10
9
10

That means the final dataframe should be like this.

ROW_ID Result
1
2
3 ; ALL MATCH
4
5
6 ; ALL MATCH
7
8
9 ; ALL MATCH
10 ; Diff in#

I tried a lot, I extracted the int value seperately and other parts separately for each row using dataframe.iterrows(). But I am not able to print that value to a particular position. df.at[] method won't work. loc and iloc also i tried, but not getting how can i print that string to particular row of that column.

Try:

df['Result'] = df['ROW_ID'].map(
    df['Matches'].str.extractall('(; [^-]+) -(\d+)')
                 .astype({1: int}).set_index(1).squeeze()
).fillna('')

Output:

>>> df
   ROW_ID                       Matches       Result
0       1                ; ALL MATCH -3             
1       2                                           
2       3                                ; ALL MATCH
3       4                                           
4       5                ; ALL MATCH -6             
5       6                                ; ALL MATCH
6       7                                           
7       8  ; ALL MATCH -9; Diff in# -10             
8       9                                ; ALL MATCH
9      10                                 ; Diff in#

# Details about extractall
>>> df['Matches'].str.extractall('(; [^-]+) -(\d+)')
                   0   1
  match                 
0 0      ; ALL MATCH   3
4 0      ; ALL MATCH   6
7 0      ; ALL MATCH   9
  1       ; Diff in#  10

Create a temporary DataFrame as:

wrk = df.Matches.str.extractall(r'(?P<Result>;\D+)-(?P<id>\d+)')

Then strip the trailing spaces from Result column:

wrk.Result = wrk.Result.str.strip()

The next step is to change the type of id column to int , as so far it is of object type (actually a string ):

wrk.id = wrk.id.astype('int64')

and set it as the index:

wrk.set_index('id', inplace=True)

Now wrk is actually a Series , containing:

         Result
id             
3   ; ALL MATCH
6   ; ALL MATCH
9   ; ALL MATCH
10   ; Diff in#

Then, to generate the result, run:

res = df.merge(wrk, how='left', left_on='ROW_ID', right_index=True)

The result is:

   ROW_ID                       Matches       Result
0       1                ; ALL MATCH -3          NaN
1       2                           NaN          NaN
2       3                           NaN  ; ALL MATCH
3       4                           NaN          NaN
4       5                ; ALL MATCH -6          NaN
5       6                           NaN  ; ALL MATCH
6       7                           NaN          NaN
7       8  ; ALL MATCH -9; Diff in# -10          NaN
8       9                           NaN  ; ALL MATCH
9      10                           NaN   ; Diff in#

If you don't want "NaN" in "not filled" fields, append .fillna('') to the last instruction.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM