
How to write values to another column of a DataFrame based on the ROW_ID column and the values in the Matches column?

I have a DataFrame like the one below, which has ROW_ID and Matches columns. Based on the value in each row of the Matches column, I need to write into a new Result column. For example, the first row contains "; ALL MATCH -3", so in the new Result column, "; ALL MATCH" should appear in the row with ROW_ID 3. The row with ROW_ID 8 contains "; ALL MATCH -9; Diff in# -10", so in the Result column "; ALL MATCH" should appear in ROW_ID 9 and "; Diff in#" should appear in ROW_ID 10.

ROW_ID  Matches
1       ; ALL MATCH -3
2
3
4
5       ; ALL MATCH -6
6
7
8       ; ALL MATCH -9; Diff in# -10
9
10

That means the final DataFrame should look like this:

ROW_ID  Result
1
2
3       ; ALL MATCH
4
5
6       ; ALL MATCH
7
8
9       ; ALL MATCH
10      ; Diff in#

I tried a lot. Using DataFrame.iterrows(), I extracted the integer value and the remaining text of each row separately, but I am not able to write that text to a particular position. The df.at[] method doesn't work. I also tried loc and iloc, but I can't figure out how to write that string to a particular row of that column.
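For comparison with the iterrows() attempt described above, here is a minimal loop-based sketch. The sample data is reconstructed from the question's table (empty cells are assumed to be empty strings), and the regular expression assumes labels never contain a "-". The key point is that df.loc with a boolean mask on ROW_ID selects the target row by its ROW_ID value, not by the position of the row currently being read.

```python
import re
import pandas as pd

# Sample data reconstructed from the question (empty cells assumed to be '').
df = pd.DataFrame({
    'ROW_ID': range(1, 11),
    'Matches': ['; ALL MATCH -3', '', '', '', '; ALL MATCH -6',
                '', '', '; ALL MATCH -9; Diff in# -10', '', ''],
})

# Each "; <label> -<target id>" piece of a Matches cell (assumed format).
PAIR = re.compile(r'(; [^-]+) -(\d+)')

df['Result'] = ''  # create the column up front so every row has a value
for _, row in df.iterrows():
    for label, target in PAIR.findall(row['Matches']):
        # The mask picks the row whose ROW_ID equals the target id,
        # so the label is written there rather than into the current row.
        df.loc[df['ROW_ID'] == int(target), 'Result'] = label

print(df[['ROW_ID', 'Result']])
```

A vectorized approach avoids the Python-level loop, but the loop makes the "write into a different row" step explicit.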

Try:

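# Pull every "; <label> -<id>" pair out of Matches, index the labels by
# their integer target id, then look each ROW_ID up in that mapping.
df['Result'] = df['ROW_ID'].map(
    df['Matches'].str.extractall(r'(; [^-]+) -(\d+)')
                 .astype({1: int}).set_index(1)[0]
).fillna('')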

Output:

>>> df
   ROW_ID                       Matches       Result
0       1                ; ALL MATCH -3             
1       2                                           
2       3                                ; ALL MATCH
3       4                                           
4       5                ; ALL MATCH -6             
5       6                                ; ALL MATCH
6       7                                           
7       8  ; ALL MATCH -9; Diff in# -10             
8       9                                ; ALL MATCH
9      10                                 ; Diff in#

# Details about extractall
>>> df['Matches'].str.extractall(r'(; [^-]+) -(\d+)')
                   0   1
  match                 
0 0      ; ALL MATCH   3
4 0      ; ALL MATCH   6
7 0      ; ALL MATCH   9
  1       ; Diff in#  10
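The one-liner above can be unpacked into named steps. The following is a self-contained sketch, with sample data reconstructed from the question (empty cells assumed to be empty strings); the intermediate names pairs and lookup are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question (empty cells assumed to be '').
df = pd.DataFrame({
    'ROW_ID': range(1, 11),
    'Matches': ['; ALL MATCH -3', '', '', '', '; ALL MATCH -6',
                '', '', '; ALL MATCH -9; Diff in# -10', '', ''],
})

# One row per "; <label> -<id>" occurrence: column 0 = label, column 1 = id.
pairs = df['Matches'].str.extractall(r'(; [^-]+) -(\d+)')

# Cast the ids to int and make them the index, so the labels form a
# lookup table keyed by the target ROW_ID.
lookup = pairs.astype({1: int}).set_index(1)[0]

# ROW_IDs without an entry map to NaN, which fillna turns into ''.
df['Result'] = df['ROW_ID'].map(lookup).fillna('')
print(df)
```

Selecting column [0] instead of calling .squeeze() keeps lookup a Series even when only one pair is found.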

Create a temporary DataFrame as follows:

wrk = df.Matches.str.extractall(r'(?P<Result>;\D+)-(?P<id>\d+)')
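To see what this pattern captures, here is a minimal check with Python's re module on the row-8 string. It shows that each Result capture keeps the space before the "-" and that each id is still a string, which is why the following steps strip the labels and convert the ids.

```python
import re

# Same pattern: "Result" takes ';' plus every non-digit character up to
# the '-' that precedes the digits captured as "id".
pattern = re.compile(r'(?P<Result>;\D+)-(?P<id>\d+)')

cell = '; ALL MATCH -9; Diff in# -10'  # the Matches value in ROW_ID 8
pairs = [(m['Result'], m['id']) for m in pattern.finditer(cell)]
print(pairs)  # [('; ALL MATCH ', '9'), ('; Diff in# ', '10')]
```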

Then strip the trailing spaces from the Result column:

wrk.Result = wrk.Result.str.strip()

The next step is to change the type of the id column to int, because so far it is of object type (actually string):

wrk.id = wrk.id.astype('int64')

and set it as the index:

wrk.set_index('id', inplace=True)

Now wrk is a single-column DataFrame indexed by id, containing:

         Result
id             
3   ; ALL MATCH
6   ; ALL MATCH
9   ; ALL MATCH
10   ; Diff in#
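Because wrk is indexed by id, its Result column on its own is a Series keyed by the target ROW_ID. The minimal sketch below, assuming the empty Matches cells are missing values (None), checks that a Series.map lookup (the route the first answer takes) gives the same values as the merge used in the next step.

```python
import pandas as pd

# Sample data reconstructed from the question (empty cells assumed None).
df = pd.DataFrame({
    'ROW_ID': range(1, 11),
    'Matches': ['; ALL MATCH -3', None, None, None, '; ALL MATCH -6',
                None, None, '; ALL MATCH -9; Diff in# -10', None, None],
})

# Same steps as above: extract, strip, cast the ids, index by id.
wrk = df['Matches'].str.extractall(r'(?P<Result>;\D+)-(?P<id>\d+)')
wrk['Result'] = wrk['Result'].str.strip()
wrk['id'] = wrk['id'].astype('int64')
wrk = wrk.set_index('id')

# Lookup via the Series vs. a left merge on the index.
via_map = df['ROW_ID'].map(wrk['Result'])
via_merge = df.merge(wrk, how='left', left_on='ROW_ID', right_index=True)['Result']

# Compare as plain lists so missing values (NaN) compare equal as ''.
same = via_map.fillna('').tolist() == via_merge.fillna('').tolist()
print(same)  # True
```

The merge adds Result as a new column of a new frame, while map returns a Series you can assign to df['Result'] directly.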

Then, to generate the result, run:

res = df.merge(wrk, how='left', left_on='ROW_ID', right_index=True)

The result is:

   ROW_ID                       Matches       Result
0       1                ; ALL MATCH -3          NaN
1       2                           NaN          NaN
2       3                           NaN  ; ALL MATCH
3       4                           NaN          NaN
4       5                ; ALL MATCH -6          NaN
5       6                           NaN  ; ALL MATCH
6       7                           NaN          NaN
7       8  ; ALL MATCH -9; Diff in# -10          NaN
8       9                           NaN  ; ALL MATCH
9      10                           NaN   ; Diff in#

If you don't want NaN in the unfilled cells, append .fillna('') to the last instruction.
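Putting the steps together, a self-contained sketch of this approach could chain them with DataFrame.assign. The sample data is reconstructed from the question, with empty Matches cells assumed to be missing values (None).

```python
import pandas as pd

# Sample data reconstructed from the question (empty cells assumed None).
df = pd.DataFrame({
    'ROW_ID': range(1, 11),
    'Matches': ['; ALL MATCH -3', None, None, None, '; ALL MATCH -6',
                None, None, '; ALL MATCH -9; Diff in# -10', None, None],
})

# Extract the pairs, strip the labels, cast the ids, then index by id.
wrk = (
    df['Matches']
    .str.extractall(r'(?P<Result>;\D+)-(?P<id>\d+)')
    .assign(Result=lambda d: d['Result'].str.strip(),
            id=lambda d: d['id'].astype('int64'))
    .set_index('id')
)

# fillna('') applies to every column of the merged frame, so the empty
# Matches cells become '' as well, not only the unfilled Result cells.
res = df.merge(wrk, how='left', left_on='ROW_ID', right_index=True).fillna('')
print(res)
```

If only Result should be filled, call .fillna('') on res['Result'] instead of on the whole frame.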

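Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.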

粤ICP备18138465号  © 2020-2024 STACKOOM.COM