How to write values to another DataFrame column based on the ROW_ID column and the values in the Matches column?
I have a DataFrame like this, with ROW_ID and Matches columns. Based on the value in each row of the Matches column, I should write into a new Result column. For example, the first row contains "; ALL MATCH -3", so in the new Result column, "; ALL MATCH" should appear in ROW_ID 3. ROW_ID 8 contains "; ALL MATCH -9; Diff in# -10", so in the Result column "; ALL MATCH" should appear in ROW_ID 9 and "; Diff in#" should appear in ROW_ID 10.
ROW_ID | Matches |
---|---|
1 | ; ALL MATCH -3 |
2 | |
3 | |
4 | |
5 | ; ALL MATCH -6 |
6 | |
7 | |
8 | ; ALL MATCH -9; Diff in# -10 |
9 | |
10 | |
That means the final DataFrame should look like this:
ROW_ID | Result |
---|---|
1 | |
2 | |
3 | ; ALL MATCH |
4 | |
5 | |
6 | ; ALL MATCH |
7 | |
8 | |
9 | ; ALL MATCH |
10 | ; Diff in# |
I tried a lot: using dataframe.iterrows(), I extracted the integer value and the other parts separately for each row. But I am not able to write that value to a particular position. The df.at[] method doesn't work, and I also tried loc and iloc, but I can't figure out how to write that string to a particular row of that column.
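The row-by-row idea from the question can also work; the key is to write to the row whose ROW_ID equals the extracted number, rather than to the row currently being iterated. A minimal sketch, assuming the blank cells in the question's table are empty strings. The regular expression is an assumption about the format (each label starts with ";" and is followed by " -N"):

```python
import re

import pandas as pd

# Sample data rebuilt from the question (blank cells assumed to be "").
df = pd.DataFrame({
    "ROW_ID": range(1, 11),
    "Matches": ["; ALL MATCH -3", "", "", "", "; ALL MATCH -6",
                "", "", "; ALL MATCH -9; Diff in# -10", "", ""],
})

# Start with an empty Result column so every row has a value.
df["Result"] = ""

# Assumed format: a label starting with ";" followed by " -N",
# where N is the ROW_ID the label should be written to.
pattern = re.compile(r"(;[^-]+?)\s*-(\d+)")

for matches in df["Matches"].tolist():
    for label, target in pattern.findall(matches):
        # Write to the row whose ROW_ID equals N, not to the current row.
        df.loc[df["ROW_ID"] == int(target), "Result"] = label

print(df)
```

Selecting the target row with a boolean mask in loc also means the code does not depend on ROW_ID lining up with the DataFrame's index labels, which is a common reason df.at[...] appears not to work.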
Try:
df['Result'] = df['ROW_ID'].map(
    df['Matches'].str.extractall(r'(; [^-]+) -(\d+)')
      .astype({1: int}).set_index(1).squeeze()
).fillna('')
Output:
>>> df
ROW_ID Matches Result
0 1 ; ALL MATCH -3
1 2
2 3 ; ALL MATCH
3 4
4 5 ; ALL MATCH -6
5 6 ; ALL MATCH
6 7
7 8 ; ALL MATCH -9; Diff in# -10
8 9 ; ALL MATCH
9 10 ; Diff in#
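To reproduce the output above, here is a self-contained version of the same chain, split into named steps. The sample df is rebuilt from the question's table, assuming blank cells are empty strings; the names pairs and lookup are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question (blank cells assumed to be "").
df = pd.DataFrame({
    "ROW_ID": range(1, 11),
    "Matches": ["; ALL MATCH -3", "", "", "", "; ALL MATCH -6",
                "", "", "; ALL MATCH -9; Diff in# -10", "", ""],
})

# One row per "label -N" pair; column 0 is the label, column 1 is N.
pairs = df["Matches"].str.extractall(r"(; [^-]+) -(\d+)")

# Make N an int and use it as the index, leaving a Series: N -> label.
lookup = pairs.astype({1: int}).set_index(1).squeeze()

# Look up each ROW_ID; ids without a label become NaN, then "".
df["Result"] = df["ROW_ID"].map(lookup).fillna("")
print(df)
```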
# Details about extractall
>>> df['Matches'].str.extractall(r'(; [^-]+) -(\d+)')
0 1
match
0 0 ; ALL MATCH 3
4 0 ; ALL MATCH 6
7 0 ; ALL MATCH 9
1 ; Diff in# 10
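One caveat with .squeeze(): it collapses every length-1 axis, so if the whole column contains only a single match, it returns a scalar string instead of a Series, and passing that to map() does not work as intended. Selecting column 0 explicitly avoids this. A small sketch with hypothetical three-row data:

```python
import pandas as pd

# Hypothetical data with only one "label -N" pair in the whole column.
df = pd.DataFrame({"ROW_ID": [1, 2, 3], "Matches": ["; ALL MATCH -3", "", ""]})

pairs = df["Matches"].str.extractall(r"(; [^-]+) -(\d+)")
indexed = pairs.astype({1: int}).set_index(1)

# With a single match, squeeze() reduces the 1x1 frame to a plain string.
print(type(indexed.squeeze()))

# Selecting column 0 always gives a Series, however many matches exist.
lookup = indexed[0]
df["Result"] = df["ROW_ID"].map(lookup).fillna("")
print(df)
```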
Create a temporary DataFrame:
wrk = df.Matches.str.extractall(r'(?P<Result>;\D+)-(?P<id>\d+)')
Then strip the trailing spaces from the Result column:
wrk.Result = wrk.Result.str.strip()
The next step is to change the type of the id column to int, since so far it is of object type (actually a string):
wrk.id = wrk.id.astype('int64')
and set it as the index:
wrk.set_index('id', inplace=True)
Now wrk is a single-column DataFrame indexed by id, containing:
Result
id
3 ; ALL MATCH
6 ; ALL MATCH
9 ; ALL MATCH
10 ; Diff in#
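Setting id as the index assumes each ROW_ID is targeted at most once. If two entries point at the same ROW_ID, the left merge repeats that row once per match. One way to handle that, sketched here with hypothetical data, is to concatenate the labels per id before merging (joining with an empty string is an arbitrary choice that keeps the "; label" format):

```python
import pandas as pd

# Hypothetical data in which ROW_ID 3 is targeted twice.
df = pd.DataFrame({
    "ROW_ID": [1, 2, 3],
    "Matches": ["; ALL MATCH -3", "; Diff in# -3", ""],
})

wrk = df["Matches"].str.extractall(r"(?P<Result>;\D+)-(?P<id>\d+)")
wrk["Result"] = wrk["Result"].str.strip()
wrk["id"] = wrk["id"].astype("int64")

# Collapse repeated ids into one label per target, so the merge
# below cannot add extra rows.
wrk = wrk.groupby("id")["Result"].agg("".join).to_frame()

res = df.merge(wrk, how="left", left_on="ROW_ID", right_index=True)
print(res)
```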
Then, to generate the result, run:
res = df.merge(wrk, how='left', left_on='ROW_ID', right_index=True)
The result is:
ROW_ID Matches Result
0 1 ; ALL MATCH -3 NaN
1 2 NaN NaN
2 3 NaN ; ALL MATCH
3 4 NaN NaN
4 5 ; ALL MATCH -6 NaN
5 6 NaN ; ALL MATCH
6 7 NaN NaN
7 8 ; ALL MATCH -9; Diff in# -10 NaN
8 9 NaN ; ALL MATCH
9 10 NaN ; Diff in#
If you don't want "NaN" in the unfilled cells, append .fillna('') to the last instruction.
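Put together, the steps of this answer amount to the following script. The sample df is rebuilt from the question's table, assuming the empty cells are missing values (None), which matches the NaN shown in the printed result above; bracket indexing is used instead of attribute access, which behaves the same for these existing columns:

```python
import pandas as pd

# Sample data rebuilt from the question (empty cells assumed to be None).
df = pd.DataFrame({
    "ROW_ID": range(1, 11),
    "Matches": ["; ALL MATCH -3", None, None, None, "; ALL MATCH -6",
                None, None, "; ALL MATCH -9; Diff in# -10", None, None],
})

# One row per "label-id" pair; the named groups become column names.
wrk = df["Matches"].str.extractall(r"(?P<Result>;\D+)-(?P<id>\d+)")
wrk["Result"] = wrk["Result"].str.strip()  # drop the space before "-"
wrk["id"] = wrk["id"].astype("int64")
wrk = wrk.set_index("id")

# The left merge keeps every row of df; fillna('') blanks unmatched cells.
res = df.merge(wrk, how="left", left_on="ROW_ID", right_index=True).fillna("")
print(res)
```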