
I have two datasets and need to compare strings from one dataset's columns with another dataset's column in R

I have two datasets and need to compare the strings in one dataset's columns with a column of the other dataset in R.

The details are below. Letter case can be ignored.

Can anyone help me with this?

First dataset:

 <table><tbody><tr><th>instancename</th><th>hostname</th><th>sid</th></tr><tr><td>instance1</td><td>server1</td><td>sid1</td></tr><tr><td>instance2</td><td>server2</td><td>sid2</td></tr><tr><td>instance3</td><td>server3</td><td>sid3</td></tr><tr><td>instance4</td><td>server4</td><td>sid4</td></tr><tr><td>instance5</td><td>server5</td><td>sid5</td></tr><tr><td>instance6</td><td>server6</td><td>sid6</td></tr></tbody></table>

Second dataset:

 <table><tbody><tr><th>short_description</th><th>description</th></tr><tr><td>Kindly activate Server1 information</td><td>Kindly activate all sid3 and there is issue with instance3</td></tr><tr><td>server2: issue on instance2</td><td>find a sloution for this issue</td></tr><tr><td>Please fix the issue</td><td>issue is on Sid6</td></tr><tr><td>can you please check instance5 on server5</td><td>Sid5. Please look into this issue asap.</td></tr><tr><td>sid1: performance issue</td><td>server1 and sid1. Performance issue</td></tr><tr><td>Can you please check the issue</td><td>Can you please check the issues</td></tr></tbody></table> 

I need a final dataset like the one below:

 <table><tbody><tr><th>short_description</th><th>description</th><th>Final_output</th></tr><tr><td>Kindly activate Server1 information</td><td>Kindly activate all sid3 and there is issue with instance3</td><td>Server1,sid3,instance3</td></tr><tr><td>server2: issue on instance2</td><td>find a sloution for this issue</td><td>server2,instance2</td></tr><tr><td>Please fix the issue</td><td>issue is on Sid6</td><td>Sid6</td></tr><tr><td>can you please check instance5 on server5</td><td>Sid5. Please look into this issue asap.</td><td>server5,Sid5</td></tr><tr><td>sid1: performance issue</td><td>server1 and sid1. Performance issue</td><td>sid1,server1</td></tr><tr><td>Can you please check the issue</td><td>Can you please check the issues</td><td>no matches found</td></tr></tbody></table> 
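For reference, the first dataset can also be typed into R directly, and every value in it is then a candidate keyword to look for in the second dataset. A minimal sketch, assuming each sid value belongs under the sid column; `dat1` and `keywords` are illustrative names, not from the question:

```r
# First dataset from the question (assumption: sidN belongs under `sid`)
dat1 <- data.frame(
  instancename = paste0("instance", 1:6),
  hostname     = paste0("server", 1:6),
  sid          = paste0("sid", 1:6),
  stringsAsFactors = FALSE
)

# Flatten all three columns into one vector of search terms
keywords <- unique(unlist(dat1, use.names = FALSE))
keywords
```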

Since you provided the data in HTML format, I first have to read it into R as a table:

library(magrittr)  # provides the %>% pipe used below

# Second dataset, copied verbatim from the question's HTML
b <- "<table><tbody><tr><th>short_description</th><th>description</th></tr><tr><td>Kindly activate Server1 information</td><td>Kindly activate all sid3 and there is issue with instance3</td></tr><tr><td>server2: issue on instance2</td><td>find a sloution for this issue</td></tr><tr><td>Please fix the issue</td><td>issue is on Sid6</td></tr><tr><td>can you please check instance5 on server5</td><td>Sid5. Please look into this issue asap.</td></tr><tr><td>sid1: performance issue</td><td>server1 and sid1. Performance issue</td></tr><tr><td>Can you please check the issue</td><td>Can you please check the issues</td></tr></tbody></table>"


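# Wrap the fragment in <body> so it parses as one XML document, then keep
# the first (and only) table that html_table() returns
dat2 <- xml2::as_xml_document(paste0("<body>", b, "</body>")) %>%
  rvest::html_table() %>%
  {.[[1]]}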


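# (?|...) is a PCRE branch-reset group. The first branch lazily skips text up
# to the next serverN/instanceN/sidN token (case-insensitive via (?i)) and
# keeps only that token in group 1; once no token remains, `.+` consumes the
# rest with an empty group 1. Each row thus collapses to its tokens, run together.
serv_instance <- gsub("(?|.*?((?i)server\\d+|instance\\d+|sid\\d+)|.+)", "\\1",
                      do.call(paste, dat2), perl = TRUE)

# Every token ends in a digit and the next one starts with a letter, so insert
# ", " at each digit-letter boundary; rows with no token get "No match found"
final_output <- replace(gsub("(?<=\\d)(?=[A-Za-z])", ", ", serv_instance, perl = TRUE),
                        !nchar(serv_instance), "No match found")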

cbind(dat2,final_output)

                          short_description                                                description             final_output
1       Kindly activate Server1 information Kindly activate all sid3 and there is issue with instance3 Server1, sid3, instance3
2               server2: issue on instance2                             find a sloution for this issue       server2, instance2
3                      Please fix the issue                                           issue is on Sid6                     Sid6
4 can you please check instance5 on server5                    Sid5. Please look into this issue asap. instance5, server5, Sid5
5                   sid1: performance issue                        server1 and sid1. Performance issue      sid1, server1, sid1
6            Can you please check the issue                            Can you please check the issues           No match found
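The code above hard-codes the server/instance/sid prefixes instead of using the first dataset, and it keeps repeats (row 5 lists sid1 twice). A minimal sketch of an alternative, assuming each sid value belongs under the sid column of dataset 1: take the keywords from dataset 1, match them case-insensitively as whole words, drop repeats, and fall back to the question's "no matches found". The names `dat1`, `keywords`, `pattern` and `hits` are illustrative:

```r
# Dataset 1 (assumption: sidN belongs under `sid`)
dat1 <- data.frame(
  instancename = paste0("instance", 1:6),
  hostname     = paste0("server", 1:6),
  sid          = paste0("sid", 1:6),
  stringsAsFactors = FALSE
)

# Dataset 2, as given in the question
dat2 <- data.frame(
  short_description = c("Kindly activate Server1 information",
                        "server2: issue on instance2",
                        "Please fix the issue",
                        "can you please check instance5 on server5",
                        "sid1: performance issue",
                        "Can you please check the issue"),
  description = c("Kindly activate all sid3 and there is issue with instance3",
                  "find a sloution for this issue",
                  "issue is on Sid6",
                  "Sid5. Please look into this issue asap.",
                  "server1 and sid1. Performance issue",
                  "Can you please check the issues"),
  stringsAsFactors = FALSE
)

# One alternation of all dataset-1 values, anchored on word boundaries
keywords <- unique(unlist(dat1, use.names = FALSE))
pattern  <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")

# Search both text columns of each row; matches keep their original casing
text <- paste(dat2$short_description, dat2$description)
hits <- regmatches(text, gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE))

dat2$Final_output <- vapply(hits, function(h) {
  h <- h[!duplicated(tolower(h))]  # drop repeats such as sid1 ... sid1
  if (length(h) == 0) "no matches found" else paste(h, collapse = ",")
}, character(1))
dat2
```

The `\\b` anchors keep a short keyword from matching inside a longer token (for example, a hypothetical sid10 would not produce a sid1 hit). Row 4 yields instance5,server5,Sid5; the expected table in the question omits instance5, which appears to be an oversight because instance5 does occur in that row.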


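Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.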
