
Extract the part of each row that contains a given string into a new column (pandas, Python)

I need to move a specific part of each row's value into a new column that contains only that part. What method should I use? For example, I need to extract only the part that contains 'KOTA' or 'KAB' into a new column.

I have some data:

+---------------------------------------------+
|                     A                       |
+---------------------------------------------+
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274      |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471   |
+---------------------------------------------+

I need to extract only that specific value into a new column. The expected result looks like this:

+---------------------------------------------+----------------------+
|                     A                       |           B          |
+---------------------------------------------+----------------------+
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 | KOTA SURABAYA        |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274      | KAB SIDOARJO         |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471   | KAB BANYUWANGI       |
+---------------------------------------------+----------------------+

What I tried:

import re

import pandas as pd

# Initialize a list of lists
testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'],
           ['JL XXXX, KEC PORONG KAB SIDOARJO 61274'],
           ['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]

# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])

# Attempt: substitute the KOTA/KAB keywords with an empty string
for check in df_test['A']:
    test = re.sub(r'(\bKOTA\b)|(\bKAB\b)', '', check)
    print(test)

But this removes KOTA and KAB from each string instead of extracting them.

Assuming you want to extract KOTA/KAB and the words that follow (up to the first digit), you can use:

df_test['B'] = df_test['A'].str.extract(r'(\b(?:KOTA|KAB)\b\D+\b)')

Output:

                                             A                B
0  JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135   KOTA SURABAYA 
1       JL XXXX, KEC PORONG KAB SIDOARJO 61274    KAB SIDOARJO 
2    DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471  KAB BANYUWANGI 
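
Note that the \D+ in this pattern also consumes the space before the postal code, which is why the values in B above end with a trailing space. A minimal sketch, assuming that trailing whitespace is unwanted, is to request a Series with expand=False and chain .str.strip(). The variable name extracted is illustrative only.

```python
import pandas as pd

df_test = pd.DataFrame({'A': [
    'JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135',
    'JL XXXX, KEC PORONG KAB SIDOARJO 61274',
    'DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471',
]})

# expand=False returns a Series (not a DataFrame) for a single capture group
extracted = df_test['A'].str.extract(r'(\b(?:KOTA|KAB)\b\D+\b)', expand=False)

# Drop the space that \D+ captured in front of the postal code
df_test['B'] = extracted.str.strip()

print(df_test['B'].tolist())
# ['KOTA SURABAYA', 'KAB SIDOARJO', 'KAB BANYUWANGI']
```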

re.sub replaces whatever the pattern matches, so substituting an empty string removes the matched text from the string rather than extracting it.
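
To make the difference concrete, here is a small sketch using only the standard re module. The pattern and variable names are illustrative assumptions, not taken from the question: re.sub returns the string without the match, while re.search returns the match itself.

```python
import re

address = 'JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'

# Keyword (KOTA or KAB) followed by a single word; an assumed pattern for illustration
pattern = r'\b(?:KOTA|KAB)\b \w+'

# re.sub deletes the matched text and keeps everything else
removed = re.sub(pattern, '', address)

# re.search finds the matched text, which is what the question asks for
match = re.search(pattern, address)
extracted = match.group(0) if match else None

print(repr(removed))    # 'JL XXXX, KEC TAMBAKSARI  60135'
print(repr(extracted))  # 'KOTA SURABAYA'
```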

You can use a single capture group with str.extract, for example:

import pandas as pd

testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'],
           ['JL XXXX, KEC PORONG KAB SIDOARJO 61274'],
           ['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]

# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])

# Capture KOTA or KAB plus the single word that follows it
df_test['B'] = df_test["A"].str.extract(r'\b((?:KOTA|KAB) \w+)')
print(df_test)

Output:

                                             A               B
0  JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135   KOTA SURABAYA
1       JL XXXX, KEC PORONG KAB SIDOARJO 61274    KAB SIDOARJO
2    DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471  KAB BANYUWANGI
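
One caveat: \w+ captures exactly one word after the keyword, which fits the sample data but would truncate a region name made of several words. The sketch below uses a hypothetical address row (not from the question) and an assumed variant of the pattern that accepts one or more uppercase words, so treat it as one possible adjustment rather than a definitive fix.

```python
import pandas as pd

df = pd.DataFrame({'A': [
    # Hypothetical row with a two-word region name, for illustration only
    'JL XXXX, KEC XXXX KAB KEPULAUAN SERIBU 00000',
    'JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135',
]})

# Pattern from the answer above: keyword plus exactly one word
df['one_word'] = df['A'].str.extract(r'\b((?:KOTA|KAB) \w+)', expand=False)

# Assumed variant: keyword plus one or more space-separated uppercase words
df['multi_word'] = df['A'].str.extract(r'\b((?:KOTA|KAB)(?: [A-Z]+)+)', expand=False)

print(df[['one_word', 'multi_word']])
```

Because [A-Z]+ does not match digits, the match stops before the postal code without capturing a trailing space.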

