Copy a substring from each row and move the specific result to a new column (pandas, Python)

Question

I need to move a specific part of each row's value into a new column that contains only that part. What method should I use? For example, I only want the part that contains 'KOTA|KAB' to go into the new column.

I have some data:

+---------------------------------------------+
|                     A                       |
+---------------------------------------------+
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274      |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471   |
+---------------------------------------------+

I need to extract only that specific value and put it in a new column. The result I expect looks like this:

+---------------------------------------------+----------------------+
|                     A                       |           B          |
+---------------------------------------------+----------------------+
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 | KOTA SURABAYA        |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274      | KAB SIDOARJO         |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471   | KAB BANYUWANGI       |
+---------------------------------------------+----------------------+

What I tried:

import re
import pandas as pd

# initialize list of lists
testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'],
           ['JL XXXX, KEC PORONG KAB SIDOARJO 61274'],
           ['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]

# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])

# Replace every standalone KOTA or KAB with an empty string
for check in df_test['A']:
    test = re.sub(r'(\bKOTA\b)|(\bKAB\b)', '', check)
    print(test)

But the code above only removes KOTA and KAB from each string instead of extracting them.

Answer 1

Assuming you want to extract KOTA/KAB and the words that follow (excluding digits), you can use:

df_test['B'] = df_test['A'].str.extract(r'(\b(?:KOTA|KAB)\b\D+\b)')

Output:

                                             A                B
0  JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135   KOTA SURABAYA 
1       JL XXXX, KEC PORONG KAB SIDOARJO 61274    KAB SIDOARJO 
2    DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471  KAB BANYUWANGI
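
Note that `\D+` also consumes the space before the postal code, which is why some of the values in the output above end with a trailing space. A minimal sketch of one way to clean that up, assuming you want the values without surrounding whitespace: pass `expand=False` so that `str.extract` returns a Series, then chain `.str.strip()`. The sample data is the one from the question.

```python
import pandas as pd

df_test = pd.DataFrame({'A': [
    'JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135',
    'JL XXXX, KEC PORONG KAB SIDOARJO 61274',
    'DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471',
]})

# expand=False makes str.extract return a Series (one capture group),
# so the .str accessor can be chained to strip the trailing space.
df_test['B'] = (
    df_test['A']
    .str.extract(r'(\b(?:KOTA|KAB)\b\D+\b)', expand=False)
    .str.strip()
)

print(df_test['B'].tolist())
```

Whether stripping is needed depends on how column B is used later; for exact comparisons or joins, the trailing space would otherwise cause mismatches.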

Answer 2

re.sub replaces the text matched by the pattern, so with an empty replacement string it removes KOTA and KAB from the string.

You can use a single capture group with str.extract instead, for example:
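
To see the difference on a single string, here is a small sketch using only the standard `re` module. The simplified pattern and the variable names are just for illustration: `re.sub` returns the input with the matches replaced, while `re.search` returns the match itself.

```python
import re

address = 'JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'
pattern = r'\b(?:KOTA|KAB)\b'

# re.sub replaces each match; with '' as replacement the keyword disappears
removed = re.sub(pattern, '', address)

# re.search finds the first match and lets you read the matched text
found = re.search(pattern, address)

print(removed)
print(found.group(0))
```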

import pandas as pd

testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'],
           ['JL XXXX, KEC PORONG KAB SIDOARJO 61274'],
           ['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]

# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])

# Capture KOTA or KAB plus the single word that follows it
df_test['B'] = df_test["A"].str.extract(r'\b((?:KOTA|KAB) \w+)')
print(df_test)

Output

                                             A               B
0  JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135   KOTA SURABAYA
1       JL XXXX, KEC PORONG KAB SIDOARJO 61274    KAB SIDOARJO
2    DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471  KAB BANYUWANGI
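
The `\w+` in this pattern captures exactly one word after KOTA/KAB. If some region names in your data span several words, a variation along these lines might help. This is only a sketch: the second and third rows are made-up examples that are not from the question, and the pattern assumes names consist of space-separated ASCII letters. Rows without KOTA or KAB produce NaN.

```python
import pandas as pd

df_multi = pd.DataFrame({'A': [
    'JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135',   # from the question
    'JL YYYY, KEC ZZZ KOTA JAKARTA SELATAN 12120',   # hypothetical multi-word name
    'JL XXXX 12345',                                 # hypothetical row with no match
]})

# One or more space-separated alphabetic words after KOTA/KAB;
# the match stops at the first token that is not purely letters.
pattern = r'\b((?:KOTA|KAB)(?: [A-Za-z]+)+)'
df_multi['B'] = df_multi['A'].str.extract(pattern, expand=False)

print(df_multi['B'].tolist())
```

Because the repeated group only accepts letters, the postal code is never included, so no stripping is required with this variant.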
