
Create a new column in a dataframe if the column contains a string from a column of another dataframe

I want to create a new column in my dataframe that holds whichever value from a column of a second dataframe is contained in the first dataframe's column.

First dataframe:

WXYnineZAB
EFGsixHIJ
QRSeightTUV
GHItwoJKL
YZAfiveBCD
EFGsixHIJ
MNOthreePQR
ABConeDEF
MNOthreePQR
MNOthreePQR
YZAfiveBCD
WXYnineZAB
GHItwoJKL
KLMsevenNOP
EFGsixHIJ
ABConeDEF
KLMsevenNOP
QRSeightTUV
STUfourVWX
STUfourVWX
KLMsevenNOP
WXYnineZAB
CDEtenFGH
YZAfiveBCD
CDEtenFGH
QRSeightTUV
ABConeDEF
STUfourVWX
CDEtenFGH
GHItwoJKL

Second dataframe:

one
three
five
seven
nine

Output dataframe:

WXYnineZAB,nine
EFGsixHIJ,***
QRSeightTUV,***
GHItwoJKL,***
YZAfiveBCD,five
EFGsixHIJ,***
MNOthreePQR,three
ABConeDEF,one
MNOthreePQR,three
MNOthreePQR,three
YZAfiveBCD,five
WXYnineZAB,nine
GHItwoJKL,***
KLMsevenNOP,seven
EFGsixHIJ,***
ABConeDEF,one
KLMsevenNOP,seven
QRSeightTUV,***
STUfourVWX,***
STUfourVWX,***
KLMsevenNOP,seven
WXYnineZAB,nine
CDEtenFGH,***
YZAfiveBCD,five
CDEtenFGH,***
QRSeightTUV,***
ABConeDEF,one
STUfourVWX,***
CDEtenFGH,***
GHItwoJKL,***

To keep the example simple, I made every row of the first dataframe 3 characters + search string + 3 characters, but my actual file has no such consistency.

Source DFs:

In [172]: d1
Out[172]:
            txt
0    WXYnineZAB
1     EFGsixHIJ
2   QRSeightTUV
3     GHItwoJKL
4    YZAfiveBCD
..          ...
25  QRSeightTUV
26    ABConeDEF
27   STUfourVWX
28    CDEtenFGH
29    GHItwoJKL

[30 rows x 1 columns]

In [173]: d2
Out[173]:
    word
0    one
1  three
2   five
3  seven
4   nine

Generate a RegEx pattern from the second DataFrame:

In [174]: pat = r'({})'.format(d2['word'].str.cat(sep='|'))

In [175]: pat
Out[175]: '(one|three|five|seven|nine)'
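
The pattern above joins the words as-is, so a word containing a regex metacharacter (such as `.` or `+`) would be interpreted as regex syntax. A minimal sketch, assuming the same `d2` as above, that escapes each word and tries longer words first so a shorter word cannot shadow a longer one that starts with it. `build_pattern` is a hypothetical helper name, not part of pandas:

```python
import re
import pandas as pd

def build_pattern(words):
    """Build a capturing alternation from literal words (hypothetical helper)."""
    # Longer words go first so that e.g. 'seventeen' is tried before 'seven'.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape makes every word match literally.
    return '({})'.format('|'.join(re.escape(w) for w in ordered))

d2 = pd.DataFrame({'word': ['one', 'three', 'five', 'seven', 'nine']})
pat = build_pattern(d2['word'])
```

The resulting `pat` can be passed to `str.extract` in the same way as the unescaped pattern.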

Extract the words matching the RegEx pattern and assign them as a new column:

In [176]: d1['new'] = d1['txt'].str.extract(pat, expand=False)

In [177]: d1
Out[177]:
            txt   new
0    WXYnineZAB  nine
1     EFGsixHIJ   NaN
2   QRSeightTUV   NaN
3     GHItwoJKL   NaN
4    YZAfiveBCD  five
..          ...   ...
25  QRSeightTUV   NaN
26    ABConeDEF   one
27   STUfourVWX   NaN
28    CDEtenFGH   NaN
29    GHItwoJKL   NaN

[30 rows x 2 columns]

You can also fill the NaNs in the same step if you want:

In [178]: d1['new'] = d1['txt'].str.extract(pat, expand=False).fillna('***')

In [179]: d1
Out[179]:
            txt   new
0    WXYnineZAB  nine
1     EFGsixHIJ   ***
2   QRSeightTUV   ***
3     GHItwoJKL   ***
4    YZAfiveBCD  five
..          ...   ...
25  QRSeightTUV   ***
26    ABConeDEF   one
27   STUfourVWX   ***
28    CDEtenFGH   ***
29    GHItwoJKL   ***

[30 rows x 2 columns]
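
For reference, the steps above can be put together as a standalone script. This is only a sketch that uses a shortened version of the sample data. Note that `str.extract` keeps only the first match in each row, so a row containing several of the words would get just one of them:

```python
import pandas as pd

# Shortened version of the sample data from the question.
d1 = pd.DataFrame({'txt': ['WXYnineZAB', 'EFGsixHIJ', 'YZAfiveBCD', 'ABConeDEF']})
d2 = pd.DataFrame({'word': ['one', 'three', 'five', 'seven', 'nine']})

# Join the words into a single capturing alternation: '(one|three|...)'.
pat = r'({})'.format(d2['word'].str.cat(sep='|'))

# Extract the first matching word per row; rows without a match become '***'.
d1['new'] = d1['txt'].str.extract(pat, expand=False).fillna('***')
```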

If you want to avoid RegEx, here is a purely list-based solution:

import pandas as pd

# Sample DataFrames (structure is borrowed from MaxU)
d1 = pd.DataFrame({'txt': ['WXYnineZAB', 'EFGsixHIJ', 'QRSeightTUV', 'GHItwoJKL']})
d2 = pd.DataFrame({'word': ['two', 'six']})

# For each txt, collect the words it contains; keep the word only if exactly one matches.
matches = [[word for word in d2.word if word in txt] for txt in d1.txt]
exists = [m[0] if len(m) == 1 else '***' for m in matches]

# Resulting output
res = pd.DataFrame({'text': d1.txt, 'word': exists})

Result:

          text word
0   WXYnineZAB  ***
1    EFGsixHIJ  six
2  QRSeightTUV  ***
3    GHItwoJKL  two
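
The list-based solution above yields '***' whenever a row contains more than one of the words. If the first matching word should be kept instead, a variant could look like the sketch below. `first_match` is a hypothetical helper and the third sample row is made up for illustration. Unlike the RegEx approach, which returns the leftmost match in the text, this returns the first match in the order the words appear in `d2`:

```python
import pandas as pd

d1 = pd.DataFrame({'txt': ['WXYnineZAB', 'EFGsixHIJ', 'GHItwoJKLsix']})
d2 = pd.DataFrame({'word': ['two', 'six']})

def first_match(txt, words, default='***'):
    """Return the first word (in the order given) that occurs in txt."""
    return next((w for w in words if w in txt), default)

words = d2['word'].tolist()
d1['word'] = [first_match(txt, words) for txt in d1['txt']]
```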
