Replacing part of a string in pandas column with `and` condition
I have a pandas DataFrame that looks like this:
Size    Measure  Location    Messages
Small   1        Washington  TXT0123 TXT0875 TXT874 TXT0867 TXT0875 TXT0874
Medium  2        California  TXT020 TXT017 TXT120 TXT012
Large   3        Texas       TXT0123 TXT0123 TXT0123 TXT0123 TXT0217 TXT0206
Small   4        California  TXT020 TXT0217 TXT006
Tiny    5        Nevada      TXT0206 TXT0217 TXT0206
I am trying to remove the 0 from the individual words in the `Messages` column if the length equals 7 and the fourth character is 0.
I've tried a for loop, but it removes all 0's:
for line in df.Messages:
    for message in line.split():
        if len(message) == 7 and message[3] == '0':
            print(message.replace('0', ''))
I also tried `.map`, which gave me some errors:
df.Messages = df.Messages.map(lambda x: x.replace('0', '') for message in line.split() for line in df.Messages if (len(message) == 7 and message[3] == '0'))
TypeError: 'generator' object is not callable
Is there a way to do this with `.map` that includes the `if` and `and` conditionals?
Given that you want to do this for each word, first split your column with `str.split`, call `apply`, and then re-join with `str.join`:
def f(l):
    return [w.replace('0', '') if len(w) == 7 and w[3] == '0' else w for w in l]

df.Messages.str.split().apply(f).str.join(' ')
0 TXT123 TXT875 TXT874 TXT867 TXT875 TXT874
1 TXT020 TXT017 TXT120 TXT012
2 TXT123 TXT123 TXT123 TXT123 TXT217 TXT26
3 TXT020 TXT217 TXT006
4 TXT26 TXT217 TXT26
Name: Messages, dtype: object
If you want to replace just the single 0 (and not all of them), use `w.replace('0', '', 1)` in function `f` instead.
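To make the difference between the two `replace` calls concrete, and to show a version that can be passed straight to `.map` as the question asks, here is a minimal sketch in plain Python. The helper name `strip_fourth_zero` is made up for illustration, and it slices out index 3 rather than calling `replace`, which is an assumption about what "remove the 0" should mean when a word contains several zeros.

```python
def strip_fourth_zero(line: str) -> str:
    """Drop the character at index 3 of each 7-character word when it is '0'."""
    fixed = [
        w[:3] + w[4:] if len(w) == 7 and w[3] == "0" else w
        for w in line.split()
    ]
    return " ".join(fixed)


# str.replace without a count removes every '0', with count=1 only the first one.
print("TXT0206".replace("0", ""))     # TXT26
print("TXT0206".replace("0", "", 1))  # TXT206

# Sample rows copied from the question's Messages column.
messages = [
    "TXT0123 TXT0875 TXT874 TXT0867 TXT0875 TXT0874",
    "TXT020 TXT017 TXT120 TXT012",
    "TXT0206 TXT0217 TXT0206",
]

for line in messages:
    print(strip_fourth_zero(line))
```

Because the helper takes one whole cell and returns one string, it should work with `df['Messages'] = df['Messages'].map(strip_fourth_zero)`. Note that `split()` followed by `' '.join` collapses any runs of whitespace into single spaces.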
# x[:3] + x[4:] drops only the character at index 3 (the fourth character)
df.Messages.str.split().apply(pd.Series).fillna('').\
    applymap(lambda x: x[:3] + x[4:] if len(x) == 7 and x[3] == '0' else x).\
    apply(' '.join, axis=1)
Out[597]:
0    TXT123 TXT875 TXT874 TXT867 TXT875 TXT874
1    TXT020 TXT017 TXT120 TXT012
2    TXT123 TXT123 TXT123 TXT123 TXT217 TXT206
3    TXT020 TXT217 TXT006
4    TXT206 TXT217 TXT206
dtype: object
IIUC:
In [17]: df['Messages'] = df['Messages'].str.replace(r'(\D+)0(\d{3})', r'\1\2', regex=True)
In [18]: df
Out[18]:
   Size    Measure  Location    Messages
0  Small   1        Washington  TXT123 TXT875 TXT874 TXT867 TXT875 TXT874
1  Medium  2        California  TXT020 TXT017 TXT120 TXT012
2  Large   3        Texas       TXT123 TXT123 TXT123 TXT123 TXT217 TXT206
3  Small   4        California  TXT020 TXT217 TXT006
4  Tiny    5        Nevada      TXT206 TXT217 TXT206
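The pattern `(\D+)0(\d{3})` fits the sample data, but it is not tied to the "length equals 7" condition: a longer token such as `ABCD0123` would also be changed. As a hedged alternative, the sketch below uses a stricter pattern that only matches whitespace-delimited words of exactly 7 characters whose fourth character is 0. It uses the standard `re` module so it runs on its own; the name `strip_fourth_zero_re` is hypothetical.

```python
import re

# (?<!\S) and (?!\S) require whitespace (or the string edge) on both sides,
# so the match covers a whole word: 3 chars, a literal '0', then 3 more chars.
PATTERN = re.compile(r"(?<!\S)(\S{3})0(\S{3})(?!\S)")


def strip_fourth_zero_re(line: str) -> str:
    """Remove the fourth character of every 7-character word when it is '0'."""
    return PATTERN.sub(r"\1\2", line)


print(strip_fourth_zero_re("TXT0206 TXT0217 TXT0206"))  # TXT206 TXT217 TXT206
print(strip_fourth_zero_re("TXT020 TXT017 TXT120"))     # TXT020 TXT017 TXT120
print(strip_fourth_zero_re("ABCD0123"))                 # ABCD0123 (8 chars, untouched)
```

The same pattern string should also be usable in pandas via `df['Messages'].str.replace(pattern, r'\1\2', regex=True)`, since pandas hands the pattern to Python's `re` engine.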