How To Clean Data Removing Tabs In SAS
As in many other programming languages, there is a very useful SAS function that removes leading blanks in character strings. It is the ubiquitous LEFT function.
The LEFT(x) function left-aligns a character string x, which effectively removes leading blanks.
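As a quick illustration of the LEFT function, here is a minimal sketch. The variable names X and Y and the sample value are made up for this example; the brackets are only there to make the blanks visible in the log.

```sas
data _null_;
   length X $10;
   X = '   abc';          /* value with three leading blanks */
   Y = left(X);           /* leading blanks are moved to the end of the value */
   put '[' X $char10. ']';
   put '[' Y $char10. ']';
run;
```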
See also: Removing trailing characters from SAS strings
However, in many SAS applications we need a similar but more versatile data cleansing functionality that allows for removal of other leading characters, not just blanks. For example, consider some bank account numbers that are stored as the following character strings:
123456789
0123456789
000123456789
These strings represent the same account number recorded with either no, one, or several leading zeros. One way of standardizing this data is by removing the leading 0's. And while we're at it, why don't we address the leading character removal functionality for any leading characters, not only zeros.
How to remove any leading characters
For example, let's remove all occurrences of the arbitrary leading character '*'. The following diagram illustrates what we are going to achieve and how:
In order to remove a specified character (in this example '*') from all leading positions in a string, we need to search our string from left to right and find the position of the first character in that string that is not equal to the specified character. In this case, it's a blank character in position 4. Then we can extract a substring starting from that position till the end of the string.
I can see two possible solutions.
Solution 1: Using VERIFY() function
The VERIFY(X, C) function searches string X from left to right and returns the position P of the first character that does not appear in the value of C.
Then we can apply the SUBSTR(X,P) function that extracts a substring of X starting from position P till the end of the string X.
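Here is a minimal sketch of these two steps on a single literal value. The sample string '***ABC' is an assumption chosen for illustration; it is not part of the test data used later in this post.

```sas
data _null_;
   X = '***ABC';
   P = verify(X, '*');    /* first character that is not '*' is 'A', so P is 4 */
   Y = substr(X, P);      /* substring from position P to the end of X */
   put P= Y=;             /* writes P=4 Y=ABC to the log */
run;
```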
Solution 2: Using FINDC() function
The FINDC(X, C, 'K') function also searches string X from left to right and returns the position P of the first character that does not appear in C. (The modifier 'K' switches the default behavior of searching for any character that appears in C to searching for any character that does not appear in C.)
Then, as with the VERIFY() function, we can apply the SUBSTR(X,P) function that extracts a substring of X starting from position P till the end of the string X.
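The same steps with FINDC can be sketched as follows. The sample string is again an assumption for illustration. The 'K' modifier is the one the SAS documentation lists for searching for characters that do not appear in the character list.

```sas
data _null_;
   X = '***ABC';
   P = findc(X, '*', 'K');   /* 'K': find the first character NOT in the list, so P is 4 */
   Y = substr(X, P);         /* substring from position P to the end of X */
   put P= Y=;                /* writes P=4 Y=ABC to the log */
run;
```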
Special considerations
So far so good, and everything will be just hunky-dory, right? Not really - unless we cover our bases by handling edge cases.
Have we thought of what would happen if our string X consisted of all '*' characters and nothing else? In this special case, both the VERIFY() function and the FINDC() function will find no position of a character that is not equal to '*' and thus return 0.
However, 0 is not a valid second argument value for the SUBSTR(X,P) function. Valid values are 1 . . . through vlength(X) - the length attribute of variable X. Having a 0 value for the second argument will trigger the automatic DATA step variable _ERROR_=1 and the following note generated in the SAS log:
NOTE: Invalid second argument to function SUBSTR at line ## column #.
Therefore, we need to handle this special case separately, conditionally using SUBSTR(X,P) for P>0 and assigning blank ('') otherwise.
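A minimal sketch of this guard is shown below. The length of 5 and the sample value are assumptions chosen so that the '*' characters fill the variable completely. If X were longer than its value, the trailing blank padding would itself count as a character that is not '*', and VERIFY would return the position of the first trailing blank instead of 0.

```sas
data _null_;
   length X $5;
   X = '*****';                      /* every position of X holds '*' */
   P = verify(X, '*');               /* nothing else to find, so P is 0 */
   if P > 0 then Y = substr(X, P);   /* safe: P is a valid position */
   else Y = '';                      /* P is 0: assign blank, skip SUBSTR */
   put P= Y=;
run;
```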
Code implementation for removing leading characters
Let's put everything together. First, we'll create a test data table:
data TEST;
   input X $ 1-20;
   datalines;
*** It's done*
*********
**01234*ABC**
No leading *'s
;
Then we apply the logic described above. The following DATA step illustrates our two implemented coding solutions for removing leading characters:
data CLEAN (keep=X Y Z);
   set TEST;
   C = '*'; *<- leading character(s) to be removed;

   P1 = verify(X,C); *<- Solution 1;
   if P1 then Y = substr(X, P1);
         else Y = '';

   P2 = findc(X,C,'K'); *<- Solution 2;
   if P2 then Z = substr(X, P2);
         else Z = '';

   put _n_= / X= / P1= / Y= / P2= / Z= /;
run;
The SAS log will show interim and final results by the DATA step iterations:
_N_=1
X=*** It's done*
P1=4
Y=It's done*
P2=4
Z=It's done*

_N_=2
X=*********
P1=10
Y=
P2=10
Z=

_N_=3
X=**01234*ABC**
P1=3
Y=01234*ABC**
P2=3
Z=01234*ABC**

_N_=4
X=No leading *'s
P1=1
Y=No leading *'s
P2=1
Z=No leading *'s
Here is the output data table CLEAN showing the original string X, and resulting strings Y (solution 1) and Z (solution 2) side by side:
As you can see, both solutions (1 & 2) produce identical results.
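Returning to the bank account numbers from the beginning of this post, the same technique can be sketched for leading zeros. The table name ACCOUNTS and the variable names ACCT and CLEAN_ACCT are hypothetical, introduced only for this illustration.

```sas
data ACCOUNTS (drop=P);
   input ACCT $ 1-12;
   P = verify(ACCT, '0');                     /* position of first non-zero character */
   if P > 0 then CLEAN_ACCT = substr(ACCT, P);
   else CLEAN_ACCT = '';                      /* value consisted of zeros only */
   put ACCT= CLEAN_ACCT=;
   datalines;
123456789
0123456789
000123456789
;
```

All three records should end up with the same CLEAN_ACCT value, 123456789.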
Conclusion
Compared to the LEFT() function, the solution presented in this blog post expands leading character removal/cleansing functionality beyond the blank character exclusively. Using this coding technique we can simultaneously remove a variety of leading characters (including but not limited to blank). For example, if we have a string X=' 0.000 12345' and specify C = ' 0.' (the order of characters listed within the value of C does not matter), then all three characters ' ', '0', and '.' will be removed from all leading positions of X. The resulting string will be Y='12345'.
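The multi-character example above can be sketched as follows, using the VERIFY-based solution. The literal value of X is taken as written in the text, with a single leading blank, which is an assumption about the original spacing.

```sas
data _null_;
   X = ' 0.000 12345';
   C = ' 0.';                        /* blank, zero and period are all removed */
   P = verify(X, C);                 /* first character not in C is '1', so P is 8 */
   if P > 0 then Y = substr(X, P);
   else Y = '';
   put P= Y=;                        /* writes P=8 Y=12345 to the log */
run;
```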
Additional resources
- Deleting a substring from a SAS string
- Inserting a substring into a SAS string
- Removing repeated characters in SAS strings
- Finding n-th instance of a substring inside a string
Questions? Thoughts? Comments?
Do you find this post useful? Do you have questions, concerns, comments? Please share with us below.
Source: https://blogs.sas.com/content/sgf/2021/08/23/removing-leading-characters-from-sas-strings/
Posted by: codythelint.blogspot.com